Journal of Data Science and Software Engineering
Vol 1 No 01 (2020)

IMPLEMENTATION OF DATA-LEVEL APPROACH TECHNIQUES TO HANDLE IMBALANCED DATA IN SOFTWARE DEFECT CLASSIFICATION

Hanif Rahardian (ULM)
Mohammad Reza Faisal (ULM)
Friska Abadi (ULM)
Radityo Adi Nugroho (ULM)
Rudy Herteno (ULM)



Article Info

Publish Date
29 Jun 2020

Abstract

Software defects can cause significant rework, delays, and high costs; to prevent them, the likelihood of defects must be predicted. Software metrics datasets are used for this prediction. NASA MDP is a popular collection of software metrics for defect prediction; it contains 13 datasets, which are generally imbalanced. Imbalance in a dataset can reduce defect prediction performance because imbalanced data is dominated by the majority class. Data imbalance can be handled with two approaches: data-level techniques and algorithm-level techniques. Data-level techniques aim to improve the class distribution through resampling and data synthesis. This research applies a data-level approach using five resampling techniques: Random Oversampling (ROS), Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), Tomek Link (TL), and One-Sided Selection (OSS). The resampled data are classified with Naïve Bayes, validated with 10-fold cross-validation, and evaluated with the Area Under the ROC Curve (AUC). Per dataset, the best AUC was 0.7277, obtained on MC2 using SMOTE. Per technique, the best average AUC was 0.62587, obtained with Tomek Link (TL).
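
As a rough illustration of two of the resampling techniques named in the abstract (ROS and RUS) and of the AUC metric, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (random_oversample, random_undersample, auc) and the toy data are hypothetical, and no NASA MDP data is used. When resampling is combined with cross-validation, it is usually applied to the training folds only, so that the test folds keep the original class distribution.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def auc(scores, labels, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 8 non-defective modules, 2 defective.
X = [[i, i % 3] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(Counter(y_ros))  # both classes now have 8 rows
print(Counter(y_rus))  # both classes now have 2 rows
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ranked correctly
```

SMOTE, Tomek Link, and One-Sided Selection additionally rely on nearest-neighbour computations and are left out of this sketch.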

Copyrights © 2020






Journal Info

Abbrev

integer

Publisher

Subject

Computer Science & IT

Description

Journal of Data Science and Software Engineering is a journal managed by the Computer Science (Ilmu Komputer) study program of Universitas Lambung Mangkurat to publish scientific articles from students' final-year projects. Published three times in ...