Luthfi Rakan Nabila
Institut Teknologi Telkom Purwokerto

Journal : International Journal of Engineering and Computer Science Applications (IJECSA)

Handling Imbalance Data using Hybrid Sampling SMOTE-ENN in Lung Cancer Classification
Muhammad Abdul Latief; Luthfi Rakan Nabila; Wildan Miftakhurrahman; Saihun Ma'rufatullah; Henri Tantyoko
International Journal of Engineering and Computer Science Applications (IJECSA) Vol 3 No 1 (2024): March 2024
Publisher : Universitas Bumigora Mataram-Lombok

DOI: 10.30812/ijecsa.v3i1.3758

Abstract

Classification is one of the problems typically addressed with machine learning. When the classes in a dataset are imbalanced, machine learning models tend to favor the majority class, so the model achieves high accuracy on the majority class but low accuracy on the minority class. Many datasets have roughly equal class counts, but when the difference between classes becomes too large, model performance suffers. This imbalance is evident in lung cancer data, which contains 283 positive and 38 negative samples. This research therefore applies a hybrid sampling technique that combines the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbors (ENN) to balance the lung cancer patient data, and uses Random Forest as the classifier to predict lung cancer, with training and testing data split by 10-fold cross-validation. The results show that SMOTE-ENN with Random Forest outperforms both SMOTE alone and no oversampling on all metrics used. The study concludes that combining SMOTE-ENN hybrid sampling with the Random Forest model significantly improves the model's ability to identify and classify the data.
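The SMOTE-ENN resampling step described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`smote`, `enn`, `smote_enn`), Euclidean distance, k = 5 for SMOTE, k = 3 for ENN with Wilson's majority-vote editing applied to every class, and the randomly generated 2-D features are all hypothetical. Only the class counts (283 positive, 38 negative) come from the abstract. SMOTE creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbours. ENN then removes samples whose nearest neighbours mostly carry a different label, cleaning noisy points near the class boundary.

```python
import math
import random
from collections import Counter


def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def _k_nearest(idx, X, k):
    """Indices of the k rows of X closest to X[idx], excluding idx itself."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: _dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """Add synthetic `minority` samples until that class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]
    minority_X = [x for x, label in zip(X, y) if label == minority]
    k = min(k, len(minority_X) - 1)  # needs at least 2 minority samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_X))
        j = rng.choice(_k_nearest(i, minority_X, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        base, nb = minority_X[i], minority_X[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority] * n_new


def enn(X, y, k=3):
    """Wilson's editing: keep a sample only if a strict majority of its
    k nearest neighbours (searched over all classes) share its label."""
    keep = []
    for i in range(len(X)):
        neighbours = _k_nearest(i, X, k)
        votes = Counter(y[j] for j in neighbours)
        if votes[y[i]] * 2 > len(neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """Hybrid sampling: SMOTE oversampling followed by ENN cleaning."""
    X_res, y_res = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 2-D features; only the class counts follow the abstract.
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(283)]
    X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(38)]
    y = [1] * 283 + [0] * 38  # 1 = positive (majority), 0 = negative (minority)
    X_res, y_res = smote_enn(X, y, minority=0)
    print("before:", Counter(y), "after:", Counter(y_res))
```

For real experiments, the imbalanced-learn library provides `SMOTEENN` in `imblearn.combine`. It can be chained with scikit-learn's `RandomForestClassifier` inside an `imblearn.pipeline.Pipeline` so that, under 10-fold cross-validation (for example `cross_val_score(pipeline, X, y, cv=10)`), resampling is applied only to each training fold. Resampling before the split would let synthetic points derived from test samples leak into training.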