International Journal of Engineering and Computer Science Applications (IJECSA)
Vol 3 No 1 (2024): March 2024

Handling Imbalance Data using Hybrid Sampling SMOTE-ENN in Lung Cancer Classification

Muhammad Abdul Latief (Institut Teknologi Telkom Purwokerto)
Luthfi Rakan Nabila (Institut Teknologi Telkom Purwokerto)
Wildan Miftakhurrahman (Institut Teknologi Telkom Purwokerto)
Saihun Ma'rufatullah (Institut Teknologi Telkom Purwokerto)
Henri Tantyoko (Institut Teknologi Telkom Purwokerto)

Article Info

Publish Date
04 Feb 2024

Abstract

Classification is one of the problems most commonly solved with machine learning. When the classes in a dataset are imbalanced, machine learning models tend to favor the majority class: the model achieves high accuracy on the majority class but low accuracy on the minority class. Many datasets have roughly balanced classes, but when the gap between class sizes is too large, model performance is affected. Class imbalance is also present in lung cancer data, which contains 283 positive samples and only 38 negative samples. This research therefore applies a hybrid sampling technique that combines the Synthetic Minority Over-sampling Technique (SMOTE) with Edited Nearest Neighbors (ENN), together with a Random Forest classifier, to handle the class imbalance in lung cancer patient data. SMOTE-ENN is used as a preprocessing step to balance the data, and Random Forest is used as the classifier to predict lung cancer, with the data split into training and testing sets using 10-fold cross-validation. The results show that SMOTE-ENN with Random Forest outperforms both SMOTE alone and no oversampling on all metrics used. We conclude that combining the SMOTE-ENN hybrid sampling technique with a Random Forest model significantly improves the model's ability to identify and classify data.
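The abstract does not include code, so the following is only a minimal sketch of what the SMOTE-ENN step does, written in plain Python on synthetic data (not the lung cancer dataset). SMOTE creates new minority samples by interpolating between a minority point and one of its k nearest minority neighbours until the classes are equal in size; ENN (Wilson's editing rule) then removes every sample whose label disagrees with the majority label of its k nearest neighbours. The helper names `smote`, `enn`, `k_nearest` and `smote_enn`, the two-feature toy data, and the choices k=5 for SMOTE and k=3 for ENN are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def k_nearest(i, X, k, candidates):
    """Return up to k indices from `candidates` closest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, seed=0):
    """SMOTE sketch: add synthetic minority samples until the minority count
    equals the largest other class. Assumes at least 2 minority samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_target = max(y.count(c) for c in set(y) if c != minority)
    X_new, y_new = [list(x) for x in X], list(y)
    for _ in range(n_target - len(min_idx)):
        i = rng.choice(min_idx)                      # random minority seed point
        j = rng.choice(k_nearest(i, X, k, min_idx))  # one of its k minority neighbours
        gap = rng.random()                           # interpolation factor in [0, 1)
        X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Wilson's ENN sketch: keep a sample only if the majority label among its
    k nearest neighbours matches its own label."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in k_nearest(i, X, k, range(len(X)))]
        if max(set(votes), key=votes.count) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k_smote=5, k_enn=3, seed=0):
    """Hybrid sampling: oversample with SMOTE, then clean the result with ENN."""
    X_os, y_os = smote(X, y, minority, k=k_smote, seed=seed)
    return enn(X_os, y_os, k=k_enn)


# Synthetic toy data (NOT the lung cancer dataset): two overlapping Gaussian
# clusters whose sizes mirror the 283 positive / 38 negative split in the abstract.
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(283)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(38)])
y = [1] * 283 + [0] * 38

X_res, y_res = smote_enn(X, y, minority=0)
print("before:", {c: y.count(c) for c in (0, 1)})
print("after: ", {c: y_res.count(c) for c in (0, 1)})
```

After SMOTE both classes have 283 samples; ENN then trims samples in the overlap region, so the final counts depend on the seed and the class overlap. In practice, the imbalanced-learn library provides this combination as `SMOTEENN` in `imblearn.combine`; placing it before scikit-learn's `RandomForestClassifier` in an `imblearn.pipeline.Pipeline` and evaluating with `cross_validate` over a `StratifiedKFold(n_splits=10)` resamples only the training folds, so each test fold keeps the original class distribution.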

Copyright © 2024

Journal Info

Abbrev

IJECSA

Subject

Computer Science & IT

Description

The International Journal of Engineering and Computer Science Applications (IJECSA) is a scientific journal established as a forum for scientists, especially in the field of computer science, to publish their research papers. The 12th of the 12th month of 2021 is ...