JOIV : International Journal on Informatics Visualization
Vol 7, No 1 (2023)

Improvement Performance of the Random Forest Method on Unbalanced Diabetes Data Classification Using Smote-Tomek Link

Hairani Hairani (Universitas Bumigora, Mataram, 83127, Indonesia)
Anthony Anggrawan (Universitas Bumigora, Mataram, 83127, Indonesia)
Dadang Priyanto (Universitas Bumigora, Mataram, 83127, Indonesia)



Article Info

Publish Date
28 Feb 2023

Abstract

Most health data are imbalanced, which degrades the performance of classification methods: a classifier trained on imbalanced data tends to favor the majority class and neglect the minority class. One such dataset is Pima Indian Diabetes. Diabetes is a deadly disease caused by the body's inability to produce enough insulin, and its complications can cause heart attacks and strokes. Early diagnosis is therefore needed to minimize the occurrence of more severe complications. In the diabetes dataset used, the negative class (500 records) outnumbers the positive class (268 records), which can affect the performance of the classification method. This study therefore applies the SMOTE-Tomek Link and Random Forest methods to diabetes classification. The research methodology consists of four stages: collecting the diabetes data from Kaggle (768 records with eight input attributes and one output attribute as the class), pre-processing to balance the dataset with SMOTE-Tomek Link, classification with the Random Forest method, and performance evaluation based on accuracy, sensitivity, precision, and F1-score. In tests using 10-fold cross-validation, Random Forest with SMOTE-Tomek Link achieved higher accuracy, sensitivity, precision, and F1-score than Random Forest with SMOTE alone: 86.4% accuracy, 88.2% sensitivity, 82.3% precision, and 85.1% F1-score. Thus, SMOTE-Tomek Link can improve the performance of the Random Forest method on all four metrics.
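
The balancing step named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy 2-D points, the function names (`smote`, `remove_tomek_links`, `smote_tomek`), and the values of `k` and `seed` are all assumptions chosen for illustration. The sketch first oversamples the minority class by interpolating between minority neighbours (the SMOTE idea), then removes Tomek links, assuming the variant that drops both members of each link.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link: a pair of opposite-class
    points that are each other's nearest neighbour."""
    n = len(points)
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]))
        for i in range(n)
    ]
    drop = set()
    for i, j in enumerate(nearest):
        if nearest[j] == i and labels[i] != labels[j]:
            drop.update((i, j))
    kept = [i for i in range(n) if i not in drop]
    return [points[i] for i in kept], [labels[i] for i in kept]


def smote_tomek(points, labels, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean
    the class boundary by removing Tomek links."""
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_majority = len(points) - len(minority)
    new = smote(minority, n_majority - len(minority), k, seed)
    return remove_tomek_links(points + new, labels + [minority_label] * len(new))


# Toy data (assumed for illustration): 6 majority points, 3 minority points.
majority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5), (3.0, 1.5), (3.5, 3.0)]
minority = [(3.0, 3.0), (4.0, 4.0), (4.5, 3.5)]
points = majority + minority
labels = [0] * len(majority) + [1] * len(minority)

new_points, new_labels = smote_tomek(points, labels)
print("before:", labels.count(0), "negative,", labels.count(1), "positive")
print("after: ", new_labels.count(0), "negative,", new_labels.count(1), "positive")
```

Because each Tomek link pairs one point from each class, removing both members keeps the classes balanced after cleaning. On the real dataset, the resampled records would then be passed to a Random Forest classifier and evaluated with 10-fold cross-validation, as the abstract describes.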

Copyrights © 2023






Journal Info

Abbrev

joiv

Subject

Computer Science & IT

Description

JOIV : International Journal on Informatics Visualization is an international peer-reviewed journal dedicated to the exchange of results of high-quality research in all aspects of Computer Science, Computer Engineering, Information Technology and Visualization. The journal publishes state-of-art ...