Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI)
Vol 9, No 4 (2023): December

Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets

Muhammad Misdram (Universitas Dian Nuswantoro)
Edi Noersasongko (Universitas Dian Nuswantoro)
Purwanto Purwanto (Universitas Dian Nuswantoro)
Muljono Muljono (Universitas Dian Nuswantoro)
Fandi Yulian Pamuji (Merdeka University)



Article Info

Publish Date
16 Oct 2023

Abstract

The problem of dataset imbalance needs special handling, because it often creates obstacles to the classification process. A very important problem in classification is to overcome a decrease in classification performance. There have been many published researches on the topic of overcoming dataset imbalances, but the results are still unsatisfactory. This is proven by the results of the average accuracy increase which is still not significant. There are several common methods that can be used to deal with dataset imbalances. For example, oversampling, undersampling, Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, Adasyn, Cluster-SMOTE methods. These methods in testing the results of the classification accuracy average are still relatively low. In this research the selected dataset is a medical dataset which is classified as a small dataset of less than 200 records. The proposed method is Gaussian Based-SMOTE which is expected to work in a normal distribution and can determine excess samples for minority classes. The Gaussian Based-SMOTE method is a contribution of this research and can produce better accuracy than the previous research. The way the Gaussian Based-SMOTE method works is to start by determining the random location of synthesis candidates, determining the Gaussian distribution. The results of these two methods are substituted to produce perfect synthetic values. Generated synthetic values are combined with SMOTE sampling of the majority data from the training data, produce balanced data. The result of the balanced data classification trial from the influence of the Gaussian Based SMOTE result in a significant increase in accuracy values of 3% on average.

Copyrights © 2023






Journal Info

Abbrev

JITEKI

Publisher

Subject

Computer Science & IT Electrical & Electronics Engineering

Description

JITEKI (Jurnal Ilmiah Teknik Elektro Komputer dan Informatika) is a peer-reviewed, scientific journal published by Universitas Ahmad Dahlan (UAD) in collaboration with Institute of Advanced Engineering and Science (IAES). The aim of this journal scope is 1) Control and Automation, 2) Electrical ...