Ariani Indrawati
Lembaga Ilmu Pengetahuan Indonesia

Published: 2 Documents
Articles

ANALYZING THE IMPACT OF RESAMPLING METHOD FOR IMBALANCED DATA TEXT IN INDONESIAN SCIENTIFIC ARTICLES CATEGORIZATION
Ariani Indrawati; Hendro Subagyo; Andre Sihombing; Wagiyah Wagiyah; Sjaeful Afandi
BACA: JURNAL DOKUMENTASI DAN INFORMASI Vol 41, No 2 (2020): DESEMBER
Publisher : Pusat Data dan Dokumentasi Ilmiah – Lembaga Ilmu Pengetahuan Indonesia

DOI: 10.14203/j.baca.v41i2.702

Abstract

Extremely skewed data in artificial intelligence, machine learning, and data mining often produce misleading results, because machine learning algorithms are designed to work best with balanced data. In practice, however, imbalanced data are common. The most popular way to handle imbalanced data is to resample the dataset, changing the number of instances in the majority and minority classes to obtain a balanced dataset. Many resampling techniques (oversampling, undersampling, or a combination of both) have been proposed, and new ones continue to appear. Resampling may increase or decrease classifier performance. Comparative studies of resampling methods on structured data are common, but studies that compare resampling methods on unstructured data are very rare. This raises many questions, one of which is whether resampling applies to unstructured data such as text, which has high dimensionality and very diverse characteristics. To understand how different resampling techniques affect classifier learning on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling imbalanced text data generally improves classifier performance, but the improvement is not significant because text data are very diverse and high-dimensional.
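To make the idea of resampling concrete, here is a minimal sketch of the two simplest variants, random oversampling and random undersampling, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' code: the abstract does not name the specific resampling methods or classifiers compared, and the function names, the toy titles, and the category labels are all hypothetical.

```python
import random
from collections import Counter


def _group_by_label(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = max(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Randomly discard items until every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_label(samples, labels)
    target = min(len(items) for items in groups.values())
    out_samples, out_labels = [], []
    for label, items in groups.items():
        out_samples.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy corpus: four "agriculture" titles and one "law" title.
toy_titles = [
    "rice yield under drought",
    "soil nutrients in paddy fields",
    "pest control for maize",
    "irrigation scheduling models",
    "land tenure regulation review",
]
toy_labels = ["agriculture"] * 4 + ["law"]

over_x, over_y = random_oversample(toy_titles, toy_labels)
under_x, under_y = random_undersample(toy_titles, toy_labels)
print(Counter(over_y))   # class counts after oversampling
print(Counter(under_y))  # class counts after undersampling
```

Oversampling keeps every document but repeats minority items, while undersampling discards majority items. Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.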
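
PENERAPAN TEKNIK KOMBINASI OVERSAMPLING DAN UNDERSAMPLING UNTUK MENGATASI PERMASALAHAN IMBALANCED DATASET (Applying a Combined Oversampling and Undersampling Technique to Address the Imbalanced Dataset Problem)
Ariani Indrawati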
JIKO (Jurnal Informatika dan Komputer) Vol 4, No 1 (2021)
Publisher : Journal Of Informatics and Computer

DOI: 10.33387/jiko.v4i1.2561

Abstract

One problem that occurs fairly often in machine learning is imbalance in the data used, commonly called an imbalanced dataset. Many studies report that imbalanced datasets often give misleading results. Special handling is needed before an imbalanced dataset can be used in machine learning. The most popular and effective way to address the imbalanced dataset problem is resampling: oversampling, undersampling, or a combination of both. This study tests combined techniques that pair the Synthetic Minority Oversampling Technique (SMOTE) for oversampling with the Edited Nearest Neighbors (ENN) and TomekLinks undersampling techniques, applied to a Support Vector Machine (SVM). Three public UCI datasets, namely Breast Cancer Wisconsin, Pima Indian Diabetes, and Heart Disease Detection, are used in this study, with Python as the programming tool. The experiments show that the combined techniques can help address the imbalanced dataset problem in machine learning; SMOTE-ENN improves the accuracy of the SVM by 2% to 23%.
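The SMOTE-then-ENN combination named in the abstract can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the study's implementation: the abstract says only that Python was used, so the function names, the neighbour count `k=3`, and the toy 2-D points are assumptions made here. It also leaves out TomekLinks and the SVM step.

```python
import math
import random
from collections import Counter


def k_nearest_indices(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Append synthetic minority points until the minority matches the largest class.

    Each synthetic point lies on the segment between a minority point and one of
    its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    counts = Counter(y)
    n_new = max(counts.values()) - counts[minority]
    new_X, new_y = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest_indices(i, minority_pts, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        base, neighbour = minority_pts[i], minority_pts[j]
        new_X.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
        new_y.append(minority)
    return new_X, new_y


def edited_nearest_neighbours(X, y, k=3):
    """Drop every point whose label disagrees with the majority vote of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in k_nearest_indices(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: a majority cluster near the origin, a minority cluster
# near (5, 5), and one noisy majority point sitting inside the minority cluster.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.1, 5.1),
     (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y = ["maj"] * 6 + ["min"] * 3

X_sm, y_sm = smote(X, y, minority="min")
X_clean, y_clean = edited_nearest_neighbours(X_sm, y_sm)
print(Counter(y_sm))     # class counts after SMOTE
print(Counter(y_clean))  # class counts after ENN cleaning
```

SMOTE first evens out the class counts, and ENN then removes points that sit among neighbours of the other class, such as the noisy majority point in the toy data. For real experiments, a maintained library such as imbalanced-learn (which provides `SMOTEENN` and `SMOTETomek`) would be the more reliable choice than this sketch.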