Indonesian Journal of Statistics and Its Applications
Vol 5 No 1 (2021)

Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter

Muhammad Ilham Abidin (Department of Statistics, IPB University, Indonesia)
Khairil Anwar Notodiputro (Department of Statistics, IPB University, Indonesia)
Bagus Sartono (Department of Statistics, IPB University, Indonesia)



Article Info

Publish Date
31 Mar 2021

Abstract

Police efforts to address hate speech on social media such as Twitter cannot rely solely on manual checks. It is therefore necessary to use statistical modelling, such as classification models, to detect hate speech automatically. Classification is a type of predictive modelling that aims to produce accurate predictions from labelled data. However, the available data are usually unlabelled, so they must be labelled beforehand. Data labelling is time-consuming and costly, and it often fails to produce correct labels. This research aims to improve the performance of classification models by adding a small amount of labelled data through the so-called active learning method. The results showed no significant difference between the performance of the logistic regression and naïve Bayes classification models in detecting hate speech. However, the results also showed that adding data through active learning substantially improved the performance of logistic regression in detecting hate speech, compared with adding data selected by simple random sampling. Therefore, the performance of classification models in detecting hate speech on Twitter can be improved by using an active learning method.
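The abstract contrasts two ways of choosing which unlabelled tweets to label next: active learning and simple random sampling. It does not say which active learning query strategy was used, so the sketch below assumes least-confidence uncertainty sampling, a common choice for binary classifiers. The function names (`uncertainty`, `select_uncertain`, `select_random`) are hypothetical. The predicted probabilities could come from either logistic regression or naïve Bayes; the model itself is out of scope here.

```python
import random
from typing import List, Sequence


def uncertainty(p: float) -> float:
    """Least-confidence score for a binary classifier (assumed strategy).

    p is the model's predicted probability that a tweet is hate speech.
    The score is 1 - max(p, 1 - p): it is largest (0.5) when p = 0.5
    and falls to 0 when the model is certain (p = 0 or p = 1).
    """
    return 1.0 - max(p, 1.0 - p)


def select_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Active-learning query: indices of the k pool items the model is least sure about."""
    ranked = sorted(range(len(probs)), key=lambda i: uncertainty(probs[i]), reverse=True)
    return ranked[:k]


def select_random(n_pool: int, k: int, seed: int = 0) -> List[int]:
    """Baseline: simple random sampling of k pool indices without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(n_pool), k)


if __name__ == "__main__":
    # Hypothetical predicted probabilities for six unlabelled tweets.
    pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80, 0.03]
    print(select_uncertain(pool_probs, 2))  # -> [1, 3]
    print(select_random(len(pool_probs), 2, seed=1))
```

In each round, the selected tweets would be labelled manually, added to the training set, and the classifier retrained. The active learning variant spends the labelling budget on tweets near the decision boundary, whereas the random baseline spreads it uniformly over the pool.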

Copyright © 2021






Journal Info

Abbrev

ijsa

Subject

Computer Science & IT, Mathematics, Other

Description

Indonesian Journal of Statistics and Its Applications (eISSN:2599-0802) is published twice a year and contains scholarly articles related to statistics and its applications. The articles published are research results in statistics and its applications on topics ...