Mohamad Nurkamal Fauzan
Advanced and Creative Networks Research Center, Telkom University

ETLE Sentiment Analysis Performance Increasement with TF-IDF, MDI Feature Selection, and SVM
Muhammad Syiarul Amrullah; Aji Gautama Putrada; Mohamad Nurkamal Fauzan; Nur Alamsyah
Sistemasi: Jurnal Sistem Informasi, Vol 13, No 4 (2024)
Publisher : Program Studi Sistem Informasi Fakultas Teknik dan Ilmu Komputer

DOI: 10.32520/stmsi.v13i4.2701

Abstract

In Indonesia, the government, through the Indonesian National Police (POLRI), has recently introduced a new regulation, Electronic Traffic Law Enforcement (ETLE). Under this policy, traffic tickets are issued electronically through camera monitoring connected directly to the vehicle registration certificate (STNK) database. The government can gauge public approval or disapproval of such policies through sentiment analysis. Previous studies have applied sentiment analysis to public responses to ETLE; however, the existing model achieves an accuracy of only 0.42. This study proposes combining a support vector machine (SVM), term frequency-inverse document frequency (TF-IDF), and mean decrease in impurity (MDI) for sentiment polarity analysis of the ETLE policy. First, we retrieve tweets about ETLE from Twitter. We then pre-process the text and remove stop words. Next, we apply TF-IDF. We compare two feature selection methods, MDI and recursive feature elimination (RFE), and two classification models, naïve Bayes and SVM. We evaluate the pre-processing stage with the probability density function (PDF) and the t-test, among other metrics, and the stop-word removal stage with a bag of words (BoW). Finally, we use sensitivity, specificity, and the receiver operating characteristic (ROC) curve to evaluate the feature selection and classification methods. The test results show that TF-IDF produces 1,022 new features. Combining the methods yields six models for comparison. SVM+TF-IDF+MDI performs best among the six, with accuracy and area under the curve (AUC) scores of 0.99 and 0.97, respectively.
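
The TF-IDF step named in the abstract can be illustrated with a minimal sketch. It assumes the textbook weighting tf × log(N / df); the abstract does not state which variant the authors used, and libraries such as scikit-learn apply a smoothed form by default. The toy corpus and the function name `tf_idf` are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for pre-processed tweets
# (already tokenized, stop words removed). Not the study's data.
docs = [
    ["etle", "camera", "ticket"],
    ["etle", "policy", "good"],
    ["ticket", "policy", "unfair", "ticket"],
]


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumed weighting: (count / document length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(corpus)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(docs)
# Each distinct term becomes one feature column.
vocabulary = sorted({term for doc in docs for term in doc})
```

In this sketch every distinct term in the corpus becomes one feature, which is how a tweet collection can give rise to a feature set on the order of the 1,022 features reported in the abstract. A feature selection method such as MDI or RFE would then keep only a subset of these columns before classification.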