Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika
Vol. 8 No. 1 April 2022

Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis

Rindu Hafil Muhammadi (Unknown)
Tri Ginanjar Laksana (Institut Teknologi Telkom Purwokerto)
Amalia Beladinna Arifa (Institut Teknologi Telkom Purwokerto)



Article Info

Publish Date
10 Mar 2022

Abstract

Data from the Ministry of Public Works and Public Housing (Kementerian PUPR) in 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's efforts to ensure that Indonesians can afford housing. Tapera is a deposit that workers pay toward house financing and that is refundable after the term expires. Immediately after its enactment, the regulation drew many public responses. We investigate public sentiment toward the regulation using the Support Vector Machine (SVM), chosen for its good level of accuracy. Because SVM is a supervised method, it requires labeled training data. To speed up labeling, we use a lexicon-based method. The main weakness of the lexicon-based method is its dictionary, which is the most significant factor in its performance. Combining the lexicon-based method with SVM makes it possible to update the dictionary automatically: SVM can contribute to the lexicon-based method, and the lexicon-based method can label the dataset for SVM, so that the combination yields good accuracy. The research consists of collecting data from Twitter, preprocessing the raw, unstructured data into ready-to-use data, labeling the data with the lexicon-based method, weighting terms with TF-IDF, classifying with SVM, and evaluating model performance with a confusion matrix. The results show that the combination of the lexicon-based method and SVM worked well. The lexicon-based method labeled 519 tweets. SVM achieved an accuracy of 81.73% with the RBF kernel. The Sigmoid kernel attained the highest precision, 78.68%, and the RBF kernel the highest recall, 81.73%. The F1-score for both the RBF and Sigmoid kernels is 79.60%.
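The labeling and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the toy `LEXICON`, its English entries, and the function names are all hypothetical, since the abstract does not give the dictionary or the scoring rule. The sketch also assumes recall is averaged with class support as weights; the abstract does not state the averaging scheme, but under that assumption recall always equals accuracy, which would be consistent with the two identical 81.73% figures.

```python
# Hypothetical toy lexicon (word -> polarity); the paper's dictionary is not shown.
LEXICON = {"good": 1, "help": 1, "afford": 1, "burden": -1, "bad": -1, "reject": -1}


def lexicon_label(tokens, lexicon=LEXICON):
    """Sum the polarity of known words and map the total to a sentiment label."""
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and columns are predicted labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def weighted_recall(matrix):
    """Per-class recall averaged with class support as weights (assumed scheme)."""
    total = sum(sum(row) for row in matrix)
    result = 0.0
    for i, row in enumerate(matrix):
        support = sum(row)
        if support:
            result += (support / total) * (matrix[i][i] / support)
    return result
```

In a full pipeline, the labels produced by `lexicon_label` would serve as training targets for an SVM fitted on TF-IDF features, and the confusion matrix would be built from the SVM's predictions on held-out tweets.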

Copyrights © 2022






Journal Info

Abbrev

khif

Publisher

Subject

Computer Science & IT

Description

Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika, an Indonesian national journal, publishes high-quality research papers in the broad field of Informatics and Computer Science, which encompasses software engineering, information system development, computer systems, computer networks, ...