Sukma Andini
Politeknik Statistika STIS

Published: 1 document

Journal : PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS

A Sentiment Analysis and Topic Modelling of The Socio-Economic Registration 2022
Indah Simbolon; Nicholas H Manurung; Sukma Andini; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.301

Abstract

The Socio-Economic Registration (Regsosek) is an activity of Statistics Indonesia (BPS) that collects data on the profile, social and economic conditions, and welfare levels of all residents in 514 regencies/cities in Indonesia. One indicator of the success of Regsosek 2022 is the community's response to and opinion of the activity. These responses and opinions give an overview of how Regsosek 2022 was implemented, and that overview can serve as lessons learned for subsequent population data collections. This study analyzes community responses and opinions on Regsosek as expressed on Twitter. Data are collected using web scraping. Sentiment is classified with four techniques, Naïve Bayes, Nearest Centroid, K-Nearest Neighbors, and Support Vector Machine, and the performance of the four is compared. Topic modeling is carried out with two techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Among the sentiment classifiers, Nearest Centroid gives the best results, with relatively high and balanced F1-scores of 59% for positive sentiment and 66% for negative sentiment. For topic modeling, LDA produces better results than LSA.
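
To make the reported metric concrete, the following is a minimal sketch of a nearest-centroid text classifier evaluated with a per-class F1-score, written with the Python standard library only. It is not the authors' pipeline: the bag-of-words features, the Euclidean distance, the helper names (`vectorize`, `fit`, `predict`, `f1_per_class`) and the tiny English example texts are all assumptions made for illustration, and the abstract does not state which preprocessing or features were used.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Mean term-count vector of a list of Counters."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}

def distance(vec, cen):
    """Euclidean distance between a sparse vector and a centroid."""
    terms = set(vec) | set(cen)
    return math.sqrt(sum((vec.get(t, 0) - cen.get(t, 0)) ** 2 for t in terms))

def fit(texts, labels):
    """Compute one centroid per class label."""
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(vectorize(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(centroids, text):
    """Assign the label whose centroid is closest to the text."""
    vec = vectorize(text)
    return min(centroids, key=lambda label: distance(vec, centroids[label]))

def f1_per_class(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data, not taken from the study's Twitter corpus.
train_texts = [
    "officer was friendly and helpful",
    "registration went smoothly great service",
    "data collection was slow and confusing",
    "worried about privacy bad experience",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_texts = ["friendly officer great service", "slow and confusing experience"]
test_labels = ["positive", "negative"]

centroids = fit(train_texts, train_labels)
y_pred = [predict(centroids, text) for text in test_texts]
scores = {label: f1_per_class(test_labels, y_pred, label)
          for label in ("positive", "negative")}
print(y_pred, scores)
```

Reporting the F1-score separately for the positive and negative classes, as the abstract does, shows whether a classifier performs evenly on both sentiments rather than favoring one of them.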