Lya Hulliyyatus Suadaa
Politeknik Statistika STIS

Published: 8 Documents

Found 4 Documents
Journal : PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND OFFICIAL STATISTICS

A Sentiment Analysis and Topic Modelling of The Socio-Economic Registration 2022
Indah Simbolon; Nicholas H Manurung; Sukma Andini; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.301

Abstract

Socio-Economic Registration (Regsosek) is an activity of Statistics Indonesia (BPS) that aims to collect data on the profile, social and economic conditions, and welfare levels of all residents in 514 regencies/cities in Indonesia. One indicator of the success of Regsosek 2022 is the community's response to and opinion of the activity. These responses and opinions provide an overview of how Regsosek 2022 was implemented, which can serve as lessons learned for subsequent population data collection. This study analyzes community responses and opinions on Regsosek, particularly on Twitter. Sentiment classification is performed with four techniques, namely Naïve Bayes, Nearest Centroid, K-Nearest Neighbors, and Support Vector Machine, and their performance is compared. Topic modeling is performed with two techniques, namely Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Data is collected using web scraping. For sentiment classification, the Nearest Centroid method gives the best results, with relatively high and balanced F1-scores of 59% and 66% for positive and negative sentiment, respectively. For topic modeling, LDA gives better results than LSA.
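To illustrate the Nearest Centroid technique named in this abstract, the following is a minimal sketch in plain Python. It assumes a bag-of-words representation and Euclidean distance; the function names and the toy sentences are made up for illustration and are not taken from the study, whose features and preprocessing are not described here.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn a text into a bag-of-words count vector (word -> count)."""
    return Counter(text.lower().split())


def fit_centroids(texts, labels):
    """Average the bag-of-words vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for text, label in zip(texts, labels):
        sums[label].update(vectorize(text))
    return {
        label: {word: total / counts[label] for word, total in vec.items()}
        for label, vec in sums.items()
    }


def distance(vec_a, vec_b):
    """Euclidean distance between two sparse vectors."""
    words = set(vec_a) | set(vec_b)
    return math.sqrt(sum((vec_a.get(w, 0) - vec_b.get(w, 0)) ** 2 for w in words))


def predict(centroids, text):
    """Return the label whose centroid is closest to the text."""
    vec = vectorize(text)
    return min(centroids, key=lambda label: distance(vec, centroids[label]))


# Toy, made-up training data (assumption: not from the study).
train_texts = [
    "petugas ramah dan membantu",
    "pendataan berjalan lancar",
    "petugas tidak datang",
    "pendataan lambat dan membingungkan",
]
train_labels = ["positive", "positive", "negative", "negative"]

centroids = fit_centroids(train_texts, train_labels)
prediction = predict(centroids, "petugas ramah")
print(prediction)
```

Each class is summarized by a single averaged vector, so prediction cost grows with the number of classes rather than the number of training tweets.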

Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework
Iftitah Athiyyah Rahma; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.324

Abstract

In the real world, the data and resources available for text classification are limited. One issue with labelled data is class imbalance. Imbalanced data affects the performance and accuracy of a model because the model focuses mainly on data with the majority label, so accuracy alone cannot describe the true quality of the model. To overcome this, an oversampling approach is used; oversampling applied to text is known as text augmentation. However, NLP resources for Indonesian, especially for text augmentation, are still limited. Therefore, this research develops a web application that augments Indonesian text automatically. The application was built using the prototype method. It was successfully built and lets users augment all texts in a dataset automatically. Users select their preferred augmentation technique and upload a dataset as input. The output is the same dataset file as the input, with an additional column containing the synthetic text produced by the application. This application can contribute to further research on text augmentation for Indonesian.
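The input/output behaviour described in this abstract (a dataset in, the same dataset out with one extra column of synthetic text) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the augmentation techniques the application offers, so random word swapping is used as a stand-in, and the column names `text` and `augmented_text` and the function names are hypothetical.

```python
import csv
import io
import random


def random_swap(text, rng):
    """Swap two randomly chosen word positions; the set of words is unchanged."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_csv(csv_text, text_column="text", new_column="augmented_text", seed=0):
    """Return the same CSV with one extra column holding augmented text."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same output
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = random_swap(row[text_column], rng)
        writer.writerow(row)
    return out.getvalue()


# Toy, made-up dataset (assumption: not from the study).
sample = (
    "text,label\n"
    "layanan ini sangat membantu,positive\n"
    "aplikasi sering error,negative\n"
)
result = augment_csv(sample)
print(result)
```

Keeping the original columns untouched and appending the synthetic text as a new column means the output can be joined back to the input row by row.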

Comparison of Naive Bayes, K-Nearest Neighbor, and Support Vector Machine Classification Methods in Semi-Supervised Learning for Sentiment Analysis of Kereta Cepat Jakarta Bandung (KCJB)
Muhammad Farhan; Renata De La Rosa Manik; Hana Raihanatul Jannah; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.332

Abstract

Transportation technology has developed very rapidly in the 21st century, and high-speed trains are one example. The Indonesian government is currently building the Kereta Cepat Jakarta-Bandung (KCJB) project in collaboration with China. The construction of this high-speed train project has attracted various comments and opinions from the public on Twitter and other social media. This research aims to compare the Naïve Bayes, K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) classification methods in classifying sentiment in tweets about high-speed trains obtained by scraping Twitter. The comparison was carried out using semi-supervised learning, and the results showed that the semi-supervised SVM model had the best performance, with an average accuracy of 86%, followed by the semi-supervised Naïve Bayes and semi-supervised K-NN models, with average accuracies of 81% and 58%, respectively. Overall, the predictions from the three models indicate that there are more tweets with negative sentiment than tweets with positive or neutral sentiment.
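The abstract does not say which semi-supervised scheme was used. One common option is self-training (pseudo-labeling): train on the labelled tweets, label the unlabelled tweets the model is confident about, add them to the training set, and repeat. The sketch below assumes that scheme, with a small multinomial Naïve Bayes written in plain Python; the threshold, function names, and toy tweets are illustrative assumptions, not the study's setup.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    word_counts = {label: Counter() for label in set(labels)}
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"priors": Counter(labels), "word_counts": word_counts, "vocab": vocab}


def predict_proba(model, text):
    """Return a dict of class -> posterior probability for one text."""
    total_docs = sum(model["priors"].values())
    log_scores = {}
    for label, counts in model["word_counts"].items():
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(model["priors"][label] / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}


def self_train(texts, labels, unlabeled, threshold=0.7, max_rounds=5):
    """Repeatedly add confidently pseudo-labelled texts to the training set."""
    texts, labels, pool = list(texts), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(texts, labels)
        remaining = []
        for text in pool:
            proba = predict_proba(model, text)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                texts.append(text)
                labels.append(best)
            else:
                remaining.append(text)
        if len(remaining) == len(pool):  # nothing new was labelled, so stop
            break
        pool = remaining
    return train_nb(texts, labels), pool


# Toy, made-up tweets (assumption: not from the study).
labeled_texts = ["kereta cepat bagus", "proyek mahal rugi"]
labeled_y = ["positive", "negative"]
unlabeled_texts = ["kereta bagus sekali", "proyek rugi besar"]

model, leftover = self_train(labeled_texts, labeled_y, unlabeled_texts)
proba = predict_proba(model, "sekali")
print(max(proba, key=proba.get), leftover)
```

In this toy run the word "sekali" appears only in an unlabelled tweet, so the final model can only associate it with a class through the pseudo-labels.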

Sentiment Classification of Community towards COVID-19 Issues on Twitter (Case Study: Indonesia, March-May 2020)
Nur Ainun Daulay; Rifqi Ramadhan; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.360

Abstract

This study examines sentiment analysis related to COVID-19 in Indonesia (March-May 2020) using InSet Lexicon as training data in supervised machine learning models. The dataset comprises 7,967 tweets, divided into 90% training data and 10% testing data. The results reveal that Support Vector Machine (SVM) and Random Forest (RF) are the most effective methods, both achieving accuracy above 80%, with SVM reaching 87% and RF 86%. InSet Lexicon itself attains an accuracy of 75%, a macro average of 69%, and a weighted average of 74%, making it an effective alternative for large-scale data labeling. The study recommends further development of InSet Lexicon for sentiment classification and expansion of the lexicon to foreign languages to enhance sentiment analysis accuracy in a global context. This study provides valuable insights into public sentiment regarding crucial issues such as COVID-19 in Indonesia.
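Lexicon-based labeling of the kind this abstract describes can be sketched as below: each word carries a polarity weight, the weights of the words in a tweet are summed, and the sign of the sum gives the label. The lexicon entries, weights, and the zero-sum "neutral" rule here are hypothetical and for illustration only; they are not actual InSet Lexicon entries or the study's labeling rule.

```python
# Hypothetical toy lexicon (word -> polarity weight); not real InSet entries.
TOY_LEXICON = {
    "sembuh": 3,
    "sehat": 2,
    "aman": 2,
    "takut": -3,
    "panik": -4,
    "meninggal": -5,
}


def lexicon_score(text, lexicon):
    """Sum the polarity weights of the words found in the lexicon."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())


def lexicon_label(text, lexicon):
    """Map the summed score to a sentiment label (assumed rule: sign of the sum)."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy, made-up tweets (assumption: not from the study).
tweets = [
    "pasien sembuh dan sehat",
    "warga panik dan takut",
    "update data harian",
]
auto_labels = [lexicon_label(t, TOY_LEXICON) for t in tweets]
print(auto_labels)
```

Labels produced this way can then serve as training targets for a supervised classifier, which is the role the abstract assigns to the lexicon.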