Lya Hulliyyatus Suadaa
Politeknik Statistika STIS

Published: 8 Documents


Transfer Learning of Pre-trained Transformers for Covid-19 Hoax Detection in Indonesian Language Lya Hulliyyatus Suadaa; Ibnu Santoso; Amanda Tabitha Bulan Panjaitan
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 15, No 3 (2021): July
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.66205

Abstract

Nowadays, the internet has become the most popular source of news. However, the validity of online news articles is difficult to assess, whether an article is fact or hoax. Hoaxes related to Covid-19 have had a problematic effect on human life. An accurate hoax detection system is important for filtering the abundant information on the internet. In this research, a Covid-19 hoax detection system was proposed using transfer learning of pre-trained transformer models. Fine-tuned original pre-trained BERT, multilingual pre-trained mBERT, and monolingual pre-trained IndoBERT were used to solve the classification task in the hoax detection system. Based on the experimental results, fine-tuned IndoBERT models trained on a monolingual Indonesian corpus outperform the fine-tuned uncased versions of original and multilingual BERT. However, the fine-tuned mBERT cased model trained on a larger corpus achieved the best performance.
PENGUKURAN TINGKAT KEMIRIPAN DOKUMEN BERBASIS CLUSTER [Cluster-Based Measurement of Document Similarity] Ibnu Santoso; Lya Hulliyyatus Suadaa
KLIK- KUMPULAN JURNAL ILMU KOMPUTER Vol 6, No 1 (2019)
Publisher : Lambung Mangkurat University

DOI: 10.20527/klik.v6i1.181

Abstract

Document similarity can be measured and used to discover similar documents in a document collection (corpus). In a small corpus, measuring document similarity is not a problem. In a larger corpus, comparing similarity between documents can be time-consuming. A clustering method can be used to reduce the number of documents that must be compared with a given document, saving time. This research aims to determine the effect of clustering on document similarity measurement and to evaluate its performance. The corpus consisted of 2,049 undergraduate theses by Politeknik Statistika STIS students from 2007 to 2016. The documents were represented with a bag-of-words model and clustered using the k-means method. Similarity was measured with cosine similarity. In the simulation, clustering into 3 clusters required a longer preparation time (17.32%) but gave faster query processing (77.88%) with an accuracy of 0.98. Clustering into 5 clusters required a longer preparation time (31.10%) but gave faster query processing (83.79%) with an accuracy of 0.86. Clustering into 7 clusters required a longer preparation time (45.10%) but gave faster query processing (85.30%) with an accuracy of 0.98.
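The pipeline this abstract describes (bag of words, k-means, then cosine similarity restricted to one cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-document corpus is made up, the function names are hypothetical, and k-means is seeded with the first k documents only so that the toy run is deterministic.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean_vector(vectors):
    """Centroid (term-wise mean) of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def kmeans(vectors, k, iterations=10):
    """Plain k-means. Assumption: seed with the first k vectors (deterministic)."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return centroids, assignment


def most_similar(query, vectors, centroids, assignment):
    """Compare the query only with documents in its nearest cluster."""
    q = bag_of_words(query)
    nearest = min(range(len(centroids)), key=lambda c: sq_dist(q, centroids[c]))
    candidates = [i for i, a in enumerate(assignment) if a == nearest]
    return max(candidates, key=lambda i: cosine(q, vectors[i]))


# Made-up toy corpus standing in for the thesis collection.
corpus = [
    "regression model for poverty data",
    "web application for survey data entry",
    "poverty regression model estimation",
    "survey web application design",
]
vectors = [bag_of_words(doc) for doc in corpus]
centroids, assignment = kmeans(vectors, k=2)
best = most_similar("poverty model estimation", vectors, centroids, assignment)
print(assignment, best)
```

Clustering is the extra preparation cost, and the query then scores only the documents in one cluster instead of the whole corpus. The trade-off is that a best match sitting in a different cluster is never examined, which is one plausible reading of why accuracy can fall below 1.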
Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta) Rizka Maulida Yanti; Ibnu Santoso; Lya Hulliyyatus Suadaa
Indonesian Journal of Information Systems Vol. 4 No. 1 (2021): August 2021
Publisher : Program Studi Sistem Informasi Universitas Atma Jaya Yogyakarta

DOI: 10.24002/ijis.v4i1.4677

Abstract

SpaCy is a tool that can efficiently handle Natural Language Processing (NLP) problems, one of which is Named Entity Recognition (NER). NER is used to extract and identify named entities in a text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta often experiences power failures, and many public complaints about power failures in the province are found on Twitter. There is no research on extracting information related to electrical disturbances, and research on NER using SpaCy in Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using Indonesian SpaCy. The model performs well, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. Mapping is then carried out based on the location entities contained in tweets related to electrical disturbances. This process found that the location mentioned most often in tweets related to power failures was Sleman Regency, while the one mentioned least often was Gunung Kidul Regency. The month with the most power failures was March 2020, while the month with the fewest was July 2020.
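As a quick worked check of how the three reported metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below only recomputes it from the two figures in the abstract; the function name is illustrative, and it assumes the reported F1 was derived from the reported precision and recall in the usual way.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 95.52  # percent, as stated in the abstract
reported_recall = 93.27     # percent, as stated in the abstract
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 94.38
```

The recomputed value agrees with the 94.38% F1-score given in the abstract.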
Aspect-Based Sentiment Analysis in Bromo Tengger Semeru National Park Indonesia Based on Google Maps User Reviews Cynthia As Bahri; Lya Hulliyyatus Suadaa
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 17, No 1 (2023): January
Publisher : IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia.

DOI: 10.22146/ijccs.77354

Abstract

Technology can influence and shape a person's behavior patterns when planning tours, traveling, and after traveling. Visitors' reviews can be used as evaluation material to improve the quality of tourist destinations and become a determining factor for other tourists to visit or revisit the destinations. The process of utilizing these reviews can be done by assessing the aspects of tourist destinations based on reviews from visitors. This study aims to conduct an aspect-based sentiment analysis on one of the tourist destinations in Indonesia, namely Bromo Tengger Semeru National Park, based on reviews of Google Maps users. The aspects consist of attractions, facilities, access, and price. The sentiment classification model used is a machine learning model consisting of SVM, Complement Naïve Bayes, Logistic Regression, and transfer learning from pre-trained BERT, IndoBERT, and mBERT. Based on the experimental results, transfer learning from the IndoBERT model achieved the best performance with accuracy and F1-Score of 91.48% and 71.56%, respectively. In addition, among the machine learning models used, the SVM model gives the best results with an accuracy of 89.16% and an F1-Score of 62.23%.
A Sentiment Analysis and Topic Modelling of The Socio-Economic Registration 2022 Indah Simbolon; Nicholas H Manurung; Sukma Andini; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.301

Abstract

Socio-Economic Registration or Regsosek is an activity of Statistics Indonesia (BPS) that aims to collect data related to the profile, social and economic conditions, and welfare levels of all residents in 514 regencies/cities in Indonesia. One indicator of the success of Regsosek 2022 is the response and opinion from the community regarding the activity. The response and opinion can provide an overview of the implementation of Regsosek 2022 so that the picture can be used as a lesson learned to carry out the following population data collection. This study uses several methods to analyze the results of community responses and opinions on Regsosek activities, especially on Twitter social media. The method used in this research is sentiment analysis classification with four techniques: Naïve Bayes, Nearest Centroid, K-Nearest Neighbors, and Support Vector Machine. Then, the performance of the four techniques will be compared. In addition, the topic modeling method will also be used with two techniques, namely Latent Semantic Analysis and Latent Dirichlet Allocation. Data is collected using web scraping techniques. The results obtained from the sentiment analysis classification are that the Nearest Centroid method provides the best results with a relatively high and balanced f1-score value in positive and negative sentiments, which are 59% and 66%, respectively. Moreover, LDA modeling results are better than the LSA method for topic modeling results.
Automated Indonesian Text Augmentation with Web-Based Application Using Flask Framework Iftitah Athiyyah Rahma; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.324

Abstract

In the real world, the data and resources available for text classification are limited. One issue with labelled data is class imbalance. Imbalanced data affects the performance and accuracy of a model because the model focuses only on data with the majority label. As a result, model accuracy cannot describe the true quality of the model. To overcome this, an oversampling approach is used. Text-based oversampling is known as text augmentation. However, NLP resources for Indonesian, especially for text augmentation, are still limited. Therefore, this research develops a web application that augments Indonesian text automatically. The application was built using the prototype method. The application was successfully built and lets users perform augmentation automatically for all texts in a dataset. Users can select their preferred augmentation technique and are required to upload a dataset as input. The output of the application is the same dataset file as the input, with an additional column containing synthetic text augmented by the application. This application can contribute to further research on text augmentation for Indonesian.
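The input/output contract described above (a dataset goes in, the same dataset comes out with one extra column of synthetic text) might look like the sketch below. Everything specific here is an assumption for illustration: the abstract does not name the augmentation techniques, so a simple random word swap stands in for one, and the column name `augmented_text`, the function names, and the sample rows are all hypothetical. The Flask layer is omitted.

```python
import csv
import io
import random


def random_swap(text, rng):
    """Stand-in technique (an assumption): swap two randomly chosen words."""
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_csv(csv_text, text_column, technique, seed=0):
    """Return the same CSV content with an extra 'augmented_text' column."""
    rng = random.Random(seed)
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["augmented_text"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["augmented_text"] = technique(row[text_column], rng)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample dataset with a text column and a label column.
sample = "text,label\nlayanan aplikasi ini sangat lambat,negative\nbagus,positive\n"
result = augment_csv(sample, "text", random_swap)
print(result)
```

Passing the technique in as a function mirrors the idea of letting the user pick an augmentation method, while the original columns are written back unchanged.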
Comparison of Naive Bayes, K-Nearest Neighbor, and Support Vector Machine Classification Methods in Semi-Supervised Learning for Sentiment Analysis of Kereta Cepat Jakarta Bandung (KCJB) Muhammad Farhan; Renata De La Rosa Manik; Hana Raihanatul Jannah; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.332

Abstract

Transportation technology has developed very rapidly in the 21st century; one of them is high-speed trains. Currently, the Indonesian government is implementing the construction of the Kereta Cepat Jakarta-Bandung (KCJB) project in collaboration with China. The construction of this fast train project has attracted various comments and opinions from the public on Twitter and social media. This research aims to compare the classification methods of Naïve Bayes, K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) in classifying sentiment in tweets about high-speed trains obtained by scraping Twitter. The comparison process was carried out using semi-supervised learning, and the results showed that the semi-supervised SVM model had the best performance with an average accuracy of 86%, followed by the semi-supervised Naïve Bayes model and semi-supervised K-NN with an average accuracy of 81% and 58% respectively. Overall, the prediction results from the three models conclude that there are more tweets with negative sentiment than tweets with positive and neutral sentiment.
Sentiment Classification of Community towards COVID-19 Issues on Twitter (Case Study: Indonesia, March-May 2020) Nur Ainun Daulay; Rifqi Ramadhan; Lya Hulliyyatus Suadaa
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2023 No. 1 (2023): Proceedings of 2023 International Conference on Data Science and Official St
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2023i1.360

Abstract

This study examines sentiment analysis related to COVID-19 in Indonesia (March-May 2020) using InSet Lexicon as training data in supervised machine learning models. The dataset comprises 7,967 tweets, divided into 90% training data and 10% testing data. The results reveal that Support Vector Machine (SVM) and Random Forest (RF) are the most effective methods, achieving accuracy above 80%, with SVM reaching 87% and RF at 86%. InSet Lexicon itself attains an accuracy of 75%, a macro average of 69%, and a weighted average of 74%, making it an effective alternative for large-scale data labeling. Research recommendations support further development of InSet Lexicon for sentiment classification and expansion of the lexicon for foreign languages to enhance sentiment analysis accuracy in a global context. This study provides valuable insights into understanding public sentiment regarding crucial issues such as COVID-19 in Indonesia.
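Using a sentiment lexicon to label tweets, so that the labels can then serve as training targets for a supervised model, can be sketched as below. This is a minimal illustration, not the study's procedure: the mini-lexicon and its weights are made up and are not taken from InSet, and the three-way positive/negative/neutral rule based on the sign of the summed score is an assumption.

```python
# Hypothetical mini-lexicon: word -> polarity weight.
# These entries and weights are invented for illustration, not InSet values.
TOY_LEXICON = {"sembuh": 3, "aman": 2, "takut": -3, "mati": -4}


def lexicon_label(text, lexicon):
    """Sum the polarity of known words and map the total to a label."""
    score = sum(lexicon.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up example tweets.
tweets = ["pasien sembuh dan aman", "warga takut", "jumlah kasus hari ini"]
labels = [lexicon_label(t, TOY_LEXICON) for t in tweets]
print(list(zip(tweets, labels)))
```

Labels produced this way are only as good as the lexicon's coverage, which is why the lexicon's own accuracy is worth reporting alongside that of the models trained on its labels.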