Kartikasari Kusuma Agustiningsih
Universitas Amikom Yogyakarta

Published : 2 Documents

Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings
Kartikasari Kusuma Agustiningsih; Ema Utami; Muhammad Altoumi Alsyaibani
Jurnal Ilmu Komputer dan Informasi Vol 15, No 1 (2022): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v15i1.1044

Abstract

Public sentiment toward the COVID-19 vaccine can be gauged from social media, where users commonly express their opinions. Twitter is one of the platforms Indonesians use most often for this purpose. This research uses a Bidirectional LSTM combined with word embeddings, with fastText and GloVe tested as the embedding methods. We created 8 test scenarios to evaluate the word embeddings, using both pre-trained and self-trained embedding vectors. The dataset gathered from Twitter was prepared in stemmed and unstemmed versions. In the GloVe scenario group, the highest accuracy, 92.5%, came from the model that used self-trained GloVe and was trained on the unstemmed dataset. In the fastText scenario group, the highest accuracy, 92.3%, came from the model that used self-trained fastText and was trained on the stemmed dataset. Scenarios that used pre-trained embedding vectors were less accurate than those that used self-trained vectors, because the pre-trained embeddings were trained on the Wikipedia corpus, which contains standard, well-structured language, while the dataset in this study came from Twitter, which contains non-standard sentences. Even though the dataset was processed with stemming and a slang-word dictionary, the pre-trained embeddings still could not recognize several words in our dataset.
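The abstract's closing point, that a pre-trained vocabulary misses words found in tweets, can be illustrated with a small out-of-vocabulary (OOV) check. This is a minimal sketch and not the authors' code: the function name `oov_report`, the whitespace tokenization, and the toy vocabulary and tweets are all assumptions made for illustration.

```python
from collections import Counter


def oov_report(tweets, vocabulary):
    """Return (oov_rate, oov_counts) for lowercased, whitespace-split tweets.

    oov_rate is the share of token occurrences missing from `vocabulary`;
    oov_counts lists the missing tokens, most frequent first.
    """
    vocab = set(vocabulary)
    counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
    total = sum(counts.values())
    oov = Counter({tok: n for tok, n in counts.items() if tok not in vocab})
    rate = sum(oov.values()) / total if total else 0.0
    return rate, oov.most_common()


# Hypothetical stand-in for a pre-trained embedding vocabulary.
pretrained_vocab = {"vaksin", "covid", "sudah", "tidak", "saya"}

# Hypothetical tweets mixing standard words with informal spellings.
tweets = ["saya sudah vaksin", "gw udh vaksin covid", "vaksin gak sakit"]

rate, missing = oov_report(tweets, pretrained_vocab)
print(f"OOV rate: {rate:.0%}")
print("Missing tokens:", [tok for tok, _ in missing])
```

With a real embedding file, `pretrained_vocab` would be the set of words for which a vector exists; a high OOV rate on the tweet corpus is one plausible reason for a model built on pre-trained vectors to trail one built on self-trained vectors.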
Sentiment Analysis and Topic Modelling of The COVID-19 Vaccine in Indonesia on Twitter Social Media Using Word Embedding
Kartikasari Kusuma Agustiningsih; Ema Utami; Omar Muhammad Altoumi Alsyaibani
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika Vol 8, No 1 (2022): March
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v8i1.23009

Abstract

This study analyzes the sentiment of the Indonesian people toward the COVID-19 vaccine on Twitter. Data were collected from September 2020 to June 2021 with the keyword "covid vaccine," yielding 262306 tweets. After filtering and cleaning, 83384 tweets remained. Labeling was done manually by an expert. The labeled data comprise 35209 positive, 41596 neutral, and 6579 negative tweets. The remaining data were preprocessed using case folding, punctuation removal, stopword removal, stemming, and the application of a slang-word dictionary. The number of tweets peaked in January 2021, at 23492 tweets, after Joko Widodo became the first person in Indonesia to receive a vaccine injection. At the topic modeling stage, candidate topic counts were evaluated using the Coherence Score, and the optimal number of topics was 3. The first topic, with a token percentage of 51.8%, leans toward positive sentiment, while the second and third topics, with token percentages of 24.5% and 23.7%, lean toward neutral sentiment. A Bidirectional LSTM architecture was implemented for sentiment classification. The fastText and GloVe word embeddings were tested to vectorize the tweet data. Test accuracy reached 75.7690% with fastText and 74.7017% with GloVe. Applying the slang-word dictionary did not increase test accuracy in this study. Using ModelCheckpoint to monitor model performance during training produced a model with slightly higher test accuracy, about 1.07% (in scenario 1 and scenario 6), than a model monitored with Early Stopping. Future research could apply a lower learning rate to achieve better accuracy over a large number of epochs, or change the dropout parameter.
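The preprocessing steps named in the abstract can be sketched as a short pipeline. This is an illustrative sketch under stated assumptions, not the authors' implementation: the stopword list and slang dictionary are tiny hypothetical samples, the step order simply follows the order in which the abstract lists the steps, and stemming is left as a pluggable `stem` callable (identity by default) because the abstract does not name the stemmer used.

```python
import string

# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"yang", "dan", "di"}
SLANG = {"gak": "tidak", "udh": "sudah"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stem=lambda word: word):
    """Apply the abstract's preprocessing steps to one tweet, in listed order."""
    text = text.lower()                                  # case folding
    text = text.translate(_PUNCT_TABLE)                  # punctuation removal
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    tokens = [stem(t) for t in tokens]                   # stemming (assumed pluggable)
    tokens = [SLANG.get(t, t) for t in tokens]           # slang-word replacement
    return tokens


example = preprocess("Vaksin COVID-19 udh datang, dan gak sakit!")
print(example)
```

Passing a real stemmer as `stem` yields the stemmed variant of the data, and omitting it yields the unstemmed variant. Whether slang replacement should run before stopword removal is a design choice the abstract does not settle.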