Journal : Journal of Applied Data Sciences

Active learning on Indonesian Twitter sentiment analysis using uncertainty sampling
Muhaza Liebenlito; Nur Inayah; Esti Choerunnisa; Taufik Edy Sutanto; Suma Inna
Journal of Applied Data Sciences Vol 5, No 1: JANUARY 2024
Publisher : Bright Publisher

DOI: 10.47738/jads.v5i1.144

Abstract

Sentiment analysis on social media is a rapidly developing research area. It is typically framed as a supervised learning task, which requires annotated data, and annotation is notoriously time-consuming. Active learning addresses this challenge: only a small subset of the dataset is labeled up front, and a sampling strategy selects which of the remaining examples to send for annotation. This study compares two sampling strategies, random sampling and margin sampling, applied to two machine learning models, logistic regression and random forest. We evaluate both model performance and the data savings these strategies achieve for traditional machine learning on Twitter sentiment analysis. The dataset has two labels: positive and negative sentiment. The results show that active learning can substantially reduce the amount of training data required, saving up to 65% of the training data needed to reach peak model accuracy. The best model is a random forest with margin sampling, which achieves an accuracy of 81.12% and an F1 score of 88.60%. These results support the use of active learning, particularly random forest combined with margin sampling, for more efficient sentiment analysis on social media.
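
As an illustration of the margin sampling idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a classifier has already produced class probabilities for each unlabeled tweet; the function names `margin` and `select_by_margin` and the probability values are hypothetical. The margin is the gap between the two highest class probabilities, and the examples with the smallest gap are the ones the model is least sure about, so they are queried for annotation first.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities for one example."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_by_margin(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool examples with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:k]


# Hypothetical predicted probabilities (negative, positive) for five
# unlabeled tweets; these values are made up for illustration.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.48, 0.52],
    [0.70, 0.30],
]

# Pick the two tweets the model is least certain about.
query = select_by_margin(pool_probs, k=2)
print(query)
```

In a full active learning loop, the selected examples would be labeled by an annotator, added to the training set, and the model retrained before the next selection round. Random sampling, the baseline in the comparison, would instead pick the `k` indices uniformly at random.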