The COVID-19 pandemic has seen a marked increase in the spread of misinformation across media channels, most notably social media. This is particularly true of Indonesia, where a combination of middling digital literacy and slow fact-checking contributes to its continued spread. Many previously proposed solutions to this problem do not use transformers, despite the availability of Indonesian-language BERT models. To provide both a potential solution to the problem of misinformation and a baseline for future research, we propose an IndoBERT-based model for detecting misinformation in Indonesian-language Tweets. For model training, we use the "small" version of MuMiN, a comprehensive multilingual dataset of fact-checked Tweets. The authors of MuMiN provide a baseline LaBSE model that achieves a macro-average F1-score of 54.5% when trained on the MuMiN "small" dataset. We train and evaluate our proposed model on this dataset to compare it with the LaBSE baseline. We also train and evaluate our model on a subset of the dataset containing only COVID-19-related Tweets, which we first translate into Indonesian. Our model achieves a best macro-average F1-score of 59.5% on the MuMiN dataset and 79.04% on the subset.
Copyrights © 2023