Arif Farhan Bukhori
Program Studi Ilmu Komputer, Departemen Ilmu Komputer dan Elektronika, Fakultas Matematika dan Ilmu Pengetahuan Alam, Universitas Gadjah Mada, Yogyakarta

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Model Klasifikasi Berbasis Multiclass Classification dengan Kombinasi Indobert Embedding dan Long Short-Term Memory untuk Tweet Berbahasa Indonesia Thariq Iskandar Zulkarnain Maulana Putra; Suprapto Suprapto; Arif Farhan Bukhori
Jurnal Ilmu Siber dan Teknologi Digital Vol. 1 No. 1 (2022): November
Publisher : Penerbit Goodwood

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.35912/jisted.v1i1.1509

Abstract

Purpose: This research aims to improve the performance of the text classification model from previous studies, by combining the IndoBERT pre-trained model with the Long Short-Term Memory (LSTM) architecture in classifying Indonesian-language tweets into several categories. Method: The classification text based on multiclass classification was used in this research, combined with pre-trained IndoBERT namely Long Short-Term Memory (LTSM). The dataset was taken using crawling method from API Twitter. Then, it will be compared with Word2Vec-LTSM and fined-tuned IndoBERT. Result: The IndoBERT-LSTM model with the best hyperparameter combination scenario (batch size of 16, learning rate of 2e-5, and using average pooling) managed to get an F1-score of 98.90% on the unmodified dataset (0.70% increase from the Word2Vec-LSTM model and 0.40% from the fine-tuned IndoBERT model) and 92.83% on the modified dataset (4.51% increase from the Word2Vec-LSTM model and 0.69% from the fine-tuned IndoBERT model). However, the improvement from the fine-tuned IndoBERT model is not very significant and the Word2Vec-LSTM model has a much faster total training time.