Journal Of Engineering Sciences (Improsci)
Vol 1 No 4 (2024): 17 February 2024

Utilizing Translation to Enhance NLP Models in Offensive Language and Hate Speech Identification

Sandy Kurniawan (Universitas Diponegoro)
Indra Budi (Universitas Indonesia)



Article Info

Publish Date
16 Feb 2024

Abstract

The number of social media users in Indonesia has grown in recent years, and with it the amount of offensive language on these platforms. Because offensive language can trigger conflicts between users, it needs to be identified. This study focuses on identifying offensive language, hate speech, and hate speech targets on Twitter. The data were obtained from previous research on identifying offensive language and hate speech. Because the amount of training data strongly influences classification performance, this study adds data through translation. Classical machine learning algorithms (SVM and others) and deep learning algorithms (BiLSTM, CNN, and LSTM) are used as classifiers, with word n-grams and word embeddings as features. Three scenarios were run, differing in the training data used to develop the classification model. The results show that scenario 3, which uses translation for data augmentation, improves the classification model’s performance by 5%.
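The abstract does not say which translation tool or direction was used, so the following is only a minimal sketch of the general idea: labeled examples in another language are translated and appended to the training set, and word n-grams are then extracted as features. The names `word_ngrams`, `augment_with_translation`, `toy_translate`, and `TOY_LEXICON` are hypothetical, the word-by-word lexicon merely stands in for a real machine-translation system, and the example tweets are invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> Counter:
    """Count word n-grams of length n_min..n_max in a lowercased text."""
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def augment_with_translation(
    train: Iterable[Example],
    extra: Iterable[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Return train plus translated copies of extra; labels are kept unchanged."""
    return list(train) + [(translate(text), label) for text, label in extra]


# Assumption: a toy lexicon standing in for a real translation system.
TOY_LEXICON = {"you": "kamu", "are": "adalah", "stupid": "bodoh",
               "good": "baik", "morning": "pagi"}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in text.lower().split())


# Invented illustrative data, not taken from the study.
train = [("kamu bodoh sekali", "offensive"), ("selamat pagi semua", "neutral")]
extra = [("you are stupid", "offensive"), ("good morning", "neutral")]

augmented = augment_with_translation(train, extra, toy_translate)
features = [(word_ngrams(text), label) for text, label in augmented]
```

In a real pipeline, the n-gram counts would be fed to a classifier such as an SVM, and the translated examples would be added only to the training split so that evaluation still reflects original data.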

Copyrights © 2024






Journal Info

Abbrev

improsci

Publisher

Subject

Civil Engineering, Building, Construction & Architecture; Control & Systems Engineering; Electrical & Electronics Engineering; Industrial & Manufacturing Engineering; Mechanical Engineering

Description

Journal Of Engineering Sciences (Improsci) is a peer-reviewed journal that publishes scientific articles in the field of industry. Articles published in Jurnal Improsci include original scientific research results (top priority), novel scientific review articles (not ...