Indonesian Journal of Electrical Engineering and Computer Science
Vol 25, No 3: March 2022

An empirical evaluation of phrase-based statistical machine translation for Indonesia slang-word translator

Kyrie Cettyara Eleison (Del Institute of Technology)
Sari Uli Inggrid Hutahaean (Del Institute of Technology)
Sarah Christine Tampubolon (Del Institute of Technology)
Teamsar Muliadi Panggabean (Del Institute of Technology)
Ike Fitriyaningsih (Del Institute of Technology)



Article Info

Publish Date
01 Mar 2022

Abstract

The use of slang (non-standard language) is increasing, especially on social media. This reduces mutual understanding in communication, because not everyone understands slang. The purpose of this work is to develop a slang-word translator. A second objective is to find the minimum number of sentences, and the bilingual evaluation understudy (BLEU) score used as a benchmark, at which the translation is understandable. The approach used is phrase-based statistical machine translation (PBSMT), which is suitable for low-resource languages, with a dataset of 100,000 sentences taken from the comment sections of several online political news portals. The comments were manually translated to produce a parallel corpus of non-standard and standard language. Sample sentences were taken from the dataset and distributed in questionnaires to measure how well humans understand the translation results. The implementation achieves a BLEU score of 64, and the minimum number of sentences needed for an understandable machine translation is 500. The questionnaire results indicate that humans can understand the sentences produced by the translation machine.
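
For readers unfamiliar with the BLEU metric cited in the abstract, the following is a minimal sketch of a corpus-level BLEU computation on a 0-100 scale. It is not the authors' evaluation script: the function names `ngrams` and `corpus_bleu` are hypothetical, and the sketch assumes whitespace tokenization, a single reference per sentence, uniform weights over 1-grams to 4-grams, and no smoothing. The example sentences are invented for illustration and are not drawn from the paper's corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis (assumption).

    Uses clipped n-gram precision for n = 1..max_n with uniform weights
    and the standard brevity penalty. No smoothing is applied, so any
    n-gram order with zero matches gives a score of 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Invented example: machine output vs. a manual standard-language reference.
    machine_output = ["saya tidak tahu kenapa dia marah sekali"]
    human_reference = ["saya tidak tahu mengapa dia marah sekali"]
    print(round(corpus_bleu(machine_output, human_reference), 2))
```

Because the sketch applies no smoothing, it is only meaningful when scored over a whole test set rather than a single short sentence. Established tools may tokenize and smooth differently, so scores from this sketch are not directly comparable to the value reported above.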

Copyrights © 2022