Jurnal Linguistik Komputasional
Vol. 1, No. 2 (2018)

Improving Corpus Quality to Improve the Quality of Statistical Machine Translation (Case Study: Indonesian to Javanese Krama)

Muhammad Gerdy Asparilla
Herry Sujaini
Rudy Dwi Nyoto



Article Info

Publish Date
28 Sep 2018

Abstract

Language is a communication tool used to interact with the surrounding community. Mastering many languages makes it easier to interact with people from other regions, so translators are needed to bridge knowledge of different languages. Statistical Machine Translation (SMT) is a machine translation approach in which translations are produced by statistical models whose parameters are estimated from the analysis of a parallel corpus. A parallel corpus is a pair of corpora containing sentences in one language and their translations in another. One way to improve translation quality is corpus optimization. The aim of this study is to examine how corpus quality affects translation quality by filtering the corpus so that only sentence pairs with good-quality translations are kept. The filter is a minimum score that each sentence pair must reach when evaluated with the Bilingual Evaluation Understudy (BLEU) method. Testing compares the accuracy of the translation results before and after corpus optimization. The results show that corpus optimization improves the quality of Indonesian-to-Javanese Krama machine translation. With corpus optimization, 15 test sentences from outside the corpus showed an average BLEU increase of 10.53%, and 100 test sentences derived from the optimized corpus showed an average BLEU increase of 11.63% in automated testing and 0.03% in evaluation by linguists. Based on these results, adding the corpus optimization feature to the Indonesian-to-Javanese Krama statistical machine translation system increases the accuracy of its translations.
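The filtering step described in the abstract (keeping only sentence pairs whose translation reaches a minimum BLEU score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The SMT system is stood in for by a hypothetical `translate` callable, and the threshold `min_bleu` is a hypothetical parameter because the abstract does not state its value. The add-one smoothing of higher-order n-gram precisions is also an assumption, chosen so that short sentences do not collapse to a score of zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU (0..1) of a hypothesis against one reference.

    Clipped n-gram precisions for n = 1..max_n with uniform weights,
    add-one smoothing for n > 1 (an assumption), and the standard
    brevity penalty. Tokenization is plain whitespace splitting.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if n > 1:  # assumed add-one smoothing for higher orders
            matches, total = matches + 1, total + 1
        if matches == 0:  # no unigram overlap at all
            return 0.0
        log_sum += math.log(matches / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum)


def filter_corpus(pairs, translate, min_bleu):
    """Keep (source, target) pairs whose machine translation of the source
    scores at least min_bleu against the target sentence.

    `translate` is a hypothetical callable standing in for the SMT system.
    """
    kept = []
    for source, target in pairs:
        if sentence_bleu(translate(source), target) >= min_bleu:
            kept.append((source, target))
    return kept


# Toy usage with placeholder tokens (not real Indonesian/Javanese data).
toy_pairs = [("s1", "w1 w2 w3 w4 w5"), ("s2", "w6 w7 w8 w9")]
toy_mt = {"s1": "w1 w2 w3 w4 w5", "s2": "w6 x y z"}  # hypothetical MT outputs
kept = filter_corpus(toy_pairs, toy_mt.get, min_bleu=0.5)
# s1 scores 1.0 and is kept; s2 scores about 0.32 and is dropped.
```

Scoring each pair against the system's own output means that pairs the model already reproduces poorly (often misaligned or noisy pairs) are the ones removed; the threshold trades corpus size against corpus cleanliness.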

Copyright © 2018






Journal Info

Abbrev

jlk

Subject

Computer Science & IT

Description

Jurnal Linguistik Komputasional (JLK) publishes original papers in computational linguistics, covering, but not limited to: Phonology, Morphology, Chunking/Shallow Parsing, Parsing/Grammatical Formalisms, Semantic Processing, Lexical Semantics, Ontology, Linguistic Resources, ...