Location: Indonesia
Jurnal Linguistik Komputasional
EISSN: 2621-9336
Core Subject: Science
Jurnal Linguistik Komputasional (JLK) publishes original papers in the field of computational linguistics, including but not limited to: Phonology, Morphology, Chunking/Shallow Parsing, Parsing/Grammatical Formalisms, Semantic Processing, Lexical Semantics, Ontology, Linguistic Resources, Statistical and Knowledge-based Methods, POS Tagging, Discourse, Paraphrasing/Entailment/Generation, Machine Translation, Information Retrieval, Text Mining, Information Extraction, Summarization, Question Answering, Dialog Systems, Spoken Language Processing, Speech Recognition and Synthesis.
Articles 45 Documents
Analisis Penggabungan Korpus dari Hadits Nabi dan Alquran untuk Mesin Penerjemah Statistik
Hafidz Ardhi; Herry Sujaini; Arif Bijaksana Putra
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.1

Abstract

Each region communicates in a different language, and communication succeeds only if the parties understand the language being used. Machine translation automatically translates text from one language into another. Its output is assessed by automatic evaluation, which measures translation quality with an automatic metric and reports the final score as a percentage. Evaluating a machine translation system with an automatic metric is quicker, easier, and less expensive than human evaluation. BLEU is a metric commonly used by researchers to evaluate machine translation. This research uses the Arabic language. The corpora used are a corpus of the Al-Quran, a corpus of hadith, and a combined corpus. Each corpus is tested by sentence type and at four levels of sentence count. Testing is carried out twice: first without MADAMIRA, then with MADAMIRA. Without MADAMIRA, the BLEU scores are 10.56% for the Al-Quran corpus, 27.65% for the hadith corpus, and 15.41% for the combined corpus. With MADAMIRA, the BLEU scores are 1.44% for the Al-Quran corpus, 32.90% for the hadith corpus, and 41.46% for the combined corpus.
Penjernihan Derau pada Suara Kanal Tunggal dengan Pembelajaran Faktorisasi Matriks Non-negatif tanpa Pengawasan
Tirtadwipa Manunggal; Oskar Riandi; Ardhi Ma’arik; Lalan Suryantoro; Achmad Satria Putera; Izzul Al-Hakam
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.2

Abstract

This article examines a single-channel denoising method that uses Non-negative Matrix Factorization (NMF) in an unsupervised learning scheme. The technique exploits the property of NMF that decomposes the spectrogram matrices of noise-corrupted speech and of the noise itself into their building-block vectors. As an extension to NMF, a Wiener filter is applied as the final step. The method is designed to run in low-latency systems, so preparing a noise model for each particular condition beforehand is impractical. The noise model is therefore taken automatically from the unvoiced part of the noise-corrupted speech. The contribution of this research is an NMF learning scheme with linear and non-linear constraints that works without explicitly provided noise models, so that denoising can be carried out flexibly under any noise condition.
Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia
Maryamah Maryamah; Made Agus Putra Subali; Lailly Qolby; Agus Zainal Arifin; Ali Fauzi
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.4

Abstract

Manual clustering of news documents depends on human ability and accuracy, which can lead to errors in grouping. News documents therefore need to be grouped automatically. Such clustering requires a term weighting method, for example TF.IDF.ICF. In this paper we propose a new weighting algorithm, TF.IDF.ICF.ITF, to cluster documents automatically through statistical data patterns, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that extends K-Means at the initial cluster initialization stage; it is easy to implement and gives more stable results. K-Means++ groups documents at the Inverse Class Frequency (ICF) weighting stage. ICF is developed from class-based weighting of the terms in a document. Terms that often appear in many classes receive a small but informative value. The proposed weighting is then calculated. Testing uses a given query on a number of the best features; the results obtained with the TF.IDF.ICF.ITF method are less than optimal.
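The abstract above names TF.IDF.ICF weighting but does not give its formula. The following is a minimal sketch of one common formulation, assuming logarithmic IDF and ICF factors with a +1 offset. The function name `tf_idf_icf` and its inputs are hypothetical, and the ITF factor proposed in the paper is not reproduced because its definition is not stated here.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Weight each term by TF * IDF * ICF (assumed formulation, not the paper's).

    docs   : list of token lists, one per document
    labels : class label of each document, same order as docs
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    n_classes = len(set(labels))

    # Document frequency: number of documents that contain the term.
    doc_freq = Counter()
    # Vocabulary of every class, used to derive the class frequency.
    class_terms = {}
    for tokens, label in zip(docs, labels):
        doc_freq.update(set(tokens))
        class_terms.setdefault(label, set()).update(tokens)

    # Class frequency: number of classes in which the term appears.
    class_freq = Counter()
    for terms in class_terms.values():
        class_freq.update(terms)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: count
            * (1 + math.log(n_docs / doc_freq[term]))      # IDF factor
            * (1 + math.log(n_classes / class_freq[term]))  # ICF factor
            for term, count in tf.items()
        })
    return weights
```

With this formulation, a term that occurs in every document and every class keeps only its raw term frequency, while a term confined to one class is boosted.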
Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
Fatra Nonggala Putra; Ari Effendi; Agus Zainal Arifin
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.5

Abstract

Peringkasan Multidokumen Otomatis dengan Menggunakan Log-Likelihood Ratio (LLR) dan Maximal Marginal Relevance (MMR) untuk Artikel dengan Topik Penyakit Menular Bahasa Indonesia
Ikhwan Nizwar Akhmad; Anto Satriyo Nugroho; Bambang Harjito
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.6

Abstract

The growing amount of information available on the Internet brings benefits but also various problems. Modern search engines are smart enough to retrieve the most relevant information, but the immense volume of information they return often brings more confusion than clarity. This condition is known as information overload. Automatic multidocument summarization is one way to overcome this problem. Nevertheless, despite being heavily studied for more than 20 years, its implementations for the Indonesian language are limited. In this paper, we report our experimental results on multidocument summarization in the Indonesian language. Articles about infectious diseases are an ideal case study for Indonesian multidocument summarization: information about infectious diseases is essential for the general public, so a great deal of it is available on the Internet. This can trigger information overload when someone searches the Internet on this topic. In this research, we implement a multidocument summarization technique for articles on infectious diseases in Bahasa Indonesia, using the Log-Likelihood Ratio (LLR) to obtain topic signatures and Maximal Marginal Relevance (MMR) to generate a relevant summary with minimal information redundancy. Our summarization method generated summaries with an F-measure of 0.4 under ROUGE-S9 evaluation. We also found that the topic signature (and its accuracy) plays an important role in generating good summaries.
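The MMR step mentioned in the abstract can be illustrated with a short greedy selection loop. This is a minimal sketch, not the authors' implementation: it assumes bag-of-words cosine similarity, takes a plain `query` string as a stand-in for the LLR topic signature terms, and uses hypothetical names (`mmr_select`, `lam`).

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences that are relevant to the query but not redundant.

    Score = lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected).
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    query_vec = Counter(query.lower().split())
    selected = []                       # indices of chosen sentences
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query_vec)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]
```

Lowering `lam` penalizes near-duplicate sentences more strongly; setting `lam=1.0` reduces the procedure to ranking by relevance alone.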
Sistem Pencarian Ayat Al-Quran Berdasarkan Kemiripan Ucapan Menggunakan Algoritma Soundex dan Damerau-Levenshtein Distance
Puruhita Ananda Arsaningtyas; Moch. Arif Bijaksana; Said Al Faraby
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.10

Abstract

Building Monolingual Word Alignment For Indonesian Al-Quran Translation
Galih Rizky Prabowo; Moch Arif Bijaksana
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.11

Abstract

Corpus Quality Improvement to Improve the Quality of Statistical Translator Machines (Case Study of Indonesian Language to Java Krama)
Muhammad Gerdy Asparilla; Herry Sujaini; Rudy Dwi Nyoto
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.12

Abstract

Language is a communication tool used to interact with the surrounding community. The ability to master many languages makes it easier to interact with people from different regions, so translators are needed to broaden knowledge of various languages. Statistical Machine Translation (SMT) is a machine translation approach in which translations are produced from statistical models whose parameters are derived from the analysis of a parallel corpus. A parallel corpus is a pair of corpora containing sentences in one language and their translations. One feature used to improve translation quality is corpus optimization. The aim of this study is to examine the influence of corpus quality by keeping only sentence pairs with good translation quality. The filter is a minimum score for each sentence, measured with the Bilingual Evaluation Understudy (BLEU) method. Testing compares the accuracy of the translation results before and after corpus optimization. The results show that corpus optimization can improve translation quality for an Indonesian to Javanese Krama translation machine. With corpus optimization, 15 test sentences from outside the corpus show an average BLEU increase of 10.53%, and 100 test sentences drawn from the optimized corpus show an average increase of 11.63% in automated testing and 0.03% in testing by linguists. On this basis, a statistical machine translation system from Indonesian to Javanese that uses the corpus optimization feature can increase the accuracy of the translation results.
Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.7

Abstract

Multi-document summarization is a technique for obtaining information. The summary consists of several sentences that aim to describe the contents of the entire set of documents relevantly. Several algorithms with various stages have been proposed. In general, these stages are preprocessing, clustering, and representative sentence selection, which together produce summaries with high relevance. In some conditions, the clustering stage is one of the most important stages of summarization. Existing research cannot determine the number of clusters to be formed. We therefore propose a hierarchical clustering technique. This technique measures the similarity between sentences using cosine similarity, and sentences are clustered based on their similarity values. Clusters that have the highest similarity with other clusters are merged into one cluster, and this merging continues until one cluster remains. Experiments on the Document Understanding Conference (DUC) 2004 dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method, because at 140 clusters the similarity values within each cluster decreased.
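The merging procedure described in the abstract can be sketched as bottom-up (agglomerative) clustering over sentence vectors. This is a minimal illustration under stated assumptions: sentences are plain bag-of-words vectors, and cluster similarity is the average pairwise cosine similarity (average linkage), which the abstract does not specify. The names `agglomerate` and `n_clusters` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors) -> float:
    """Average cosine similarity over all sentence pairs across two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def agglomerate(sentences, n_clusters=1):
    """Merge the two most similar clusters until n_clusters remain.

    Returns clusters as lists of sentence indices.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]  # start with singletons

    while len(clusters) > max(n_clusters, 1):
        # combinations yields index pairs (a, b) with a < b.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a

    return clusters
```

Stopping at a chosen `n_clusters` corresponds to cutting the hierarchy at a given level; with `n_clusters=1` the loop runs until a single cluster remains, as in the abstract.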
Uji Coba Korpus Data Wicara BPPT sebagai Data Latih Sistem Pengenalan Wicara Bahasa Indonesia
Made Gunawan; Elvira Nurfadhilah; Lyla Ruslana Aini; M. Teduh Uliniansyah; Gunarso -; Agung Santosa; Juliati Junde
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.8

Abstract

We present the results of speech recognition experiments using the BPPT Speech Data Corpus developed in 2013 (KDW-BPPT-2013) with funding from the 2013 DIPA budget. The corpus is used as both training data and test data. It contains utterances from 200 speakers: 50 adult males, 50 adolescent males, 50 adult females, and 50 adolescent females, each of whom uttered 250 sentences. The total duration of the speech data is about 92 hours. The experiments were carried out with Kaldi and yielded a Word Error Rate (WER) of 2.52% for the GMM model and 1.64% for the DNN model.
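For readers unfamiliar with the metric reported above, WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that definition only. It is not Kaldi's scoring tool, the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

For example, a four-word reference recognized with one word dropped gives a WER of 0.25, that is, 25%.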