Location: Indonesia
Jurnal Linguistik Komputasional
ISSN : -     EISSN : 26219336     DOI : -
Core Subject : Science
Jurnal Linguistik Komputasional (JLK) publishes original papers in the field of computational linguistics, covering, but not limited to: Phonology, Morphology, Chunking/Shallow Parsing, Parsing/Grammatical Formalisms, Semantic Processing, Lexical Semantics, Ontology, Linguistic Resources, Statistical and Knowledge based methods, POS tagging, Discourse, Paraphrasing/Entailment/Generation, Machine Translation, Information Retrieval, Text Mining, Information Extraction, Summarization, Question Answering, Dialog Systems, Spoken Language Processing, Speech Recognition and Synthesis.
Arjuna Subject : -
Articles 56 Documents
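Quran Verse Search System Based on Pronunciation Similarity Using the Soundex Algorithm and Damerau-Levenshtein Distance [Sistem Pencarian Ayat Al-Quran Berdasarkan Kemiripan Ucapan Menggunakan Algoritma Soundex dan Damerau-Levenshtein Distance]
Puruhita Ananda Arsaningtyas; Moch. Arif Bijaksana; Said Al Faraby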
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.10

Abstract

Applying Cosine Similarity and TF-IDF Weighting to Detect Document Similarity [Penerapan Cosine Similarity dan Pembobotan TF-IDF untuk Mendeteksi Kemiripan Dokumen]
Muhammad Zidny Naf'an; Auliya Burhanuddin; Ade Riyani
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v2i1.17

Abstract

Plagiarism is the act of taking part or all of someone's ideas, in the form of documents or text, without citing the source. This study aims to detect the similarity of text documents using the cosine similarity algorithm and TF-IDF weighting, so that the result can be used to determine a plagiarism score. The documents compared are Indonesian-language abstracts. The results show that similarity scores are on average 10% higher with stemming than without it. The study yields similarity scores above 50% for documents with a high degree of similarity, while documents with low similarity, or that are not plagiarized, yield similarity scores below 40%. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, the TF-IDF weights are computed and the similarity is calculated with cosine similarity, giving a similarity percentage. Based on the experiments, the cosine similarity algorithm with TF-IDF weighting is able to produce a similarity score for each compared document.
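As a rough illustration of the weighting and comparison steps this abstract describes, the sketch below computes TF-IDF vectors and their cosine similarity in plain Python. The token lists are hypothetical stand-ins for already preprocessed abstracts, and the exact weighting formula (raw term frequency times `log(N / df) + 1`) is an assumption, since the abstract does not state which TF-IDF variant the authors used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF weight dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: raw term frequency times (log(N / df) + 1).
        vectors.append({t: tf[t] * (math.log(n_docs / df[t]) + 1.0) for t in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical token lists, assumed to be case-folded, tokenized,
# stopword-filtered, and stemmed already.
docs = [
    ["deteksi", "mirip", "dokumen", "teks"],
    ["deteksi", "mirip", "dokumen", "abstrak"],
    ["kenal", "wicara", "bahasa", "indonesia"],
]
vecs = tfidf_vectors(docs)
sim_close = cosine_similarity(vecs[0], vecs[1])
sim_far = cosine_similarity(vecs[0], vecs[2])
print(f"similar pair: {sim_close:.2%}, unrelated pair: {sim_far:.2%}")
```

Multiplying the cosine value by 100 gives a percentage of the kind the abstract reports; the thresholds of 50% and 40% come from the study, not from this sketch.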
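Analysis of Combining Corpora from the Prophet's Hadith and the Quran for a Statistical Machine Translator [Analisis Penggabungan Korpus dari Hadits Nabi dan Alquran untuk Mesin Penerjemah Statistik]
Hafidz Ardhi; Herry Sujaini; Arif Bijaksana Putra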
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.1

Abstract

Each region has its own language for communication. Communication runs well when the parties understand the language being used. Machine translation automatically translates text from one language into another. Machine translation involves automatic evaluation, which measures the quality of the translated text using an automatic metric. The metric scores quality in various ways and yields a percentage as the final result. Evaluating a machine translation system with an automatic metric is quicker, easier, and less expensive than human evaluation. BLEU is a metric commonly used by researchers to evaluate machine translation. This research uses the Arabic language. The corpora used are a corpus of the Al-Quran, a corpus of hadith, and a combined corpus. The corpora are tested by sentence type and at four levels of sentence count. Testing is done twice: first without MADAMIRA, then with MADAMIRA. Without MADAMIRA, the BLEU scores are 10.56% for the Al-Quran corpus, 27.65% for the hadith corpus, and 15.41% for the combined corpus. With MADAMIRA, the BLEU scores are 1.44% for the Al-Quran corpus, 32.90% for the hadith corpus, and 41.46% for the combined corpus.
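To make the BLEU metric mentioned in this abstract concrete, here is a minimal sketch of a single-reference, sentence-level BLEU score with uniform weights over 1- to 4-grams and no smoothing. It is an illustration only: the study's own evaluation setup (tooling, corpus-level aggregation, tokenization) is not described in the abstract, and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform n-gram weights and no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences, tokenized by whitespace.
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
score = bleu(candidate, reference)
print(f"BLEU: {score:.2%}")
```

Scores such as those quoted in the abstract are normally computed over a whole test set rather than one sentence, so this function only shows the shape of the calculation.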
Corpus Quality Improvement to Improve the Quality of Statistical Translator Machines (Case Study of Indonesian Language to Java Krama)
Muhammad Gerdy Asparilla; Herry Sujaini; Rudy Dwi Nyoto
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.12

Abstract

Language is a communication tool used to interact with the surrounding community. Mastering many languages makes it easier to interact with people from different regions, so translators are needed to broaden knowledge of various languages. Statistical Machine Translation (SMT) is a machine translation approach in which translations are produced from statistical models whose parameters are derived from the analysis of a parallel corpus. A parallel corpus is a pair of corpora containing sentences in one language and their translations in another. One way to improve translation quality is corpus optimization. This study examines the influence of corpus quality by filtering the corpus down to sentence pairs with good-quality translations. The filter is a minimum score for each sentence, measured with the Bilingual Evaluation Understudy (BLEU) method. Testing compares the accuracy of the translation results before and after corpus optimization. The results show that corpus optimization can improve translation quality for an Indonesian to Javanese Krama translation machine. With corpus optimization, testing on 15 test sentences from outside the corpus shows an average BLEU increase of 10.53%, and testing on 100 test sentences derived from corpus optimization shows an average BLEU increase of 11.63% in automatic testing and 0.03% in testing by linguists. On this basis, a statistical machine translator from Indonesian to Javanese that uses the corpus optimization feature can increase the accuracy of its translation results.
Single-Channel Speech Denoising with Unsupervised Non-negative Matrix Factorization Learning [Penjernihan Derau pada Suara Kanal Tunggal dengan Pembelajaran Faktorisasi Matriks Non-negatif tanpa Pengawasan]
Tirtadwipa Manunggal; Oskar Riandi; Ardhi Ma’arik; Lalan Suryantoro; Achmad Satria Putera; Izzul Al-Hakam
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.2

Abstract

This article examines a single-channel denoising method that uses Non-negative Matrix Factorization (NMF) in an unsupervised learning scheme. The technique exploits the property of NMF that decomposes the spectrogram matrices of noise-corrupted speech, and of the noise itself, into their building-block vectors. As an extension to NMF, a Wiener filter is applied as the final step. The method is designed to run in low-latency systems, so preparing a noise model for each particular condition beforehand is impractical. Instead, the noise model is taken automatically from the unvoiced part of the noise-corrupted speech. The contribution of this research is a form of NMF learning with linear and non-linear constraints that works without explicitly provided noise models. The denoising process can therefore be applied flexibly in any noise condition.
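The decomposition that this abstract relies on can be sketched with plain NMF using the standard multiplicative updates for squared error. This is not the authors' constrained variant and it omits the Wiener filter step; the small matrix stands in for a magnitude spectrogram (rows as frequency bins, columns as time frames), and all names and values are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor V (non-negative) into W (basis vectors) and H (activations)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + eps for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(rows)]
    return w, h


# Toy "spectrogram" built from two spectral patterns (rank 2).
v = [[5.0, 0.0, 5.0],
     [4.0, 0.0, 4.0],
     [0.0, 3.0, 3.0],
     [0.0, 2.0, 2.0]]
w0, h0 = nmf(v, rank=2, iterations=0)  # random initial factors
w, h = nmf(v, rank=2)                  # factors after the updates
print(frobenius_error(v, w0, h0), frobenius_error(v, w, h))
```

In a denoising setting, some columns of `W` would be attributed to noise and the rest to speech; how the paper separates them is not specified in the abstract.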
Employing Dependency Tree in Machine Learning Based Indonesian Factoid Question Answering System
Irfan Afif; Ayu Purwarianti
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v2i1.9

Abstract

We propose using dependency tree information to increase the accuracy of Indonesian factoid question answering. We employed MSTParser and the Universal Dependency corpus to build an Indonesian dependency parser. The dependency tree information produced by the Indonesian dependency parser is used in the answer finder component of the Indonesian factoid question answering system. We used dependency tree information in two ways: 1) as one of the features in the machine learning based answer finder (classifying each term in the retrieved passage as part of a correct answer or not); 2) as an additional heuristic rule applied after the machine learning technique. For the machine learning technique, we combined word based calculation, phrase based calculation, and similarity dependency relation based calculation as the complete feature set. Using 203 data points, we were able to improve the accuracy of the Indonesian factoid QA system compared to related work that uses only phrase information. The best accuracy was 84.34% for correct answer classification, and the best MRR was 0.954.
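For readers unfamiliar with the MRR figure reported here, the following sketch shows how Mean Reciprocal Rank is usually computed: each question contributes the reciprocal of the rank at which the first correct answer appears, or zero if it never appears. The questions and candidate answers are hypothetical and are not taken from the paper's data.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1/rank of the first correct answer per question."""
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold_answers)


# Hypothetical ranked candidate lists for three questions.
ranked = [
    ["Jakarta", "Bandung", "Surabaya"],   # correct answer at rank 1
    ["1950", "1945", "1949"],             # correct answer at rank 2
    ["Sukarno", "Suharto"],               # correct answer missing
]
gold = ["Jakarta", "1945", "Hatta"]
mrr = mean_reciprocal_rank(ranked, gold)
print(mrr)  # (1 + 0.5 + 0) / 3 = 0.5
```

An MRR of 0.954 therefore means that, on average, the correct answer sits at or very near the top of the ranked list.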
Cluster-Based Word Weighting for Automatic Multi-Document Summarization [Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen]
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.7

Abstract

Multi-document summarization is a technique for obtaining information. The summary consists of several sentences that aim to describe the contents of the whole document set relevantly. Several algorithms with various criteria have been proposed. In general, they involve preprocessing, clustering, and representative sentence selection to produce highly relevant summaries. In some conditions, the clustering stage is one of the most important stages of summarization. Existing research cannot determine the number of clusters to be formed. We therefore propose a clustering technique based on a cluster hierarchy. The technique measures the similarity between sentences using cosine similarity, and sentences are clustered according to their similarity values. The clusters with the highest similarity to each other are merged into one cluster, and merging continues until one cluster remains. Experiments on the Document Understanding Conference (DUC) 2004 dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method. This is because, at 140 clusters, the similarity values within each cluster decreased.
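The merging procedure described in this abstract can be sketched as bottom-up (agglomerative) clustering over sentence vectors. The sketch below is an assumption-laden illustration: it uses raw word counts as sentence vectors and average pairwise cosine similarity as the linkage between clusters, neither of which is confirmed by the abstract, and it stops at a requested number of clusters instead of merging down to one. The sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(sentences, target_clusters):
    """Merge the two most similar clusters until target_clusters remain."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def linkage(c1, c2):
        # Assumed linkage: average pairwise similarity between members.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(target_clusters, 1):
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical sentences from two unrelated news stories.
sentences = [
    "the storm hit the coast",
    "the storm damaged the coast",
    "stocks fell on monday",
    "stocks rose on monday",
]
clusters = agglomerate(sentences, target_clusters=2)
print(clusters)  # lists of sentence indices, one list per cluster
```

Recording the cluster list after every merge would give the full hierarchy, from which a cut at 132, 135, 137, or 140 clusters could then be chosen as in the reported experiments.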
Identifying Abusive Content in Indonesian-Language Tweets [Identifikasi Konten Kasar Pada Tweet Bahasa Indonesia]
Ahmad Fathan Hidayatullah; Aufa Aulia Fadila; Kiki Purnama Juwairi; Royan Abida Nayoan
Jurnal Linguistik Komputasional Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v2i1.15

Abstract

This study aims to identify tweets containing abusive or offensive content. To do this, we performed five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We employed Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel as our classification algorithms. The experiments show that the SVM with a linear kernel performs better overall than Multinomial Naïve Bayes. The accuracy, precision, recall, and F1-score of the SVM are 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas those of Multinomial Naïve Bayes are 0.9834, 0.9912, 0.9762, and 0.9836. Nevertheless, the two algorithms can be said to perform almost equally well, since the differences between their scores are small.
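As a quick consistency check on the figures in this abstract, the F1-score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the numbers quoted above; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
svm_f1 = f1_score(0.9914, 0.9946)
mnb_f1 = f1_score(0.9912, 0.9762)
print(round(svm_f1, 4), round(mnb_f1, 4))
```

Both recomputed values round to the F1-scores given in the abstract (0.9930 for the SVM and 0.9836 for Multinomial Naïve Bayes).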
Topic- and Class-Based Weighting Method for Indonesian-Language Online News [Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia]
Maryamah Maryamah; Made Agus Putra Subali; Lailly Qolby; Agus Zainal Arifin; Ali Fauzi
Jurnal Linguistik Komputasional Vol 1 No 1 (2018): Vol. 1, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i1.4

Abstract

Manual clustering of news documents depends on human ability and accuracy, which can lead to errors in the grouping process. News documents therefore need to be grouped automatically. This clustering requires a term weighting method, such as TF.IDF.ICF. In this paper we propose a new weighting algorithm, TF.IDF.ICF.ITF, to cluster documents automatically from statistical data patterns, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that extends K-Means at the initial cluster initialization stage; it is easy to implement and gives more stable results. K-Means++ groups the documents at the Inverse Class Frequency (ICF) weighting stage. ICF is developed from class-based weighting of the terms in a document. Terms that appear often in many classes receive a small but informative value. The proposed weighting is then calculated. Testing uses a given query on a number of the best features; the results show that the TF.IDF.ICF.ITF method gives less than optimal results.
Trial of the BPPT Speech Data Corpus as Training Data for an Indonesian Speech Recognition System [Uji Coba Korpus Data Wicara BPPT sebagai Data Latih Sistem Pengenalan Wicara Bahasa Indonesia]
Made Gunawan; Elvira Nurfadhilah; Lyla Ruslana Aini; M. Teduh Uliniansyah; Gunarso -; Agung Santosa; Juliati Junde
Jurnal Linguistik Komputasional Vol 1 No 2 (2018): Vol. 1, No. 2
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.8

Abstract

We present the results of speech recognition trials using the BPPT Speech Data Corpus developed in 2013 (KDW-BPPT-2013) with funding from the 2013 DIPA budget. The corpus is used as both training data and test data. It contains utterances from 200 speakers: 50 adult males, 50 teenage males, 50 adult females, and 50 teenage females, each uttering 250 sentences. The total duration of the speech data is about 92 hours. The trials were run with Kaldi and yielded a Word Error Rate (WER) of 2.52% for GMM and 1.64% for DNN.
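The Word Error Rate quoted in this abstract is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that calculation for a single utterance; it is not Kaldi's scoring script, and the two sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference transcript and recognizer output.
reference_text = "saya pergi ke pasar hari ini"
hypothesis_text = "saya pergi pasar hari itu"
wer = word_error_rate(reference_text, hypothesis_text)
print(f"WER: {wer:.2%}")  # one deletion and one substitution over six words
```

Corpus-level WER, as reported for the GMM and DNN systems, sums the edit counts and reference lengths over all test utterances before dividing.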