Aminul Wahib
Unknown Affiliation

Published: 4 Documents

Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen (Cluster-Based Word Weighting for Automatic Multi-Document Summarization)
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional, Vol. 1, No. 2 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.7

Abstract

Multi-document summarization is a technique for extracting information from a set of documents. The resulting summary consists of several sentences meant to describe the content of the entire document set relevantly. Several algorithms based on various criteria have been proposed. In general, they consist of preprocessing, clustering, and representative-sentence selection stages that aim to produce highly relevant summaries. In some settings, the clustering stage is one of the most important stages in producing a summary, yet existing methods cannot determine the number of clusters to be formed. We therefore propose a hierarchical clustering technique. The technique measures the similarity between sentences using cosine similarity and clusters the sentences according to these similarity values. The clusters most similar to each other are merged into a single cluster, and merging continues until only one cluster remains. Experiments on the Document Understanding Conference (DUC) 2004 dataset, in two scenarios using 132, 135, 137, and 140 clusters, produced fluctuating results. A smaller number of clusters does not guarantee a higher ROUGE-1 score. With the same number of clusters, the proposed method achieves a lower ROUGE-1 score than the previous method. The reason is that with 140 clusters, the similarity within each cluster decreased.
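The abstract says sentence similarity is measured with cosine similarity, cos(a, b) = (a · b) / (|a| |b|). The sketch below illustrates that step. It rests on assumptions that do not come from the paper: each sentence is a raw term-frequency vector, preprocessing is only lowercasing and stripping basic punctuation, and `tokenize`, `cosine_similarity`, and `similarity_matrix` are hypothetical helper names.

```python
import math
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Assumed minimal preprocessing: lowercase, split on whitespace, strip punctuation.
    words = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [w for w in words if w]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two sentences as raw term-frequency vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0  # sentences with no tokens get similarity 0


def similarity_matrix(sentences: list[str]) -> list[list[float]]:
    """Pairwise sentence similarities, the input a clustering stage would consume."""
    return [[cosine_similarity(s, t) for t in sentences] for s in sentences]
```

For example, "the cat sat" and "the cat ran" share two of their three terms, so their similarity is 2/3.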
Perangkingan Dokumen Berbahasa Arab Menggunakan Latent Semantic Indexing (Ranking Arabic-Language Documents Using Latent Semantic Indexing)
Aminul Wahib; Pasnur Pasnur; Putu Praba Santika; Agus Zainal Arifin
Jurnal Buana Informatika, Vol. 6, No. 2 (April 2015)
Publisher : Universitas Atma Jaya Yogyakarta

DOI: 10.24002/jbi.v6i2.411

Abstract

Various document ranking methods for Information Retrieval applications have been developed and implemented. A very popular one ranks documents with the vector space model using TF.IDF term weights. That method weights terms only by how often they occur in documents and ignores the semantic relationships between terms. In practice, these semantic relationships play an important role in improving the relevance of document search results. This study extends the TF.IDF.ICF.IBF method with Latent Semantic Indexing to capture semantic relationships between terms when ranking Arabic-language documents. The dataset is taken from the document collection of the Maktabah Syamilah software. Test results show that the proposed method achieves better evaluation scores than the TF.IDF.ICF.IBF method. The f-measure of the TF.IDF.ICF.IBF.LSI method at cosine similarity thresholds of 0.3, 0.4, and 0.5 is 45%, 51%, and 60%, respectively. However, the proposed method's average computation time is 2 minutes 8 seconds longer than that of TF.IDF.ICF.IBF.
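The sketch below shows the vector space baseline the abstract starts from. Documents and the query are weighted with plain TF.IDF, ranked by cosine similarity, and filtered by a cosine threshold such as the 0.3, 0.4, and 0.5 values above. It is not the authors' method. The ICF and IBF factors and the Latent Semantic Indexing step are left out, tokenization is plain whitespace splitting, and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Plain TF.IDF weights tf(t, d) * log(N / df(t)); ICF, IBF and LSI are omitted."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs, threshold=0.3):
    """Return (doc_index, score) pairs whose cosine score is >= threshold, best first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query.lower().split()).items()}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: p[1], reverse=True)
```

Raising the threshold trades recall for precision, which is why the abstract reports f-measure separately for each threshold.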
Peringkasan Dokumen Berdasarkan Metode Semantic Sebaran Kalimat (Document Summarization Based on the Semantic Sentence Distribution Method)
Aminul Wahib; Dita Lupita Sari
Jurnal Buana Informatika, Vol. 8, No. 1 (January 2017)
Publisher : Universitas Atma Jaya Yogyakarta

DOI: 10.24002/jbi.v8i1.1073

Abstract

Document summarization using the sentence distribution method has been shown to outperform earlier approaches. However, that method weights sentences by their distribution without taking the semantic meaning of the distributed sentences into account. Semantic relations between sentences have been shown to increase the relevance of document search results. This study proposes a new document summarization strategy, the semantic sentence distribution method, to improve the quality of the resulting summaries. Experimental results show that the proposed method performs better. It reaches an average ROUGE-1 of 0.412, an increase of 1.9% over the sentence distribution method, and an average ROUGE-2 of 0.127, an increase of 4.7% over the sentence distribution method.

Keywords: Semantic Sentence Distribution, Document Summarization, ROUGE.
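The ROUGE-1 and ROUGE-2 scores above count overlapping unigrams and bigrams between a system summary and reference summaries. Below is a minimal sketch of ROUGE-N recall. It assumes lowercased whitespace tokens and leaves out the stemming and stopword options of the standard ROUGE toolkit, so its values are not directly comparable with the reported ones. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """ROUGE-N recall: clipped n-gram matches over total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref_text in references:
        ref = ngrams(ref_text.lower().split(), n)
        # Each reference n-gram counts at most as often as it appears in the candidate.
        matched += sum(min(count, cand[g]) for g, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0
```

For example, `rouge_n("the cat sat on the mat", ["the cat was on the mat"], n=2)` returns 0.6, because 3 of the 5 reference bigrams appear in the candidate.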
Peringkasan Otomatis Multi Dokumen menggunakan Hirarki Kluster (Automatic Multi-Document Summarization Using Hierarchical Clustering)
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional, Vol. 1, No. 2 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.86

Abstract

Multi-document summarization is a technique for extracting information from a set of documents. The resulting summary consists of several sentences meant to describe the content of the entire document set relevantly. Several algorithms based on various criteria have been proposed. In general, they consist of preprocessing, clustering, and representative-sentence selection stages that aim to produce highly relevant summaries. In some settings, the clustering stage is one of the most important stages in producing a summary, yet existing methods cannot determine the number of clusters to be formed. We therefore propose a hierarchical clustering technique. The technique measures the similarity between sentences using cosine similarity and clusters the sentences according to these similarity values. The clusters most similar to each other are merged into a single cluster, and merging continues until only one cluster remains. Experiments on the Document Understanding Conference (DUC) 2004 dataset, in two scenarios using 132, 135, 137, and 140 clusters, produced fluctuating results. A smaller number of clusters does not guarantee a higher ROUGE-1 score. With the same number of clusters, the proposed method achieves a lower ROUGE-1 score than the previous method. The reason is that with 140 clusters, the similarity within each cluster decreased.
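The merging procedure described in this abstract is agglomerative hierarchical clustering. The sketch below repeatedly merges the most similar pair of clusters until `k` clusters remain. With `k=1` it runs until a single cluster is left, as the abstract describes. Stopping at a larger `k` is one assumed way to obtain configurations such as 132 or 140 clusters. The abstract does not say how similarity between multi-sentence clusters is computed, so the sketch assumes average linkage over sentence cosine similarities. It also assumes raw term-frequency vectors and whitespace tokenization, and `agglomerative` is a hypothetical name.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerative(sentences, k=1):
    """Merge the most similar pair of clusters until k clusters remain.

    Cluster similarity is average linkage, an assumption not stated in the abstract.
    Returns a list of clusters, each a list of sentence indices.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > max(k, 1):
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = len(clusters[x]) * len(clusters[y])
                avg = sum(sim[i][j] for i in clusters[x] for j in clusters[y]) / pairs
                if avg > best:
                    best, pair = avg, (x, y)
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected by the deletion
    return clusters
```

The exhaustive pair search on every merge keeps the sketch short. An implementation for DUC-sized inputs would likely cache cluster-to-cluster similarities instead.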