Fadli Husein Wattiheluw
Unknown Affiliation

Published: 2 Documents
Articles

Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen (Cluster-Based Word Weighting for Automatic Multi-Document Summarization)
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional, Vol. 1, No. 2 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.7

Abstract

Multi-document summarization is a technique for extracting information from a set of documents. The summary consists of several sentences that together describe the content of the whole document set. Many algorithms have been proposed, and they generally share three stages: preprocessing, clustering, and selection of representative sentences, with the goal of producing highly relevant summaries. Clustering is often one of the most important of these stages. Existing research cannot determine the number of clusters to be formed. Therefore, we propose a clustering technique based on a cluster hierarchy. The technique measures the similarity between sentences using cosine similarity, and sentences are clustered according to these similarity values. The clusters with the highest similarity to each other are merged into one cluster, and merging continues until a single cluster remains. Experiments on the 2004 Document Understanding Conference (DUC) dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 score. With the same number of clusters, the proposed method has a lower ROUGE-1 score than the previous method. This is because, in the 140-cluster setting, the similarity values within each cluster decreased.
Peringkasan Otomatis Multi Dokumen menggunakan Hirarki Kluster (Automatic Multi-Document Summarization Using a Cluster Hierarchy)
Lukman Hakim; Fadli Husein Wattiheluw; Agus Zainal Arifin; Aminul Wahib
Jurnal Linguistik Komputasional, Vol. 1, No. 2 (2018)
Publisher : Indonesia Association of Computational Linguistics (INACL)

DOI: 10.26418/jlk.v1i2.86

Abstract

Multi-document summarization is a technique for extracting information from a set of documents. The summary consists of several sentences that together describe the content of the whole document set. Many algorithms have been proposed, and they generally share three stages: preprocessing, clustering, and selection of representative sentences, with the goal of producing highly relevant summaries. Clustering is often one of the most important of these stages. Existing research cannot determine the number of clusters to be formed. Therefore, we propose a clustering technique based on a cluster hierarchy. The technique measures the similarity between sentences using cosine similarity, and sentences are clustered according to these similarity values. The clusters with the highest similarity to each other are merged into one cluster, and merging continues until a single cluster remains. Experiments on the 2004 Document Understanding Conference (DUC) dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee a higher ROUGE-1 score. With the same number of clusters, the proposed method has a lower ROUGE-1 score than the previous method. This is because, in the 140-cluster setting, the similarity values within each cluster decreased.
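
The merging procedure described in the abstract (cosine similarity between sentences, repeatedly merging the most similar pair of clusters until one remains) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how sentences are tokenized or how similarity between multi-sentence clusters is computed, so whitespace tokenization with raw term counts and average linkage are assumptions made here, and the names `bag_of_words`, `cluster_similarity`, `agglomerate`, and `target_clusters` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    # Assumption: lowercase whitespace tokens with raw term counts.
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_similarity(c1, c2, vectors):
    # Assumption: average linkage, i.e. the mean pairwise sentence similarity.
    total = sum(cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(sentences, target_clusters=1):
    """Merge the most similar pair of clusters until target_clusters remain.

    Returns the final clusters (lists of sentence indices) and the list of
    clusterings produced after each merge, starting with the singletons.
    """
    vectors = [bag_of_words(s) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > max(target_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cluster_similarity(clusters[a], clusters[b], vectors)
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append([list(c) for c in clusters])
    return clusters, history


if __name__ == "__main__":
    example = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stock prices fell sharply today",
        "stock prices rose sharply today",
    ]
    final, steps = agglomerate(example, target_clusters=2)
    print(final)
```

Running the merge down to a single cluster (`target_clusters=1`) and keeping `history` gives the full hierarchy, from which any intermediate cluster count, such as those compared in the experiments, can be read off without re-clustering.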