Aminul Wahib
Institut Teknologi Sepuluh Nopember

Published: 2 documents
Articles


Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization
Agus Zainal Arifin; Moch Zawaruddin Abdullah; Ahmad Wahyu Rosyadi; Desepta Isna Ulumi; Aminul Wahib; Rizka Wakhidatus Sholikah
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 16, No 2: April 2018
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v16i2.8431

Abstract

Automatic multi-document summarization needs to find representative sentences not only through sentence distribution, which selects the most important sentences, but also through how informative the terms in a sentence are. Sentence distribution is suitable for obtaining important sentences because it identifies frequent, well-spread words in the corpus. However, it ignores the grammatical information that indicates informative content. The part-of-speech (POS) labels of a sentence carry grammatical information that can indicate whether the sentence contains informative content. In this paper, we propose a new sentence weighting method that incorporates sentence distribution and POS tagging for multi-document summarization. Similarity-based Histogram Clustering (SHC) is used to cluster the sentences in the data set. The clusters are then ordered by cluster importance to determine the most important ones. Sentence extraction based on sentence distribution and POS tagging is introduced to extract representative sentences from the ordered clusters. Experimental results on the Document Understanding Conference (DUC) 2004 data set are compared with those of the Sentence Distribution Method. The proposed method achieved better results, with improvements of 5.41% on ROUGE-1 and 0.62% on ROUGE-2.
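The results above are reported as ROUGE-1 and ROUGE-2 scores, which measure unigram and bigram overlap between a system summary and human reference summaries. The sketch below is a minimal illustration of one common form, ROUGE-N recall. It makes several simplifying assumptions: lowercase whitespace tokenization, no stemming, and no stopword removal. It is not the official ROUGE toolkit, which adds those options, and the abstract does not state which ROUGE variant or settings the authors used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall: clipped n-gram matches divided by total reference n-grams.

    Simplified sketch: lowercase whitespace tokenization, no stemming,
    no stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram is matched at most as many times as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


print(rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=1))
print(rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=2))
```

For this toy pair, 5 of the 6 reference unigrams are matched (recall 5/6), and 3 of the 5 reference bigrams are matched (recall 0.6).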

Improving Multi-Document Summary Method Based on Sentence Distribution
Aminul Wahib; Agus Zainal Arifin; Diana Purwitasari
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 14, No 1: March 2016
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v14i1.2330

Abstract

Researchers have developed many methods for automatic multi-document summarization. The method used to select sentences from the source documents determines the quality of the resulting summary. One of the most popular ways to weight sentences is to count how often the words that form them occur. However, this approach can select sentences that do not optimally represent the content of the source documents, because a sentence's weight is measured only by the number of occurrences of its words. This study proposes a new sentence weighting strategy based on sentence distribution for choosing the most important sentences. The strategy pays close attention to the elements of a sentence, treated as a distribution of words. This sentence distribution method enables the extraction of important sentences in multi-document summarization and serves as a strategy for improving summary quality. The study uses three concepts: (1) clustering sentences with similarity-based histogram clustering, (2) ordering clusters by cluster importance, and (3) selecting important sentences by sentence distribution. Experimental results show that the proposed method outperforms the SIDeKiCK and LIGI methods. On ROUGE-1, the proposed method improves by 3% over SIDeKiCK and by 5.1% over LIGI. On ROUGE-2, it improves by 13.7% over SIDeKiCK and by 14.4% over LIGI.
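The abstract contrasts its approach with the common practice of weighting a sentence by how often its words occur. The sketch below illustrates such a frequency-based baseline. It assumes that a sentence's score is the average corpus frequency of its words, because the abstract does not give an exact formula. The function names `frequency_sentence_weights` and `top_k_sentences` are hypothetical. This is not the proposed sentence distribution method.

```python
from collections import Counter


def frequency_sentence_weights(sentences):
    """Score each sentence by the average corpus frequency of its words.

    Hypothetical baseline: words are lowercased and split on whitespace,
    with no stopword removal. Returns one score per input sentence.
    """
    tokenized = [s.lower().split() for s in sentences]
    # Word counts over the whole collection of sentences.
    freq = Counter(word for tokens in tokenized for word in tokens)
    weights = []
    for tokens in tokenized:
        if not tokens:
            weights.append(0.0)  # empty sentence gets no weight
            continue
        weights.append(sum(freq[w] for w in tokens) / len(tokens))
    return weights


def top_k_sentences(sentences, k):
    """Return the k highest-weighted sentences, kept in their original order."""
    weights = frequency_sentence_weights(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


docs = ["flood hits city", "flood damages city roads", "mayor speaks"]
print(frequency_sentence_weights(docs))
print(top_k_sentences(docs, 2))
```

Here the score depends only on raw word counts. A sentence full of frequent words ranks highly whether or not those words are spread across the source documents. The abstract identifies this limitation as the one that sentence distribution weighting is meant to address.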