Jurnal Ilmu Komputer
Vol 6 No 2: September 2013

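PENDEKATAN POSITIONAL TEXT GRAPH UNTUK PEMILIHAN KALIMAT REPRESENTATIF CLUSTER PADA PERINGKASAN MULTI-DOKUMEN (A Positional Text Graph Approach for Cluster Representative Sentence Selection in Multi-Document Summarization)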

I Putu Gede Hendra Suputra (Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia)
Agus Zainal Arifin (Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia)
Anny Yuniarti (Jurusan Teknik Informatika, Fakultas Teknologi Informasi, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS, Sukolilo, Surabaya 60111, Indonesia)



Article Info

Publish Date
01 Sep 2013

Abstract

Coverage and saliency are major problems in automatic text summarization. Sentence clustering approaches can provide good coverage of all topics, but they must still select the important sentences that can represent each cluster's topic. The salient sentences selected as constituents of the final summary should be information-dense so that they convey the important information contained in the cluster. The information density of a sentence can be mined by extracting the sentence information density (SID) feature, which is built from a positional text graph of every sentence in the cluster. This paper proposes a cluster representative sentence selection strategy that uses the positional text graph approach in multi-document summarization. The strategy rests on three concepts: (1) sentence clustering using similarity-based histogram clustering, (2) cluster ordering based on cluster importance, and (3) representative sentence selection based on the SID feature score. The candidate summary sentence for a cluster is the sentence with the greatest SID feature score in that cluster. Trials were conducted on the DUC 2004 Task 2 dataset. ROUGE-1 was used as the performance metric to compare the SID feature with another method, Local Importance and Global Importance (LIGI). The test results showed that the SID feature outperformed the LIGI method on ROUGE-1; the greatest average ROUGE-1 value achieved with the SID feature was 0.3915.
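The selection and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the SID formula, so `toy_density` is a hypothetical stand-in score, and the function names `select_representatives` and `rouge1_recall` are assumptions made for illustration. The sketch assumes the clusters are already formed and ordered by importance, and it computes only the recall form of ROUGE-1 on whitespace tokens.

```python
from collections import Counter
from typing import Callable, List, Sequence


def select_representatives(
    ordered_clusters: Sequence[Sequence[str]],
    score: Callable[[str, Sequence[str]], float],
) -> List[str]:
    """Pick the top-scoring sentence from each cluster, keeping cluster order."""
    summary = []
    for cluster in ordered_clusters:
        if not cluster:
            continue  # an empty cluster contributes no sentence
        summary.append(max(cluster, key=lambda s: score(s, cluster)))
    return summary


def toy_density(sentence: str, cluster: Sequence[str]) -> float:
    """Hypothetical stand-in for the SID score (NOT the paper's formula):
    the fraction of the cluster's vocabulary that the sentence covers."""
    vocab = {w for s in cluster for w in s.lower().split()}
    words = set(sentence.lower().split())
    return len(words & vocab) / len(vocab) if vocab else 0.0


def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: clipped unigram overlap divided by reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    clusters = [
        ["the cat sat", "the cat sat on the mat"],
        ["dogs bark"],
    ]
    summary = select_representatives(clusters, toy_density)
    print(summary)
    print(rouge1_recall("the cat sat", "the cat sat on the mat"))
```

Passing the scoring function as a parameter keeps the selection loop independent of how density is measured, so a real SID implementation could replace `toy_density` without changing the rest.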

Copyrights © 2013






Journal Info

Abbrev

jik

Publisher

Subject

Computer Science & IT; Language, Linguistic, Communication & Media; Library & Information Science

Description

JIK is a peer-reviewed scientific journal published since 2008 by the Informatics Department, Faculty of Mathematics and Natural Science, Udayana University. The aim of this journal is to publish high-quality articles dedicated to all aspects of the latest outstanding ...