Jazi Eko Istyanto
Unknown Affiliation



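Perbandingan Feature Kata dan Frasa dalam Kinerja Clustering Dokumen Teks Berbahasa Indonesia (Comparison of Word and Phrase Features in the Clustering Performance of Indonesian-Language Text Documents)
Amir Hamzah; Adhi Susanto; F. Soesianto; Jazi Eko Istyanto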
Seminar Nasional Aplikasi Teknologi Informasi (SNATI) 2007
Publisher : Jurusan Teknik Informatika, Fakultas Teknologi Industri, Universitas Islam Indonesia


Abstract

Text document clustering has been studied intensively because of its important role in text mining and information retrieval. Word-based clustering techniques that use the vector space model always face a high-dimensionality problem caused by the large number of words. Although extracting words in the preprocessing phase is simple, a collection can be viewed not only as a set of words but also as a set of phrases, some of which consist of more than one word. Separating a phrase into its parts can destroy the actual meaning of the phrase. Therefore, to maintain the context of the words, a phrase must be kept as a phrase. It is assumed that adding phrases to words as clustering features will improve performance. This paper compares word-based and phrase-based clustering. Three clustering models were chosen: hierarchical, partitional, and hybrid. Four similarity techniques (GroupAverage, CompleteLink, SingleLink, and ClusterCenter) were tried for the hierarchical model, K-Means and Bisecting K-Means for the partitional model, and Buckshot for the hybrid model. Document collections of 200 to 800 manually categorized news texts were used to test these algorithms, with the F-measure as the criterion of clustering performance. This value is derived from recall and precision and measures how well the algorithms assign the documents to their correct categories. Results show that adding phrases, or simply word pairs, slightly improves clustering performance, although the improvement is not statistically significant.

Keywords: word-based document clustering, phrase-based document clustering, clustering performance
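
The two ideas the abstract relies on, word-pair features and an F-measure derived from recall and precision, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenizer, the function names `word_and_pair_features` and `clustering_f_measure`, and the choice of F-measure variant (the best F per manual category, weighted by category size) are all assumptions, because the abstract does not specify them.

```python
from collections import Counter


def word_and_pair_features(text):
    """Count single words plus adjacent word pairs in one document."""
    # Assumption: naive lowercase whitespace tokenization, no stemming or stop words.
    tokens = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + pairs)


def clustering_f_measure(true_labels, cluster_ids):
    """Category-size-weighted best F-measure over clusters (assumed variant)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)      # n_i: documents per manual category
    cluster_sizes = Counter(cluster_ids)    # n_j: documents per cluster
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij: shared documents

    total = 0.0
    for label, n_i in class_sizes.items():
        best = 0.0
        for cluster, n_j in cluster_sizes.items():
            n_ij = overlap[(label, cluster)]
            if n_ij == 0:
                continue  # no shared documents, so F is 0 for this pair
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    print(word_and_pair_features("harga bahan bakar"))
    print(clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 1]))
```

Under these assumptions a clustering that reproduces the manual categories exactly scores 1.0, and any mixing of categories within a cluster lowers the score.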