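EFISIENSI PHRASE SUFFIX TREE DENGAN SINGLE PASS CLUSTERING UNTUK PENGELOMPOKAN DOKUMEN WEB BERBAHASA INDONESIA
(Phrase Efficiency in Suffix Trees with Single Pass Clustering for Grouping Indonesian-Language Web Documents)
Desmin Tuwohingide; Mika Parwita; Agus Zainal Arifin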
JURNAL TEKNOLOGI TECHNOSCIENTIA Technoscientia Vol 8 No 2 Februari 2016
Publisher : Lembaga Penelitian & Pengabdian Kepada Masyarakat (LPPM), IST AKPRIND Yogyakarta

DOI: 10.34151/technoscientia.v8i2.162

Abstract

The number of Indonesian documents available on the internet is growing very rapidly. Automatic document clustering has been shown to improve the relevance of search results when many documents are retrieved. Suffix tree clustering (STC) is one document clustering method that has been developed, because it has been shown to increase precision. In this paper, we propose a new method for clustering Indonesian web documents based on efficient phrase selection when choosing base clusters, combining document frequency and term frequency calculations on the phrases with a single pass clustering (SPC) algorithm. Every phrase considered as a base cluster is converted to a vector, and its term frequency and document frequency are calculated. The similarity between documents is then computed from the tf-idf weights using cosine similarity, and the documents are clustered with the single pass clustering algorithm. The proposed method is tested on 6 datasets containing 10, 20, 30, 40, 50, and 60 documents. The experimental results show that the proposed method succeeds in clustering Indonesian web documents by reducing the number of leaf nodes that have no derivative, and produces an average F-measure of 0.78, while traditional STC produces an average F-measure of 0.55. These results show that efficient phrase selection, which chooses phrases on internal nodes and on leaf nodes that have a derivative, combined with term frequency and document frequency calculations on the base clusters, has a significant impact on the document clustering process.
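
The clustering step described in the abstract (tf-idf weighting, cosine similarity, single pass clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it works on plain word tokens rather than on phrases selected from a suffix tree, and the smoothed idf formula, the centroid update, the threshold value, the function names, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf vectors (dicts).

    Assumption: a smoothed idf, log((1 + n) / (1 + df)) + 1, is used so that
    no term gets a zero weight. The abstract does not specify the variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(vectors, threshold):
    """Assign each document, in input order, to the most similar cluster.

    A new cluster is opened when no existing cluster reaches the threshold.
    Each cluster is represented by the sum of its member vectors; cosine
    similarity ignores scale, so the sum behaves like a centroid here.
    """
    members = []    # list of lists of document indices
    centroids = []  # summed vector of each cluster
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            members[best].append(i)
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            members.append([i])
            centroids.append(dict(vec))
    return members


# Hypothetical toy documents, already tokenized.
docs = [
    "harga saham naik".split(),
    "harga saham turun".split(),
    "tim sepak bola menang".split(),
    "tim sepak bola kalah".split(),
]
clusters = single_pass_cluster(tfidf_vectors(docs), threshold=0.3)
print(clusters)  # [[0, 1], [2, 3]]
```

Because each document is visited once, the result depends on the input order and on the threshold; raising the threshold toward 1 leaves every document in its own cluster.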