In this paper, we report our work on text segmentation of Indonesian speech documents. Because the speech documents are transcribed with Automatic Speech Recognition (ASR), the resulting text contains no boundaries between documents, so it must be segmented according to topic. We apply the TextTiling method with several term-weighting techniques for measuring the similarity between segments: TF-IDF, TF-IDF-Mutual Information, TF-IDF-Mutual Information-Word Similarity, and TF-IDF-Word Frequency. The results show that TF-IDF-Mutual Information performs better than the other weighting techniques on most of the collections.
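The TextTiling step with plain TF-IDF weighting can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the ASR transcript has already been split into pseudo-sentences (lists of tokens). It computes IDF by treating each pseudo-sentence as one document and compares blocks of `k` pseudo-sentences on either side of each gap using cosine similarity. It then gives a depth score only to local minima of the similarity curve, which simplifies Hearst's depth scoring, and applies the mean-minus-half-standard-deviation cutoff. The function names, the block size `k`, and the omission of smoothing and of the Mutual Information, Word Similarity, and Word Frequency weightings are all assumptions of this sketch.

```python
import math
import statistics
from collections import Counter


def idf_weights(sequences):
    """IDF per term, treating each pseudo-sentence as one 'document' (an assumption)."""
    n = len(sequences)
    df = Counter()
    for seq in sequences:
        df.update(set(seq))
    return {term: math.log(n / count) for term, count in df.items()}


def block_vector(block, idf):
    """TF-IDF vector (term -> weight) for all tokens in a block of pseudo-sentences."""
    tf = Counter(token for seq in block for token in seq)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gap_scores(sequences, k=2):
    """Similarity of the k pseudo-sentences before and after each gap.

    scores[i] belongs to the gap between sequences[i] and sequences[i + 1].
    """
    idf = idf_weights(sequences)
    scores = []
    for gap in range(1, len(sequences)):
        left = block_vector(sequences[max(0, gap - k):gap], idf)
        right = block_vector(sequences[gap:gap + k], idf)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each valley relative to the peaks reached by climbing on each side.

    Simplification (assumption): only strict local minima get a nonzero depth.
    """
    depths = [0.0] * len(scores)
    for i in range(1, len(scores) - 1):
        s = scores[i]
        if not (scores[i - 1] > s < scores[i + 1]):
            continue  # not a valley, keep depth 0
        left = s
        j = i - 1
        while j >= 0 and scores[j] >= left:  # climb to the left peak
            left = scores[j]
            j -= 1
        right = s
        j = i + 1
        while j < len(scores) and scores[j] >= right:  # climb to the right peak
            right = scores[j]
            j += 1
        depths[i] = (left - s) + (right - s)
    return depths


def boundaries(sequences, k=2):
    """Indices of pseudo-sentences that start a new topic segment."""
    depths = depth_scores(gap_scores(sequences, k))
    if not depths:
        return []
    # Hearst's liberal cutoff: mean depth minus half a standard deviation.
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]
```

For example, with six toy pseudo-sentences where the first three are about farming (`padi`, `sawah`, `petani`, `panen`) and the last three about football (`bola`, `gol`, `pemain`, `liga`), `boundaries(seqs, k=2)` returns `[3]`. The similarity between the two middle blocks drops to zero because they share no terms, so a new segment starts at the fourth pseudo-sentence. The weighting variants compared in the paper would replace `block_vector` while leaving the gap and depth scoring unchanged.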
Copyright © 2022