Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content, or to increase its findability. In automatic subject indexing, the selection of candidate terms is very important because it influences the result of topic extraction from the document. Current approaches to automatic subject indexing, particularly in candidate term selection, consider only terms that appear in the document collection. In contrast, in manual subject indexing the indexer prefers to choose general terms as candidates. In this paper, we propose a new strategy for selecting candidate terms in automatic subject indexing in order to extract the main topic of a document. The proposed method combines Term Frequency Inverse Document Frequency (TF*IDF) with a random walk on the structure of a thesaurus. Experimental results show that the proposed method can select candidate terms that are relevant to the topic of the document, with an F-measure of 0.24.
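As an illustration of how the two ingredients might fit together, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that TF*IDF scores of document terms are used as seed weights for a random walk with restart over a thesaurus graph, so that more general thesaurus terms that do not occur in the document can still receive a score. The toy thesaurus, the corpus, the smoothed IDF variant, the restart probability, and the function names `tf_idf` and `random_walk_with_restart` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Score each term of one document by relative term frequency times a smoothed IDF."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        df = sum(1 for other in corpus if term in other)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothing choice is an assumption
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def random_walk_with_restart(graph, seeds, restart=0.15, iterations=50):
    """Propagate seed weights over a thesaurus graph given as {term: [related terms]}."""
    nodes = list(graph)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total == 0:
        raise ValueError("no seed term occurs in the thesaurus")
    # Restart distribution: normalised seed weights of the terms found in the thesaurus.
    restart_vec = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart_vec)
    for _ in range(iterations):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # A term without relations sends its mass back to the seeds.
                for m in nodes:
                    nxt[m] += (1 - restart) * rank[n] * restart_vec[m]
                continue
            share = (1 - restart) * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


# Hypothetical toy thesaurus (symmetric relations) and corpus, for illustration only.
thesaurus = {
    "information retrieval": ["indexing", "controlled vocabulary"],
    "indexing": ["information retrieval"],
    "controlled vocabulary": ["information retrieval", "thesaurus"],
    "thesaurus": ["controlled vocabulary"],
}
corpus = [
    ["indexing", "thesaurus", "indexing", "term"],
    ["thesaurus", "term", "graph"],
    ["graph", "walk", "term"],
]

seeds = tf_idf(corpus[0], corpus)
ranked = random_walk_with_restart(thesaurus, seeds)
candidates = sorted(ranked, key=ranked.get, reverse=True)
print(candidates)
```

In this sketch the general term "information retrieval" never occurs in the first document, yet it obtains a nonzero score because the walk reaches it from the seeded terms. This mirrors the preference for general terms described above, under the stated assumptions.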
Copyrights © 2021