The text preprocessing stage of a natural language processing application removes noise and other parts of the text that do not help the analysis. Despite its potential impact on the final performance of the application, text preprocessing has received little attention in the text analysis literature, especially for named entity recognition (NER) in Indonesian texts. This paper comprehensively examines the impact of text preprocessing on Indonesian NER using a baseline model, namely a Conditional Random Field, to identify the preprocessing procedures best suited to achieving strong NER performance. The contribution of various forms of text preprocessing to the successful recognition of named entities is assessed comparatively across three entity categories: people, places, and organizations. Experimental analysis of the data set reveals that, rather than enabling or disabling all preprocessing steps at once, several specific combinations can significantly improve the accuracy of Indonesian named entity recognition, depending on the entity category.
Copyrights © 2022