Jurnal Mantik
Vol. 6 No. 1 (2022): May: Manajemen, Teknologi Informatika dan Komunikasi (Mantik)

Impact of Text Preprocessing on Named Entity Recognition Based on Conditional Random Field in Indonesian Text

Samuel Situmeang (Institut Teknologi Del)



Article Info

Publish Date
16 May 2022

Abstract

The text preprocessing stage of a natural language processing application removes noise and other parts of the text that do not help the analysis. Despite its potential impact on the application's final performance, text preprocessing has received little attention in the text analysis literature, especially for named entity recognition (NER) in Indonesian text. This paper comprehensively examines the impact of text preprocessing on Indonesian NER using a baseline model, a Conditional Random Field (CRF), to find the preprocessing procedures that give the best NER performance. The contribution of various forms of text preprocessing to successful recognition is assessed comparatively across three entity categories: people, places, and organizations. Experimental analysis of the data set shows that, rather than enabling or disabling every preprocessing step, selected combinations can significantly improve the accuracy of Indonesian NER, and which combinations are useful depends on the entity category.
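The idea of evaluating combinations of preprocessing steps, rather than switching all of them on or off, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the preprocessing steps studied, so `lowercase`, `strip_punctuation`, and `remove_stopwords` are hypothetical stand-ins, the BIO tags and the sample sentence are invented, and no CRF is trained here. Each resulting variant would be the input to a separate train-and-evaluate run.

```python
from itertools import combinations

# Hypothetical preprocessing steps; the abstract does not list the ones studied.
# Each step maps a list of (word, tag) pairs to a new list of (word, tag) pairs.

def lowercase(tokens):
    return [(word.lower(), tag) for word, tag in tokens]

def strip_punctuation(tokens):
    # Drop tokens that contain no letter or digit at all.
    return [(word, tag) for word, tag in tokens
            if any(ch.isalnum() for ch in word)]

STOPWORDS = {"di", "dan", "yang"}  # tiny illustrative list, not a real resource

def remove_stopwords(tokens):
    # Only drop stopwords tagged "O" so labelled entity spans stay intact.
    return [(word, tag) for word, tag in tokens
            if not (tag == "O" and word.lower() in STOPWORDS)]

STEPS = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "remove_stopwords": remove_stopwords,
}

def step_combinations(names):
    """Yield every subset of step names, from none enabled to all enabled."""
    names = list(names)
    for size in range(len(names) + 1):
        yield from combinations(names, size)

def apply_steps(tokens, enabled):
    """Apply the enabled steps in the order given."""
    for name in enabled:
        tokens = STEPS[name](tokens)
    return tokens

# Invented example sentence with assumed BIO tags.
sentence = [("Joko", "B-PER"), ("Widodo", "I-PER"), ("tiba", "O"),
            ("di", "O"), ("Medan", "B-LOC"), (".", "O")]

for combo in step_combinations(STEPS):
    processed = apply_steps(sentence, combo)
    print(combo or ("none",), [word for word, _ in processed])
```

With three steps this yields eight configurations. In an experiment of the kind the abstract describes, each configuration would be scored separately per entity category, since the abstract reports that the useful combinations differ by category.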

Copyrights © 2022






Journal Info

Abbrev

mantik

Publisher

Subject

Computer Science & IT; Economics, Econometrics & Finance; Language, Linguistic, Communication & Media

Description

Jurnal Mantik (Manajemen, Teknologi Informatika dan Komunikasi) is a scientific journal in information systems/information technology containing the scientific literature on studies of pure and applied research in information systems/information technology, computer science, management science, and public ...