Knowledge Engineering and Data Science
Vol 4, No 1 (2021)

Indonesian Sentence Boundary Detection using Deep Learning Approaches

Joan Santoso (Department of Information Technology, Institut Sains dan Teknologi Terpadu Surabaya, Surabaya, Indonesia)
Esther Irawati Setiawan (Department of Information Technology, Institut Sains dan Teknologi Terpadu Surabaya, Surabaya, Indonesia)
Christian Nathaniel Purwanto (Electrical Engineering and Computer Science, National Yang Ming Chiao Tung University, Taiwan)
Fachrul Kurniawan (Department of Informatics Engineering, Maulana Malik Ibrahim State Islamic University, Malang, Indonesia)



Article Info

Publish Date
30 Jun 2021

Abstract

Sentence boundary detection is a crucial pre-processing step in natural language processing. It determines where each sentence ends, which can be ambiguous: because text contains multiple separators and varied sentence patterns, a full stop does not always mark the end of a sentence. This research uses a deep learning approach to split an Indonesian news document into sentences, so there is no need to define handcrafted features or rules. As in Part-of-Speech Tagging and Named Entity Recognition, we treat sentence boundary detection as a sequence labeling task. Two labels are used: O for a non-boundary token and E for the last token of a sentence. For this task we use Bi-LSTM, which has been widely used in sequence labeling. As in previous studies, we use pre-trained Indonesian embeddings, and our experiments show that the approach works for Indonesian text. This study achieved an F1-score of 98.49 percent, a significant improvement over previous studies.
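The O/E labeling scheme described in the abstract can be illustrated with a minimal Python sketch. It covers only the label encoding and decoding around the model, not the Bi-LSTM itself. The function names (encode, decode, boundary_f1), the sample Indonesian tokens, and the choice to compute F1 over the E label at the token level are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def encode(sentences: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten tokenized sentences into one token stream with O/E labels.

    Every token gets "O" except the last token of each sentence, which gets "E".
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sent in sentences:
        if not sent:  # skip empty sentences
            continue
        tokens.extend(sent)
        labels.extend(["O"] * (len(sent) - 1) + ["E"])
    return tokens, labels


def decode(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Split a token stream back into sentences at every "E" label."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    sentences: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == "E":
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no closing "E"
        sentences.append(current)
    return sentences


def boundary_f1(gold: List[str], pred: List[str]) -> float:
    """Token-level F1 on the "E" label (an assumed evaluation protocol)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == "E" and p == "E" for g, p in pairs)
    fp = sum(g != "E" and p == "E" for g, p in pairs)
    fn = sum(g == "E" and p != "E" for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: "Dr." ends in a period but is not a boundary.
    gold_sentences = [
        ["Dr.", "Budi", "tiba", "di", "Jakarta", "."],
        ["Ia", "disambut", "warga", "."],
    ]
    tokens, labels = encode(gold_sentences)
    print(list(zip(tokens, labels)))
    print(decode(tokens, labels) == gold_sentences)
```

In this framing, a sequence labeling model such as a Bi-LSTM would predict the label list from the token list, and decode would turn those predictions back into sentences. The "Dr." token shows why a period alone is not a reliable boundary signal: it ends in a full stop but carries the O label.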

Copyrights © 2021






Journal Info

Abbrev

keds

Publisher

Subject

Computer Science & IT Engineering

Description

Knowledge Engineering and Data Science (2597-4637), KEDS, brings together researchers, industry practitioners, and potential users, to promote collaborations, exchange ideas and practices, discuss new opportunities, and investigate analytics frameworks on data-driven and knowledge base systems. ...