Journal : Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI)

Similar Questions Identification on Indonesian Language Subject Using Machine Learning
Hasmawati; Ade Romadhony
Jurnal Nasional Pendidikan Teknik Informatika : JANAPATI Vol. 12 No. 2 (2023)
Publisher : Prodi Pendidikan Teknik Informatika Universitas Pendidikan Ganesha

DOI: 10.23887/janapati.v12i2.62582

Abstract

Question similarity identification evaluates how similar the questions in a collection are to one another, whether in a question-and-answer forum or on another platform. It improves the performance of such a forum by allowing a newly submitted question to be matched against similar questions already in the database. Currently, research on question similarity is still carried out on foreign-language datasets. The purpose of this research is to identify similar questions in a collection of Indonesian-language questions. The methods used are Support Vector Machine (SVM) and IndoBERT. For feature extraction, we evaluate the lexical and syntactic features of each question. For lexical feature extraction, we use cosine similarity to measure how close two questions are when each is represented as a vector. For syntactic feature extraction, we use an Indonesian part-of-speech (POS) tagger. The dataset is a collection of questions from the Indonesian language subject at the primary and secondary school levels. The results show that the Support Vector Machine performs best with the cosine similarity feature, reaching an accuracy of 85%. Using the POS tag feature, alone or combined with cosine similarity, causes the model to overfit, and accuracy decreases to 77%. The IndoBERT model obtained an accuracy of 95%.
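
To make the lexical feature concrete, the following is a minimal sketch of cosine similarity between two questions. The abstract does not say how the questions are turned into vectors, so the whitespace tokenization and raw term counts used here are assumptions for illustration, and the function name `cosine_similarity` is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(question_a: str, question_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two questions.

    Assumption: each question is lowercased, split on whitespace, and
    represented by raw term counts. The paper may use another scheme.
    """
    vec_a = Counter(question_a.lower().split())
    vec_b = Counter(question_b.lower().split())

    # Dot product over the terms the two questions share.
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[term] * vec_b[term] for term in shared_terms)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))

    # An empty question has no direction; treat it as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two questions sharing two of their three words.
score = cosine_similarity("apa itu puisi", "apa itu pantun")
```

A score near 1 means the two questions use almost the same words in similar proportions, and a score of 0 means they share no words. In a setup like the one described, such a score could serve as the single input feature to a classifier that labels a pair as similar or not.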