Sinkron : Jurnal dan Penelitian Teknik Informatika
Vol. 7 No. 4 (2022): Article Research: Volume 7 Number 4, October 2022

The effect of Chi-Square Feature Selection on Question Classification using Multinomial Naïve Bayes

Novi Yusliani (Universitas Sriwijaya, Indonesia)
Syechky Al Qodrin Aruda (Universitas Sriwijaya, Indonesia)
Mastura Diana Marieska (Universitas Sriwijaya, Indonesia)
Danny Mathew Saputra (Universitas Sriwijaya, Indonesia)
Abdiansah Abdiansah (Universitas Sriwijaya, Indonesia)

Article Info

Publish Date
09 Oct 2022

Abstract

Question classification is one of the essential tasks of a question answering system. It determines the expected answer type (EAT) of a question given to the system. The Multinomial Naïve Bayes algorithm is one of the learning algorithms that can be used to classify questions. At the classification stage, this algorithm uses a set of features stored in the knowledge model. When the feature space is high-dimensional, the number of features can lead to the curse of dimensionality. Feature selection can reduce the feature dimension and may improve system performance. The Chi-Square algorithm can be used to select the features that best describe each category. In this research, Multinomial Naïve Bayes is used to classify question sentences and the Chi-Square algorithm is used for feature selection. The dataset is a set of Indonesian question sentences consisting of 519 questions labeled factoid, 491 labeled non-factoid, and 185 labeled other. The test results showed that feature selection increased accuracy by 0.1: with feature selection, the system achieved an accuracy of 0.87 using 248 features, whereas without feature selection it achieved 0.77 using 1374 features.
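As a rough illustration of the pipeline the abstract describes, the sketch below scores every (term, category) pair with the chi-square statistic computed from a 2×2 table of document counts, keeps the highest-scoring terms of each category, and trains a Multinomial Naïve Bayes classifier on that reduced vocabulary. It is not the authors' implementation: the whitespace tokenization, the per-category top-k selection rule (the parameter `k`), Laplace smoothing with `alpha=1.0`, the function names, and the toy Indonesian questions are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Chi-square score of every (term, category) pair, from 2x2 document counts."""
    n = len(docs)
    cat_size = Counter(labels)           # documents per category
    term_df = Counter()                  # documents containing each term
    term_cat_df = defaultdict(Counter)   # documents of a category containing each term
    for doc, cat in zip(docs, labels):
        for term in set(doc):
            term_df[term] += 1
            term_cat_df[cat][term] += 1
    scores = {}
    for term in term_df:
        for cat in cat_size:
            a = term_cat_df[cat][term]   # term present, in category
            b = term_df[term] - a        # term present, other categories
            c = cat_size[cat] - a        # term absent, in category
            d = n - a - b - c            # term absent, other categories
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            scores[(term, cat)] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(scores, k):
    """Assumed rule: keep the k best-scoring terms of each category, return the union."""
    per_cat = defaultdict(list)
    for (term, cat), score in scores.items():
        per_cat[cat].append((term, score))
    selected = set()
    for items in per_cat.values():
        items.sort(key=lambda ts: (-ts[1], ts[0]))  # ties broken alphabetically
        selected.update(term for term, _ in items[:k])
    return selected


def train_mnb(docs, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with additive (Laplace) smoothing, restricted to vocab."""
    n = len(docs)
    log_prior, log_lik = {}, {}
    for cat in sorted(set(labels)):
        cat_docs = [doc for doc, lab in zip(docs, labels) if lab == cat]
        log_prior[cat] = math.log(len(cat_docs) / n)
        counts = Counter(t for doc in cat_docs for t in doc if t in vocab)
        total = sum(counts.values())
        log_lik[cat] = {
            t: math.log((counts[t] + alpha) / (total + alpha * len(vocab))) for t in vocab
        }
    return log_prior, log_lik


def predict_mnb(model, doc):
    """Return the category with the highest log posterior; unselected terms are ignored."""
    log_prior, log_lik = model

    def log_posterior(cat):
        return log_prior[cat] + sum(log_lik[cat][t] for t in doc if t in log_lik[cat])

    return max(log_prior, key=log_posterior)


# Invented toy questions (NOT the paper's dataset), tokenized by whitespace.
questions = [
    ("siapa presiden pertama indonesia", "factoid"),
    ("siapa penemu telepon", "factoid"),
    ("kapan indonesia merdeka", "factoid"),
    ("mengapa langit biru", "non_factoid"),
    ("bagaimana cara membuat kopi", "non_factoid"),
    ("bagaimana cara belajar", "non_factoid"),
    ("halo apa kabar", "other"),
    ("terima kasih banyak", "other"),
]
docs = [q.split() for q, _ in questions]
labels = [lab for _, lab in questions]

scores = chi_square_scores(docs, labels)
selected = select_features(scores, k=2)
model = train_mnb(docs, labels, selected)
print(sorted(selected))  # → ['apa', 'bagaimana', 'banyak', 'cara', 'indonesia', 'siapa']
print(predict_mnb(model, "siapa penemu lampu".split()))  # → factoid
```

In this sketch, `k` is the knob that trades vocabulary size against noise, analogous to the reduction from 1374 to 248 features reported above; terms that are not selected simply contribute nothing to the posterior at prediction time.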

Copyright © 2022

Journal Info

Abbrev

sinkron

Subject

Computer Science & IT

Description

Scope of Sinkron's scientific discussion: 1. Machine Learning 2. Cryptography 3. Steganography 4. Digital Image Processing 5. Networking 6. Security 7. Algorithm and Programming 8. Computer Vision 9. Troubleshooting 10. Internet and E-Commerce 11. Artificial Intelligence 12. Data Mining 13. Artificial ...