Register: Jurnal Ilmiah Teknologi Sistem Informasi
Vol. 8 No. 1 (2022): January

Effect of information gain on document classification using k-nearest neighbor

Rifki Indra Perwira (Universitas Pembangunan Nasional “Veteran” Yogyakarta)
Bambang Yuwono (Universitas Pembangunan Nasional “Veteran” Yogyakarta)
Risya Ines Putri Siswoyo (Universitas Pembangunan Nasional “Veteran” Yogyakarta)
Febri Liantoni (Universitas Sebelas Maret)
Hidayatulah Himawan (Universiti Teknikal Malaysia Melaka)



Article Info

Publish Date
05 Jan 2022

Abstract

State universities maintain libraries to support students' education and research, holding books, journals, and final assignments. An intelligent document classification system is needed to make these collections easier for library visitors to use, as a form of service to students. The documents in the library are generally the result of research. Complaints about the imbalance of text data and categories based on irrelevant document titles, and about words with ambiguous meanings encountered when searching for documents, are the main reasons a classification system is needed. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training data, the best result with information gain feature selection was an accuracy of 87.5%, obtained with 80% training data, 20% test data, and k=5. The highest accuracy overall, 92.9%, was achieved without information gain feature selection, with 90% training data, 10% test data, and k=5, 7, and 9. This paper concludes that the system is more accurate without information gain feature selection than with it, because every word in the document title is considered to play an essential role in forming the classification.
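The pipeline named in the abstract (information gain feature selection, cosine similarity, and k-NN voting) can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the presence/absence form of information gain, the raw term-count vectors, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(title):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return title.lower().split()


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    # Information gain of a term's presence/absence with respect to the class label.
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def select_terms(docs, labels, top_n):
    # Keep the top_n terms ranked by information gain (ties broken alphabetically).
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return set(ranked[:top_n])


def vectorize(tokens, keep=None):
    # Term-count vector; keep=None means no feature selection is applied.
    return Counter(t for t in tokens if keep is None or t in keep)


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_vecs, train_labels, query_vec, k):
    # Majority vote among the k training vectors most similar to the query.
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(pair[0], query_vec),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

To compare the two settings reported in the abstract, one would call `vectorize` once with `keep=None` (every title word retained) and once with `keep=select_terms(...)`, then run `knn_predict` on both sets of vectors with the same `k`.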

Copyrights © 2022






Journal Info

Abbrev

register

Publisher

Subject

Computer Science & IT

Description

Register: Jurnal Ilmiah Teknologi Sistem Informasi is published by the Department of Information Systems, Unipdu Jombang. Register is published twice a year, in January and July. Register includes research in the field of Information Technology, Information Systems Engineering, Intelligent Business Systems, ...