Hidayatulah Himawan
Informatika, Universitas Pembangunan Nasional Veteran Yogyakarta

Published: 22 Documents
Articles

Found 1 Document
Journal : Register: Jurnal Ilmiah Teknologi Sistem Informasi

Effect of information gain on document classification using k-nearest neighbor
Rifki Indra Perwira; Bambang Yuwono; Risya Ines Putri Siswoyo; Febri Liantoni; Hidayatulah Himawan
Register: Jurnal Ilmiah Teknologi Sistem Informasi Vol. 8 No. 1 (2022): January
Publisher : Information Systems - Universitas Pesantren Tinggi Darul Ulum

DOI: 10.26594/register.v8i1.2397

Abstract

State universities maintain libraries to support students' education and research. Their collections include books, journals, and final assignments, most of which are the product of research. An intelligent document classification system is needed to help library visitors in higher education find these documents, as a service to students. The main reasons for such a system are complaints about imbalanced text data across categories, categories assigned from irrelevant document titles, and words with ambiguous meanings that hinder document search. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle the unbalanced data and cosine similarity to measure the similarity between test and training data. In tests conducted with 276 training documents, the best result with information gain feature selection was an accuracy of 87.5%, obtained with 80% training data, 20% test data, and k=5. The highest accuracy overall, 92.9%, was achieved without information gain feature selection, using 90% training data, 10% test data, and k=5, 7, and 9. The paper concludes that the system is more accurate without information gain feature selection than with it, because every word in a document title is considered to play an essential role in forming the classification.
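The pipeline described in the abstract (information gain feature selection, cosine similarity, and k-NN voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy titles, and the labels are hypothetical, and the sketch assumes information gain is computed on binary term presence and that documents are compared as raw term-count vectors, since the abstract does not specify the weighting scheme.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of `term`.

    Assumption: binary presence/absence; the paper may weight terms differently.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def select_terms(docs, labels, top_n):
    """Return the `top_n` terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return set(ranked[:top_n])


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(train_docs, train_labels, query, k=5, vocab=None):
    """Majority vote among the k training documents most similar to `query`.

    If `vocab` is given, only those terms are used (feature selection).
    """
    def vectorize(tokens):
        return Counter(t for t in tokens if vocab is None or t in vocab)

    q = vectorize(query)
    scored = sorted(
        ((cosine_similarity(vectorize(d), q), y)
         for d, y in zip(train_docs, train_labels)),
        key=lambda pair: pair[0], reverse=True)
    votes = Counter(y for _, y in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: tokenized titles and study-interest labels.
titles = [["neural", "network", "image"], ["deep", "learning", "image"],
          ["database", "query", "index"], ["sql", "database", "design"]]
interests = ["ai", "ai", "db", "db"]

selected = select_terms(titles, interests, top_n=2)
with_ig = knn_predict(titles, interests, ["image", "network"], k=1, vocab=selected)
without_ig = knn_predict(titles, interests, ["image", "network"], k=3)
```

Passing `vocab=None` corresponds to the configuration without feature selection, where every title word contributes to the similarity score; passing a selected vocabulary discards the remaining words before voting.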