Jurnal Ilmu Komputer dan Informasi
Vol 9, No 2 (2016): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)

FEATURE SELECTION METHODS BASED ON MUTUAL INFORMATION FOR CLASSIFYING HETEROGENEOUS FEATURES

Ratri Enggar Pawening (Department of Informatics, STT Nurul Jadid Paiton, Jl. Pondok Pesantren Nurul Jadid Paiton)
Tio Darmawan (Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember)
Rizqa Raaiqa Bintana (Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Surabaya, 60111, Indonesia; Department of Informatics, Faculty of Science and Technology, UIN Sultan Syarif Kasim Riau, Jl. H.R Soebrantas, Pekanb)
Agus Zainal Arifin (Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Surabaya, 60111, Indonesia)
Darlis Herumurti (Department of Informatics, Faculty of Information Technology, Institut Teknologi Sepuluh Nopember (ITS), Surabaya, 60111, Indonesia)



Article Info

Publish Date
25 Jun 2016

Abstract

Datasets with heterogeneous features can yield inappropriate feature selection results because heterogeneous features are difficult to evaluate concurrently. Feature transformation (FT) is one way to handle heterogeneous feature subset selection, but transforming non-numerical features into numerical ones may introduce redundancy with the original numerical features. In this paper, we propose a method for selecting a feature subset based on mutual information (MI) for classifying heterogeneous features. We combine the unsupervised feature transformation (UFT) method with the joint mutual information maximisation (JMIM) method. UFT transforms non-numerical features into numerical features, while JMIM selects the feature subset with consideration of the class label. The transformed and original features are combined, a feature subset is selected from them using JMIM, and the result is classified using the support vector machine (SVM) algorithm. Classification accuracy is measured for each number of selected features and compared between the UFT-JMIM method and the Dummy-JMIM method. Across all experiments in this study, the average classification accuracy achieved by UFT-JMIM is about 84.47%, versus about 84.24% for Dummy-JMIM. This result shows that UFT-JMIM can minimize the information loss between transformed and original features and select a feature subset that avoids redundant and irrelevant features.
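To illustrate the core idea of mutual-information-based feature ranking, the sketch below computes empirical MI between each feature and the class label on a toy heterogeneous dataset, then ranks the features. This is only a minimal illustration, not the authors' UFT-JMIM pipeline: categorical values are used directly as discrete symbols (standing in for UFT's numerical transformation), and features are scored individually rather than by JMIM's joint criterion; the dataset, feature names, and discretization are invented for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) between two discrete sequences, in nats."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), with counts converted to probabilities
        mi += (c / n) * math.log((c * n) / (px[x] * py[y]))
    return mi

# Toy heterogeneous dataset (hypothetical): one categorical feature,
# one perfectly informative discrete feature, and one noise feature.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "colour": ["r", "r", "r", "g", "g", "g", "g", "g"],  # partly informative
    "signal": [0, 0, 0, 0, 1, 1, 1, 1],                  # perfectly informative
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],                  # independent of the label
}

scores = {name: mutual_information(vals, labels) for name, vals in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # 'signal' ranks first, 'noise' last
```

A JMIM-style criterion would instead score each candidate feature jointly with the already-selected subset, which is what lets it reject a transformed feature that is redundant with an original numerical one; the per-feature ranking above cannot detect such redundancy.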

Copyrights © 2016






Journal Info

Abbrev

JIKI

Publisher

Subject

Computer Science & IT

Description

Jurnal Ilmu Komputer dan Informasi is a scientific journal in computer science and information, containing scientific literature on pure and applied research in computer science and information, as well as public reviews of the development of theory, methods, and applied sciences related to the ...