ComEngApp : Computer Engineering and Applications Journal
Vol 9 No 2 (2020)

Author Matching Classification with Anomaly Detection Approach for Bibliometric Repository Data

Zaqqi Yamani (Universitas Sriwijaya)
Siti Nurmaini (Unknown)
Dian Palupi Rini (Unknown)



Article Info

Publish Date
01 Jun 2020

Abstract

Author name disambiguation (AND) is a complex problem in identifying authors in a digital library (DL). AND classification depends heavily on how the data are grouped and pre-processed before they reach the classifier algorithm. The pre-processing generally used for author matching combines pairwise comparison with similarity measures. In this study, which works with a fairly large data set, the pairwise technique forms combinations over each attribute in the AND dataset and assigns every author-matching combination a binary class: 0 when the two authors differ and 1 when they are the same author. This produces highly imbalanced data, with class 0 making up 98.9% of the data and class 1 only 1.1%. The imbalance suggests that class 1 can be treated and processed as an anomaly within the data as a whole. The study therefore adopts anomaly detection, using the Isolation Forest algorithm as the classifier. The method achieves an accuracy of up to 99.5%.
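
The pairwise labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the field layout, and the helper names `build_pairs` and `class_share` are hypothetical, and only the labeling convention (0 for different authors, 1 for the same author) comes from the abstract. The sketch shows why pairing every record with every other record tends to produce far more class 0 rows than class 1 rows.

```python
from itertools import combinations

# Hypothetical toy records: (record_id, true_author_id, name_as_printed).
# These values are illustrative only and are not taken from the paper's dataset.
records = [
    ("r1", "A", "J. Smith"),
    ("r2", "A", "John Smith"),
    ("r3", "B", "A. Lee"),
    ("r4", "C", "M. Garcia"),
    ("r5", "D", "J. Smyth"),
]


def build_pairs(records):
    """Return one (id_left, id_right, label) row per unordered pair of records."""
    rows = []
    for left, right in combinations(records, 2):
        # Label 1 when both records belong to the same author, otherwise 0.
        label = 1 if left[1] == right[1] else 0
        rows.append((left[0], right[0], label))
    return rows


def class_share(rows):
    """Return the fraction of rows that fall into class 0 and class 1."""
    total = len(rows)
    positives = sum(label for _, _, label in rows)
    return {0: (total - positives) / total, 1: positives / total}


pairs = build_pairs(records)
shares = class_share(pairs)
print(len(pairs), shares)
```

With n records the number of pairs grows as n(n-1)/2, while the number of same-author pairs grows only with the size of each author's own publication list, so class 1 becomes a small minority as the dataset grows. That minority is what the study treats as the anomaly class; the Isolation Forest step itself is not shown in this sketch.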

Copyright © 2020

Journal Info

Abbrev

comengapp

Subject

Computer Science & IT Engineering

Description

ComEngApp-Journal (Collaboration between University of Sriwijaya, Kirklareli University and IAES) is an international forum for scientists and engineers involved in all aspects of computer engineering and technology to publish high quality and refereed papers. This Journal is an open access journal ...