Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika
Vol. 6 No. 2 October 2020

Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions

Waspada, Indra
Bahtiar, Nurdin
Wirawan, Panji Wisnu
Awan, Bagus Dwi Ari



Article Info

Publish Date
27 Oct 2020

Abstract

Losses from fraud in e-commerce transactions, especially those based on credit cards, continue to grow each year. One mechanism to minimize the risk of fraudulent credit card transactions is to apply a detection technique to ongoing transactions. Credit card transaction data in its original state is unlabeled, fraud makes up only a very small share of the training data so the classes are highly imbalanced, and fraud patterns keep changing. Isolation Forest is an unsupervised algorithm that detects anomalies efficiently. Previous studies analyzed the performance of Isolation Forest with the ROC-AUC metric alone, which can give a misleading picture. This study makes two contributions. The first is a performance analysis using both ROC-AUC and AUCPR, which shows that a high ROC-AUC value does not guarantee that the model detects fraud reliably; AUCPR describes more appropriately the model's ability to capture fraudulent data. The second is a set of techniques for improving the performance of the Isolation Forest model: optimizing the amount of training data, the feature selection, the amount of fraud contamination, and the hyper-parameter settings in the modeling (training) stage. Experiments were carried out on a real-life dataset from ULB. The best results are obtained with a validation data split ratio of 60:40, the five most important features, only 60% of the fraud data, and hyper-parameters of 100 trees, a maximum of 128 samples, and a contamination of 0.001. The validation performance of this model is precision 0.809917, recall 0.710145, F1-score 0.756757, ROC-AUC 0.969728, and AUCPR 0.637993, while the testing results are precision 0.807143, recall 0.763514, F1-score 0.784722, ROC-AUC 0.97371, and AUCPR 0.759228.
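
The gap between ROC-AUC and AUCPR on highly imbalanced data can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the anomaly scores are synthetic (not from the ULB dataset or the study's model), the function names `roc_auc` and `average_precision` are hypothetical helpers, and average precision is used as one common estimator of AUCPR, since the abstract does not state how AUCPR was computed.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random fraud outscores a random
    legitimate transaction (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision measured at the rank of each
    fraud case. Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic, illustrative data: 1000 legitimate transactions and 5 frauds.
# Higher score = more anomalous. Every fraud scores above about 97.5% of the
# legitimate transactions, yet 20 legitimate ones score higher than any fraud.
legit_scores: List[float] = [i / 1000 for i in range(1000)]
fraud_scores: List[float] = [0.9795, 0.9785, 0.9775, 0.9765, 0.9755]

labels = [0] * len(legit_scores) + [1] * len(fraud_scores)
scores = legit_scores + fraud_scores

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC: {roc:.3f}  average precision (AUCPR estimate): {ap:.3f}")
```

In this toy setting the ranking looks excellent by ROC-AUC, because legitimate transactions vastly outnumber frauds, while average precision stays low: most of the top-ranked alerts are false alarms. That is the kind of discrepancy the abstract points to when it recommends reporting AUCPR alongside ROC-AUC.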

Copyrights © 2020






Journal Info

Abbrev

khif

Publisher

Subject

Computer Science & IT

Description

Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika, an Indonesian national journal, publishes high-quality research papers in the broad field of Informatics and Computer Science, which encompasses software engineering, information system development, computer systems, computer networks, ...