Xplore: Journal of Statistics
Vol. 12 No. 1 (2023)

Performance Comparison of the Logistic Model Tree and Random Forest Methods in Data Classification

Purnama Sari (Department of Statistics, IPB University, Indonesia)
Kusman Sadik (Department of Statistics, IPB University, Indonesia)
Mulianto Raharjo (Kementerian Dalam Negeri Republik Indonesia, Indonesia)



Article Info

Publish Date
15 Jan 2023

Abstract

Multicollinearity and missing data are two common problems in big data. Missing data can reduce prediction accuracy. The logistic model tree (LMT) is used to handle multicollinearity because decision trees are not affected by it. Random forest can be used to reduce the variance of predictions. This study compares the two methods, LMT and random forest, under various multicollinearity and missing-data scenarios, using both a simulation study and real data. Models are evaluated by classification accuracy and area under the ROC curve (AUC). The results show that random forest performs better when the level of multicollinearity is moderate. LMT with missing observations omitted performs better for large data sets when the percentage of missing data is high and the multicollinearity is severe. Real data sets of different sample sizes were then analysed, and the results show that random forest performs better on them. Omitting missing data gives better performance when classifying the “breast cancer” data, which contain 0.3% missing values.
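The two evaluation measures named in the abstract, and the "omitted missing data" strategy, can be illustrated with a minimal sketch. This is not the authors' code: the function names `accuracy`, `auc`, and `drop_incomplete` are hypothetical, missing values are assumed to be encoded as `None`, and the toy labels and scores are made up purely for illustration. AUC is computed here by its pairwise definition (the share of positive-negative pairs in which the positive case receives the higher score), which is simple to follow but slow for large samples.

```python
from typing import List, Optional, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of observations whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Tied scores count as one half. Labels are assumed to be 0 or 1.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def drop_incomplete(
    rows: Sequence[Tuple[Optional[float], ...]]
) -> List[Tuple[Optional[float], ...]]:
    """Listwise deletion: keep only rows with no missing (None) value."""
    return [r for r in rows if all(v is not None for v in r)]


# Toy example (made-up numbers, not results from the study).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(accuracy(y_true, y_pred))
print(auc(y_true, scores))
print(drop_incomplete([(1.0, 2.0, 1), (None, 3.0, 0), (0.5, None, 1)]))
```

In a comparison like the one described, each classifier would supply `y_pred` and `scores` for the same held-out observations, so the two measures can be set side by side for every scenario.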

Copyrights © 2023






Journal Info

Abbrev

xplore

Subject

Decision Sciences, Operations Research & Management Engineering Mathematics

Description

Xplore: Journal of Statistics is published periodically three times a year and contains scholarly articles related to the field of statistics. Published articles are research results or literature reviews in statistics and/or its applications. ISSN: 2302-5751 Starting December 2018, ...