Indonesian Journal of Electrical Engineering and Computer Science
Vol 21, No 2: February 2021

Knowledge discovery from gene expression dataset using bagging lasso decision tree

Umu Sa'adah (Universitas Brawijaya)
Masithoh Yessi Rochayani (Universitas Brawijaya)
Ani Budi Astuti (Universitas Brawijaya)



Article Info

Publish Date
01 Feb 2021

Abstract

Classifying high-dimensional data is a challenging task in data mining. Gene expression data are a type of high-dimensional data with thousands of features. This study proposes a method for extracting knowledge from high-dimensional gene expression data through feature selection followed by classification. Lasso was used to select features, and the classification and regression tree (CART) algorithm was used to construct the decision tree model. To examine the stability of the lasso decision tree, we performed bootstrap aggregating (bagging) with 50 replications. The gene expression data used was an ovarian tumor dataset with 1,545 observations, 10,935 gene features, and a binary class label. The results showed that the lasso decision tree produced an interpretable model that is theoretically correct and had an accuracy of 89.32%. The model obtained from the majority vote over the bagged trees gave an accuracy of 90.29%, an increase of about 1 percentage point over the single lasso decision tree model. This small increase in accuracy indicates that the lasso decision tree classifier is stable.
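The bagging step described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. To keep it self-contained, the lasso feature selection and CART tree are replaced by a hypothetical stand-in base learner (a one-split decision stump), and the data are a made-up toy set. All function names here (fit_stump, bagging_fit, majority_vote, accuracy) are assumptions introduced for illustration only. The sketch shows how 50 bootstrap replications are drawn, how one model is fitted per replication, and how the final class is decided by majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Stand-in base learner (NOT lasso + CART): best single-feature threshold split."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # Try both labelings of the two sides of the split.
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[j] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def bagging_fit(X, y, n_replications=50, seed=0):
    """Fit one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_replications):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def majority_vote(models, row):
    """Return the class predicted by the largest number of models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, X, y):
    """Fraction of rows whose predicted class equals the true class."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is uninformative.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]

single_model = fit_stump(X, y)
models = bagging_fit(X, y, n_replications=50, seed=0)

single_acc = accuracy(single_model, X, y)
bagged_acc = accuracy(lambda row: majority_vote(models, row), X, y)
print(f"single model accuracy: {single_acc:.4f}")
print(f"majority vote accuracy: {bagged_acc:.4f}")
```

Comparing the single-model accuracy with the majority-vote accuracy mirrors the stability check in the abstract: if the two figures are close, the base classifier changes little under resampling. In a real analysis the accuracies would be computed on held-out data rather than on the training rows used here.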

Copyright © 2021