ComTech: Computer, Mathematics and Engineering Applications
Vol. 11 No. 2 (2020): ComTech

Finding Biomarkers from a High-Dimensional Imbalanced Dataset Using the Hybrid Method of Random Undersampling and Lasso

Masithoh Yessi Rochayani (Universitas Brawijaya)
Umu Sa'adah (Universitas Brawijaya)
Ani Budi Astuti (Universitas Brawijaya)



Article Info

Publish Date
16 Dec 2020

Abstract

This research applied undersampling and gene selection as a starting point for cancer classification on gene expression datasets that are high-dimensional and class-imbalanced. It investigated whether applying undersampling before gene selection gives better results than gene selection alone. The undersampling method was Random Undersampling (RUS), and the gene selection method was Lasso. The selected genes were then validated against theory. To explore the effectiveness of applying RUS before gene selection, the researchers used two gene expression datasets. Both datasets consisted of two classes, 1,545 observations, and 10,935 genes, but they had different imbalance ratios. The results show that both proposed gene selection methods, Lasso and RUS + Lasso, can identify several important biomarkers, and the resulting models have high accuracy. However, the models are complicated because they involve too many genes. The research also finds that undersampling has little effect when it is applied to a less imbalanced dataset, whereas on a highly imbalanced dataset it can remove a lot of information from the majority class. Overall, the effectiveness of undersampling remains unclear. Simulation studies in future research could investigate when undersampling should be applied.
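
The RUS + Lasso pipeline named in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the function names, the penalty weight `lam`, the step size, the toy data, and the choice of a proximal-gradient solver for L1-penalized logistic regression are all assumptions made for illustration.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = sorted((idx0, idx1), key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, step=0.1, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (assumed solver)."""
    n, p = len(X), len(X[0])
    beta, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad, grad0 = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = intercept + sum(b * x for b, x in zip(beta, row))
            z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
            resid = 1.0 / (1.0 + math.exp(-z)) - label
            grad0 += resid / n
            for j in range(p):
                grad[j] += resid * row[j] / n
        intercept -= step * grad0  # the intercept is not penalized
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical toy data: 20 minority vs 100 majority samples, 20 "genes".
rng = random.Random(42)
n_genes = 20
X, y = [], []
for i in range(120):
    label = 1 if i < 20 else 0
    row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    row[0] += 2.0 * label  # only gene 0 carries class signal
    X.append(row)
    y.append(label)

X_bal, y_bal = random_undersample(X, y, seed=1)    # RUS step
_, beta = lasso_logistic(X_bal, y_bal, lam=0.1)    # Lasso step
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("class counts after RUS:", y_bal.count(0), y_bal.count(1))
print("selected gene indices:", selected)
```

Genes whose coefficients are shrunk exactly to zero are dropped, so `selected` plays the role of the candidate biomarker list. Running `lasso_logistic` on `X, y` directly, without the RUS step, gives the Lasso-only variant to compare against.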

Copyrights © 2020






Journal Info

Abbrev

comtech

Subject

Computer Science & IT, Engineering, Mathematics

Description

The journal invites professionals in the world of education, research, and entrepreneurship to participate in disseminating ideas, concepts, new theories, or science development in the field of Information Systems, Architecture, Civil Engineering, Computer Engineering, Industrial Engineering, Food ...