Biogenesis: Jurnal Ilmiah Biologi
Vol 8 No 1 (2020)

Performance Comparison of Data Sampling Techniques to Handle Imbalanced Class on Prediction of Compound-Protein Interaction

Akhmad Rezki Purnajaya (Universal University)
Wisnu Ananta Kusuma (IPB University)
Medria Kusuma Dewi Hardhienata (IPB University)



Article Info

Publish Date
30 Jun 2020

Abstract

The prediction of Compound-Protein Interactions (CPI) is an essential step in drug-target analysis, both for developing new drugs and for drug repositioning. One challenge in this field is that non-interacting compound-protein pairs commonly outnumber interacting pairs. This imbalance causes bias, which may degrade CPI prediction. In addition, few CPI prediction studies have compared data sampling techniques for handling the class imbalance problem. To address this gap, we compare four data sampling techniques: Random Under-sampling (RUS), Combination of Over-Under-sampling (COUS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). Two benchmark CPI datasets, Nuclear Receptor and G-Protein Coupled Receptor (GPCR), are used to test these techniques. Area Under the Curve (AUC) is used to evaluate the CPI prediction performance of each technique. The AUC values for RUS, COUS, SMOTE, and T-Link are 0.75, 0.77, 0.85, and 0.79, respectively, on the Nuclear Receptor data and 0.70, 0.85, 0.91, and 0.72, respectively, on the GPCR data. SMOTE therefore achieves the highest AUC on both datasets, indicating that it handles class imbalance in CPI prediction better than the other three techniques.
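To make the sampling idea concrete, the following is a minimal sketch of two of the four techniques named in the abstract (RUS and a SMOTE-style over-sampler) together with a rank-based AUC. It does not reproduce the authors' pipeline. The function names, the toy feature vectors, the choice of label 1 for the minority (interacting) class, and the neighbour count `k` are all assumptions made for illustration. COUS and T-Link are omitted.

```python
import random


def random_under_sample(X, y, seed=0):
    """RUS: keep every minority row (label 1, by assumption) and an
    equally sized random subset of majority rows (label 0)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, k=2, seed=0):
    """SMOTE-style over-sampling: add synthetic minority points, each
    interpolated between a minority point and one of its k nearest
    minority neighbours, until both classes have the same size.
    Requires at least two minority points."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_needed = sum(1 for label in y if label == 0) - len(minority)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(1)
    return X_new, y_new


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive receives the higher score; ties count as one half."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six non-interacting pairs, two interacting pairs.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.4, 0.1], [0.2, 0.4], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_rus, y_rus = random_under_sample(X, y)
X_sm, y_sm = smote(X, y)
print("RUS class counts:", y_rus.count(0), y_rus.count(1))
print("SMOTE class counts:", y_sm.count(0), y_sm.count(1))
```

RUS balances the classes by discarding majority examples, whereas the SMOTE-style sampler keeps all original pairs and adds interpolated minority examples. That difference in retained information is one plausible reason the two could yield different AUC values. In practice a maintained library implementation would normally be preferred over this sketch.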

Copyright © 2020






Journal Info

Abbrev

biogenesis

Subject

Agriculture, Biological Sciences & Forestry; Biochemistry, Genetics & Molecular Biology; Immunology & Microbiology

Description

Biogenesis: Jurnal Ilmiah Biologi is a peer-reviewed, open-access journal that publishes original scientific work on the advancement of tropical bioscience in Asia. The integration of Islam and tropical bioscience explicitly represents Biogenesis: Jurnal Ilmiah Biologi as an academic ...