Sita Nabila
Department of Computer Sciences, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University; Bioinformatics Working Group, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University


Evaluation of F-Measure and Feature Analysis of C5.0 Implementation on Single Nucleotide Polymorphism Calling
Lailan Sahrina Hasibuan; Sita Nabila; Nurul Hudachair; Muhammad Abrar Istiadi
Indonesian Journal of Artificial Intelligence and Data Mining Vol 1, No 1 (2018): March 2018
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v1i1.4616

Abstract

The volume of molecular biology data has grown rapidly since the introduction of Next-Generation Sequencing (NGS) in 2000, the latest technology for sequencing DNA at high throughput. A Single Nucleotide Polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are commonly exploited to optimize parent selection when producing high-quality seed in plant breeding. This paper discusses SNP calling on NGS data of cultivated soybean (Glycine max [L.] Merr.) using C5.0, an improved version of the rule-based algorithm C4.5. The evaluation shows that C5.0 outperforms CART, another rule-based algorithm, in terms of F-measure: 0.63 for C5.0 versus 0.58 for CART. In addition, C5.0 is robust to imbalanced training data up to a ratio of 1:17, but its performance suffers on large training datasets. C5.0's performance may be improved by applying bagging or another ensemble technique, in the same way that CART is improved by applying bagging to the final decision. Choosing appropriate features to represent SNP candidates is also important. Based on the information gain reported by C5.0, this paper recommends error probability, homopolymer left, mismatch alt, and mean nearby qual as features for SNP calling.
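
The two quantities the abstract relies on, F-measure and information gain, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the balanced F1 default (beta = 1), and the treatment of features as discrete values are all assumptions made for illustration, and C5.0's own split criterion may differ in detail.

```python
import math
from collections import Counter


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts (hypothetical helper, not from the paper).

    With beta=1 this is the harmonic mean of precision and recall.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by partitioning labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

For example, `f_measure(tp=6, fp=2, fn=4)` combines a precision of 0.75 and a recall of 0.6 into a single score; these counts are made up and are not data from the paper. A feature that separates SNP from non-SNP candidates perfectly has an information gain equal to the full label entropy, while an uninformative feature has a gain of zero.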