Improving DNA Barcode-based Fish Identification System on Imbalanced Data using SMOTE
Wisnu Ananta Kusuma; Nurdevi Noviana; Lailan Sahrina Hasibuan; Mala Nurilmala
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 15, No 3: September 2017
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v15i3.5011

Abstract

Imbalanced data is a common problem in classification and identification. It arises when the number of instances of one class far exceeds that of the others. In our previous research, a DNA barcode-based identification system for tuna and mackerel was developed on an imbalanced dataset: tuna and mackerel samples far outnumbered those of other fish. The accuracy of the classification model was therefore probably biased. This research employed the Synthetic Minority Oversampling Technique (SMOTE) to produce a balanced dataset. We used k-mer frequencies from DNA barcode sequences as features and a Support Vector Machine (SVM) as the classification method, with trinucleotides (3-mers) and tetranucleotides (4-mers). The training dataset was taken from the Barcode of Life Database (BOLD). To evaluate the model, we compared accuracy with and without SMOTE when classifying DNA barcode sequences obtained from the Department of Aquatic Product Technology, Bogor Agricultural University. At the species level, accuracy with SMOTE was 7% and 13% higher than without SMOTE for trinucleotides (3-mers) and tetranucleotides (4-mers), respectively. SMOTE, as a data-balancing technique, is expected to increase the accuracy of DNA barcode-based fish classification systems, particularly at the species level, where identification is difficult.
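The feature extraction and oversampling steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_frequencies` and `smote_interpolate`, the choice to skip windows containing ambiguous bases, and the example sequence are all assumptions made for illustration. The second function shows only the interpolation step at the core of SMOTE, not the nearest-neighbor search.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def smote_interpolate(sample, neighbor, gap):
    """Core SMOTE step: a synthetic point on the segment between a minority
    sample and one of its nearest minority-class neighbors (0 <= gap <= 1)."""
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


# Hypothetical barcode fragment; 3-mers give a 64-dimensional feature vector.
features = kmer_frequencies("ACGTACGTAC", 3)
synthetic = smote_interpolate([0.0, 1.0], [1.0, 0.0], 0.25)
```

With 4-mers the same function yields 256 features; the resulting vectors would then be passed to a classifier such as an SVM.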
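Model Spasial untuk Prediksi Konsentrasi Polutan Kabut Asap Kebakaran Lahan Gambut Menggunakan Support Vector Regression [Spatial Model for Predicting Haze Pollutant Concentrations from Peatland Fires Using Support Vector Regression]
Muhammad Asyhar Agmalaro; Imas Sukaesih Sitanggang; Lailan Sahrina Hasibuan; Muhammad Murtadha Ramadhan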
Jurnal Ilmu Komputer & Agri-Informatika Vol. 5 No. 2 (2018)
Publisher : Departemen Ilmu Komputer - IPB University

DOI: 10.29244/jika.5.2.119-127

Abstract

Haze from peatland fires contains various pollutants, such as CO and CO2. These pollutants can harm the health of people living near the fires, causing upper respiratory tract infections (Infeksi Saluran Pernafasan Atas, ISPA). This study aims to build a spatial model for predicting the concentrations of the haze pollutants CO and CO2 from peatland fires in Sumatra in 2015. The spatial model was built using the support vector regression (SVR) algorithm with a radial basis function (RBF) kernel, taking into account pollutant concentrations at several neighboring points. Parameter tuning was performed to obtain the optimal SVR parameter values. The results show that the best spatial model for predicting CO concentration was obtained at a gamma value of 20, yielding a root mean squared error (RMSE) of 1.174242×10^-8 and a correlation coefficient of 0.5879287. The best spatial model for predicting CO2 concentration was obtained at a gamma value of 10, yielding an RMSE of 9.843717×10^-8 and a correlation coefficient of 0.6058418. The predictions of the resulting models follow the pattern of the actual pollutant concentrations. Keywords: CO, CO2, haze, spatial model, support vector regression.
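The two evaluation metrics reported in this abstract can be computed as in the following Python sketch. The abstract does not say which correlation coefficient was used; Pearson's r is assumed here, and the function names and sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed; the abstract does not name the type)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values, not data from the study.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
correlation = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A low RMSE together with a moderate correlation, as reported in the abstract, means the predictions are close in absolute terms while tracking the actual pattern only partially.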
Evaluation of F-Measure and Feature Analysis of C5.0 Implementation on Single Nucleotide Polymorphism Calling
Lailan Sahrina Hasibuan; Sita Nabila; Nurul Hudachair; Muhammad Abrar Istiadi
Indonesian Journal of Artificial Intelligence and Data Mining Vol 1, No 1 (2018): March 2018
Publisher : Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI: 10.24014/ijaidm.v1i1.4616

Abstract

Data in molecular biology have grown rapidly since Next-Generation Sequencing (NGS), the latest high-throughput DNA sequencing technology, was introduced in 2000. A Single Nucleotide Polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are commonly exploited to optimize parent selection for producing high-quality seed in plant breeding. This paper discusses SNP calling on NGS data of cultivated soybean (Glycine max [L.] Merr.) using C5.0, an improved rule-based successor of C4.5. The evaluation showed that C5.0 outperforms another rule-based algorithm, CART, in terms of F-measure: 0.63 for C5.0 versus 0.58 for CART. In addition, C5.0 is robust to imbalanced training datasets up to a ratio of 1:17, but it suffers on large training datasets. C5.0's performance may be improved by applying bagging or another ensemble technique, just as CART was improved by applying bagging in the final decision. Another important factor is the use of appropriate features to represent SNP candidates. Based on the information gain of C5.0, this paper recommends error probability, homopolymer left, mismatch alt, and mean nearby qual as features for SNP calling.
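For reference, the F-measure used to compare the two classifiers is the harmonic mean of precision and recall. The Python sketch below assumes the balanced F1 variant, which the abstract does not state explicitly; the function name and the confusion counts are hypothetical.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, not taken from the paper.
score = f_measure(true_positives=6, false_positives=2, false_negatives=4)
```

Because it ignores true negatives, the F-measure is better suited than plain accuracy to imbalanced problems such as SNP calling, where non-SNP positions dominate.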
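Asosiasi Single Nucleotide Polymorphism pada Diabetes Mellitus Tipe 2 Menggunakan Random Forest Regression [Association of Single Nucleotide Polymorphisms with Type 2 Diabetes Mellitus Using Random Forest Regression]
Lina Herlina Tresnawati; Wisnu Ananta Kusuma; Sony Hartono Wijaya; Lailan Sahrina Hasibuan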
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 8 No 4: November 2019
Publisher : Departemen Teknik Elektro dan Teknologi Informasi, Fakultas Teknik, Universitas Gadjah Mada

Abstract

Precision medicine can be developed by determining the association between genomic data, represented by Single Nucleotide Polymorphisms (SNPs), and the phenotype of type 2 diabetes mellitus (T2D). SNPs are very abundant, so they must be sorted and filtered before association analysis. The purpose of this paper was to associate SNPs with T2D phenotypes. SNPs were ranked by importance score to select the significant ones. The selected SNPs were then associated with the T2D phenotype using random forest regression. Epistasis was also examined to show the interactions among SNPs that affect the phenotype. This paper obtained 301 important SNPs. The top ten SNPs are associated with five T2D protein candidates. Evaluation of the proposed models gave a Mean Absolute Error (MAE) of 0.062. These results indicate that random forest regression succeeded in associating SNPs with the phenotype and in examining epistasis between two SNPs.
Pemodelan Berbasis Jaringan untuk Pengklasifikasian Kanker Payudara Berdasarkan Data Molekuler [Network-Based Modeling for Breast Cancer Classification Based on Molecular Data]
Mushthofa; Chamdan L Abdulbaaqiy; Sony Hartono Wijaya; Muhammad Asyhar Agmalaro; Lailan Sahrina Hasibuan
Jurnal Ilmu Komputer dan Agri-Informatika Vol 9 No 1 (2022)
Publisher : Departemen Ilmu Komputer, Institut Pertanian Bogor

DOI: 10.29244/jika.9.1.101-113

Abstract

Cancer is a disease characterized by uncontrolled cell growth. One of the characteristics of uncontrolled growth is estrogen-receptor-positive (ER+) status. About 67% of breast cancer test results are ER+. Breast cancer profiles are divided into four subtypes: Luminal A, Luminal B, basal-like, and HER-2 enriched. Each subtype has a different effect on adjuvant chemotherapy. In this study, a network-based approach was used to select features (molecular biomarkers) that have the potential to assist in modeling and classifying breast cancer subtypes. The molecular features used are Copy Number Alteration (CNA) and gene expression. The accuracy obtained with the selected features was compared with the PAM50 feature-based accuracy reported in the literature. The results indicate that the features selected by this network-based approach achieve performance comparable to that of the original PAM50 features and can be used as an alternative for breast cancer subtyping.
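Prediksi Harga Minyak Goreng Curah dan Kemasan Menggunakan Algoritme Long Short-Term Memory (LSTM) [Price Prediction of Bulk and Packaged Cooking Oil Using the Long Short-Term Memory (LSTM) Algorithm]
Lailan Sahrina Hasibuan; Yanda Novialdi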
Jurnal Ilmu Komputer dan Agri-Informatika Vol 9 No 2 (2022)
Publisher : Departemen Ilmu Komputer, Institut Pertanian Bogor

DOI: 10.29244/jika.9.2.149-157

Abstract

A sharp increase in the price of basic necessities affects the economy of the Indonesian people, for example by lowering purchasing power. According to monitoring by the Strategic Food Price Information Center from November 2021 to August 2022, cooking oil was one necessity whose price rose very significantly in Indonesia. The increase was spread evenly across Indonesia's 34 provinces, including West Java. Such an increase can be mitigated by taking preventive action in advance, provided that it has been predicted. Deep learning is a learning method that is widely used today because of its reliability in solving various data mining problems, and it can predict future cooking oil prices from time series data. This study develops a model to predict the price of bulk and packaged cooking oil using a deep learning method designed for time series data, namely Long Short-Term Memory (LSTM). Based on the NRMSE evaluation metric, the model is able to capture the price fluctuations of both bulk and packaged cooking oil. In training, the LSTM model achieved an NRMSE of 0.019 on the bulk cooking oil data and 0.037 on the packaged cooking oil data.
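The NRMSE metric mentioned in this abstract can be sketched in Python as follows. NRMSE has several conventions (normalizing by the range, the mean, or the standard deviation of the observed values), and the abstract does not say which one was used; this sketch assumes normalization by the observed range. The function name and the price values are hypothetical.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    root_mse = math.sqrt(sum(squared) / len(squared))
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("observed values are constant; range normalization is undefined")
    return root_mse / value_range


# Hypothetical prices, not data from the study.
value = nrmse([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Normalizing makes the error unitless, which allows the bulk and packaged series to be compared even though their price levels differ.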
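Integrasi data Protein-Protein Interactions dan Pathway untuk Menentukan Score pada pathway Menggunakan Analisis Graf [Integration of Protein-Protein Interaction and Pathway Data to Determine Pathway Scores Using Graph Analysis]
Lailan Sahrina Hasibuan; Ahmad Fariqi; Lilik Prayitno; Melly Br Bangun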
KLIK: Kajian Ilmiah Informatika dan Komputer Vol. 3 No. 6 (2023): Juni 2023
Publisher : STMIK Budi Darma

DOI: 10.30865/klik.v3i6.932

Abstract

Advances in molecular biology technology produce large amounts of omics data. Integrating omics data is useful for analyzing biological processes at the molecular level, such as protein expression, drug mechanisms against diseases, and mechanisms of inheritance. This study aims to integrate protein molecular biology data through protein-protein interactions (PPIs), pathways, modules, and orthology in order to calculate pathway scores. The score is calculated from node degree in a graph. Proteins, pathways, modules, and orthologs act as nodes, while the interactions between them act as edges. In graph terms, nodes with a high degree play an important role in the graph. On this basis, the most important pathway related to a protein is the pathway with the highest degree in the multipartite graph formed by PPIs, modules, orthologs, and pathways. The output of this study is an R package that integrates molecular biology data on proteins, pathways, modules, and orthology, then displays the pathways most relevant to a protein in descending order of score. The package was tested with the proteins Insulin (INS) and Xanthine dehydrogenase (XDH) as inputs. For INS, the reported pathways were MAPK signaling pathway (0.18) lane 1, Pathways in cancer (0.137) lane 2, and Ubiquitin mediated proteolysis (0.28) lane 3. For XDH, they were Purine metabolism pathway (0.67) lane 1, Metabolic pathways (0.48) lane 2, and Purine metabolism (0.23) lane 3. These results can be used for enrichment analysis of the relationship between proteins and pathways.
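The degree-based idea in this abstract can be illustrated with a small Python sketch. The study's package is written in R and its exact scoring formula is not given in the abstract, so everything here is an assumption for illustration: the function name `pathway_scores`, the node names, and in particular the normalization (each pathway's degree divided by the total degree of all pathway nodes).

```python
from collections import defaultdict


def pathway_scores(edges, pathways):
    """Rank pathway nodes by degree in an undirected graph.

    Normalizing by the total degree of all pathway nodes is an assumption;
    the abstract does not state how scores are scaled.
    """
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    total = sum(degree[p] for p in pathways)
    if total == 0:
        return []
    scores = {p: degree[p] / total for p in pathways}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical multipartite graph: a query protein, an interaction partner,
# a module, and two pathways.
edges = [
    ("protein_X", "pathway_A"),
    ("partner_1", "pathway_A"),
    ("module_1", "pathway_A"),
    ("protein_X", "pathway_B"),
]
ranked = pathway_scores(edges, ["pathway_A", "pathway_B"])
```

Here `pathway_A` is ranked first because three nodes connect to it, which mirrors the abstract's premise that high-degree nodes play the most important role in the graph.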