Bandung Conference Series: Statistics
Vol. 3 No. 2 (2023): Bandung Conference Series: Statistics

Random Forest Method for Diabetes Classification

Dhea Agustina Hadi (Statistics, Faculty of Mathematics and Natural Sciences, Universitas Islam Bandung)
Dwi Agustin Nuriani Sirodj (Statistics, Faculty of Mathematics and Natural Sciences, Universitas Islam Bandung)



Article Info

Publish Date
02 Aug 2023

Abstract

Random forest is a supervised learning algorithm that extends decision trees by applying bootstrap aggregating (bagging). The method grows many decision trees and combines them into a forest, the random forest model. Each tree is grown on data sampled at random with replacement through the bagging process. Random forest is considered to perform better on diabetes data than other supervised learning methods because it has the lowest error rate of the methods compared. Random forest is also an important technique for classifying medical data, especially for diagnosing diabetes. In this study, classification was carried out on the Pima Indian Diabetes data, which comes from the Pima, a Native American tribe living in Arizona and Mexico. The analysis evaluates the accuracy of random forest classification on these data. The results show that random forest achieves an accuracy of 74.78%, which falls in the fair category of classification performance. The three most important variables in this random forest classification are glucose, followed by BMI and age.
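The abstract describes random forest as bagging over decision trees: each tree is grown on a bootstrap sample drawn with replacement, and the forest classifies by majority vote. As an illustration only, the following is a minimal, standard-library Python sketch of that idea; it is not the authors' implementation, since the abstract does not say which software or settings were used. It assumes synthetic toy data in place of the Pima data (one informative feature and one noise feature), Gini-impurity splits, a random subset of features considered at each node, and hypothetical function names and hyperparameters (`random_forest`, `n_trees`, `max_depth`, `n_features`). Variable importance is estimated here by permutation importance, which is one common measure; the abstract does not state which measure produced the glucose, BMI, and age ranking.

```python
import random
from collections import Counter

# Hypothetical, minimal random forest sketch (not the authors' code).


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)  # a split must improve on the parent node
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow one tree; every node considers only a random subset of features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: predict the majority class
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    children = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [yi for yi, g in zip(y, go_left) if g == side]
        children.append(build_tree(Xs, ys, depth + 1, max_depth, n_features, rng))
    return (f, t, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf (internal nodes are tuples)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_depth=3, n_features=1, seed=0):
    """Bagging: each tree is grown on a bootstrap sample drawn WITH replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of correctly classified cases."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(forest, X, y, feature, rng):
    """Accuracy drop after randomly shuffling one feature column."""
    base = accuracy(y, [predict_forest(forest, r) for r in X])
    column = [r[feature] for r in X]
    rng.shuffle(column)
    X_perm = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(X, column)]
    return base - accuracy(y, [predict_forest(forest, r) for r in X_perm])


# Synthetic toy data (NOT the Pima data): feature 0 decides the class, feature 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [1 if row[0] > 0.5 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:90], y[:90], X[90:], y[90:]

forest = random_forest(X_train, y_train)
acc = accuracy(y_test, [predict_forest(forest, r) for r in X_test])
imp_rng = random.Random(7)
importances = [permutation_importance(forest, X_test, y_test, f, imp_rng) for f in range(2)]
print(f"test accuracy: {acc:.2%}")
print("permutation importance per feature:", [round(v, 3) for v in importances])
```

On toy data like this, the informative feature should show a much larger accuracy drop than the noise feature. For real data such as the Pima set, a library implementation (for example scikit-learn's `RandomForestClassifier`) would normally be preferred over a hand-written version like this one.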

Copyright © 2023






Journal Info

Abbrev

BCSS

Publisher

Subject

Decision Sciences, Operations Research & Management; Education; Mathematics

Description

Bandung Conference Series: Statistics (BCSS) publishes academic research articles on theoretical and applied studies focusing on Statistics, with the following scope: Alternating Least Square, Conjoint Analysis, Autoregressive, Auxiliary Variable, Baby Birth, Block Maxima, ...