Dataset and Feature Analysis for Diabetes Mellitus Classification using Random Forest
Fachrul Mustofa; Achmad Nuruddin Safriandono; Ahmad Rofiqul Muslikh; De Rosal Ignatius Moses Setiadi
Journal of Computing Theories and Applications Vol 1, No 1 (2023): August-September
Publisher : LPPM Universitas Dian Nuswantoro

DOI: 10.33633/jcta.v1i1.9190

Abstract

Diabetes Mellitus is a hazardous disease, and according to the World Health Organization (WHO), diabetes will be one of the main causes of death by 2030. One of the most popular diabetes datasets is PIMA Indians, which has been widely tested with various machine learning (ML) methods and even deep learning (DL). On average, however, ML methods do not achieve good accuracy on it. The quality of the dataset and its features is the most influential factor in this case, so a deeper investigation of this dataset is needed. This research analyzes and compares the PIMA Indians and Abelvikas datasets using the Random Forest (RF) method. Both datasets are imbalanced; the Abelvikas dataset is more imbalanced and has a larger number of classes, so it is more complex. RF was chosen because it is one of the ML methods with the best results on various diabetes datasets. The tests produced very contrasting results on the two datasets. Abelvikas reached 100% accuracy, precision, and recall, while PIMA Indians achieved at best only 75% accuracy, 87% precision, and 80% recall. Testing was done with the number of trees set to 3, 5, 7, 10, and 15, and k-fold validation was also used to obtain valid results. These results suggest that the features in the Abelvikas dataset are much better, because they are supported by more complete glucose features.
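
The test procedure described in this abstract (a Random Forest evaluated with 3, 5, 7, 10, and 15 trees under k-fold validation) could be sketched roughly as follows. This is an illustrative sketch, not the authors' code: it assumes scikit-learn, uses a synthetic imbalanced dataset as a stand-in for PIMA Indians or Abelvikas, and the fold count, the random seeds, and the helper name `evaluate_tree_counts` are hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced binary dataset (assumption: a stand-in only,
# not the PIMA Indians or Abelvikas data; sizes are arbitrary).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_redundant=1,
    weights=[0.65, 0.35], random_state=0,
)

METRICS = ("accuracy", "precision", "recall")


def evaluate_tree_counts(X, y, tree_counts=(3, 5, 7, 10, 15), n_splits=5):
    """Return the mean k-fold accuracy, precision, and recall per tree count."""
    # Stratified folds keep the class ratio similar in every fold,
    # which matters for imbalanced data (n_splits=5 is an assumption).
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for n_trees in tree_counts:
        model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        scores = cross_validate(model, X, y, cv=cv, scoring=METRICS)
        results[n_trees] = {
            metric: float(scores[f"test_{metric}"].mean()) for metric in METRICS
        }
    return results


results = evaluate_tree_counts(X, y)
for n_trees, metrics in results.items():
    print(n_trees, {name: round(value, 3) for name, value in metrics.items()})
```

The scores printed for the synthetic data say nothing about either real dataset; the sketch only shows the shape of the comparison across tree counts.
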
Multi-label Classification of Indonesian Al-Quran Translation based CNN, BiLSTM, and FastText
Ahmad Rofiqul Muslikh; Ismail Akbar; De Rosal Ignatius Moses Setiadi; Hussain Md Mehedul Islam
Techno.Com Vol 23, No 1 (2024): Februari 2024
Publisher : LPPM Universitas Dian Nuswantoro

DOI: 10.62411/tc.v23i1.9925

Abstract

Studying the Qur'an is a pivotal act of worship in Islam, which necessitates a structured understanding of its verses to facilitate learning and referencing. Each Quranic verse is rich with unique thematic elements and can belong to several distinct categories at once. This study explores the enhancement of a multi-label classification model through the integration of FastText. Employing a CNN+Bi-LSTM architecture, the research classifies Quranic translations across categories such as Tauhid, Ibadah, Akhlak, and Sejarah. Model evaluation using F1-Score shows a significant difference between the two models. The CNN+Bi-LSTM model without FastText reaches its highest result of 68.70% in the 80:20 testing configuration. By contrast, the CNN+Bi-LSTM+FastText model, tested across combinations of embedding size and epoch parameters, achieves 73.30% with an embedding size of 200, 100 epochs, and a 90:10 testing configuration. These findings underscore the significant impact of FastText on model optimization, with an improvement of 4.6 percentage points over the base model.
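
For readers unfamiliar with F1-Score in a multi-label setting, the following minimal sketch shows one way to compute it when each verse can carry several of the four categories. The abstract does not state which averaging scheme was used, so micro-averaging here is an assumption, and the toy label matrices and the helper name `micro_f1` are invented purely for illustration.

```python
# Category order for each indicator row (names taken from the abstract).
LABELS = ("Tauhid", "Ibadah", "Akhlak", "Sejarah")


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary label-indicator rows.

    Counts true positives, false positives, and false negatives across
    all samples and labels, then applies F1 = 2TP / (2TP + FP + FN).
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): three verses, four possible labels each.
y_true = [
    [1, 0, 0, 0],  # Tauhid
    [0, 1, 1, 0],  # Ibadah and Akhlak
    [1, 0, 0, 1],  # Tauhid and Sejarah
]
y_pred = [
    [1, 0, 0, 0],  # correct
    [0, 1, 0, 0],  # misses Akhlak (one false negative)
    [1, 1, 0, 1],  # adds Ibadah (one false positive)
]

print(f"micro F1 = {micro_f1(y_true, y_pred):.2f}")  # prints "micro F1 = 0.80"
```

With 4 true positives, 1 false positive, and 1 false negative, the score is 8 / 10 = 0.80. A macro-averaged score, which averages per-label F1 values instead, would generally differ.
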