Journal of Data Science and Its Applications
Vol 4 No 1 (2021): Journal of Data Science and Its Applications

Multi Label Topic Classification for Hadith Bukhari in Indonesian Translation using Random Forest

Adhitia Wiraguna (Telkom Indonesia)
Said Al Faraby (Telkom University)
Adiwijaya Adiwijaya (Unknown)



Article Info

Publish Date
23 Oct 2021

Abstract

Studying and practicing the hadith is obligatory for Muslims, and many kinds of teachings can be drawn from it. To assist Muslims in studying the hadith, a multi-label classification system is needed to categorize Sahih Bukhari hadith in Indonesian translation into three topics: prohibition, advice, and information. Various classification methods can be used to build a text classification system; this study uses Random Forest (RF). The simplicity of the RF algorithm and its good ability to handle high-dimensional data make it a suitable method for text classification. However, RF's capability for multi-label classification is not widely known. This study uses two Problem Transformation methods, Binary Relevance (BR) and Label Powerset (LP), to adapt RF for building a multi-label classification system. The results show that the best Hamming loss, 0.0663, was obtained by the system that used BR without stemming. These results indicate that BR is better than LP at adapting the RF algorithm to multi-label classification of hadith data. This is because BR produces one classification model for each label in the hadith data, whereas the data transformation performed by LP makes the data imbalanced.
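The two problem transformations and the Hamming loss metric named in the abstract can be illustrated with a minimal sketch. It assumes a small, hypothetical label matrix (not data from the study) and trains no Random Forest; it only shows how BR and LP reshape the targets that a base classifier would then be fitted on. The function names are illustrative, not taken from the paper.

```python
from collections import Counter

LABELS = ("prohibition", "advice", "information")

# Hypothetical toy label matrix (NOT from the study):
# one row per hadith, one column per topic in LABELS.
Y = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
    (0, 0, 1),
]

def binary_relevance_targets(Y):
    """Binary Relevance: one binary target vector per label.

    A separate binary classifier would be trained on each vector,
    so the number of models equals the number of labels.
    """
    return [[row[j] for row in Y] for j in range(len(Y[0]))]

def label_powerset_targets(Y):
    """Label Powerset: each distinct label combination becomes one class."""
    class_ids = {}
    targets = []
    for row in Y:
        if row not in class_ids:
            class_ids[row] = len(class_ids)
        targets.append(class_ids[row])
    return targets, class_ids

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(
        t != p
        for row_true, row_pred in zip(Y_true, Y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return wrong / (len(Y_true) * len(Y_true[0]))

br_targets = binary_relevance_targets(Y)          # three binary problems
lp_targets, lp_classes = label_powerset_targets(Y)  # one multi-class problem
lp_counts = Counter(lp_targets)                   # class sizes after LP

for name, target in zip(LABELS, br_targets):
    print(name, target)
print("LP targets:", lp_targets)
print("LP class sizes:", dict(lp_counts))
```

In this toy matrix, LP yields four classes, three of which contain a single example each, which is the kind of imbalance the abstract attributes to the LP transformation. BR instead keeps three binary problems that all use the full set of rows.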

Copyright © 2021






Journal Info

Abbrev

jdsa

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

JDSA welcomes all topics that are relevant to data science, computational linguistics, and information sciences. The listed topics of interest are as follows: Big Data Analytics; Computational Linguistics; Data Clustering and Classifications; Data Mining and Data Analytics; Data Visualization; ...