Journal of Dinda : Data Science, Information Technology, and Data Analytics
Vol 2 No 1 (2022): February

Performance Comparison Between the Naive Bayes and K-Nearest Neighbour Algorithms in Breast Cancer Classification

Annisa Nugraheni (Institut Teknologi Telkom Purwokerto)
Rima Dias Ramadhani (Institut Teknologi Telkom Purwokerto)
Amalia Beladinna Arifa (Institut Teknologi Telkom Purwokerto)
Agi Prasetiadi (Institut Teknologi Telkom Purwokerto)



Article Info

Publish Date
23 Feb 2022

Abstract

Breast cancer is the second most common cause of cancer death, after lung cancer. It occurs when cells in breast tissue begin to grow uncontrollably and can disrupt the surrounding healthy tissue. A classification method is therefore needed to distinguish breast cancer patients from healthy people. Based on previous research, the Naïve Bayes and K-Nearest Neighbor algorithms are considered capable of classifying breast cancer. This study uses the Breast Cancer Coimbra dataset (2018) from the UCI Machine Learning Repository, which contains 116 records, and evaluates the methods with confusion-matrix metrics (accuracy, precision, and recall) and the ROC-AUC curve. The purpose of the study is to compare the performance of the Naïve Bayes and K-Nearest Neighbor algorithms. Both algorithms are tested under several scenarios: testing the data before and after normalization, testing the model with different ratios of training data to testing data, testing the model with different values of K in K-Nearest Neighbor, and testing the model with the strongest attributes selected by the Pearson correlation test. The results indicate that the Naïve Bayes algorithm achieves a highest average accuracy of 69.12%, with a precision of 64.90% for the healthy class and 83% for the sick class, a recall of 88% for the healthy class and 61.11% for the sick class, and an AUC of 0.82, which falls in the good classification category. The K-Nearest Neighbor algorithm achieves a highest average accuracy of 76.83%, with a precision of 76% for the healthy class and 80.21% for the sick class, a recall of 74.18% for the healthy class and 80.81% for the sick class, and an AUC of 0.91, which falls in the excellent classification category.
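
The abstract reports accuracy together with separate precision and recall values for the healthy and sick classes. As a minimal sketch of how such per-class figures can be derived from a confusion matrix, the following Python example counts the (actual, predicted) label pairs and computes each metric from them. The function name `per_class_metrics` and the label lists are hypothetical and made up for illustration; they are not the authors' code or data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels=("healthy", "sick")):
    """Return accuracy plus per-class precision and recall from paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count each (actual, predicted) pair: these are the confusion-matrix cells.
    cells = Counter(zip(y_true, y_pred))
    total = len(y_true)
    correct = sum(cells[(c, c)] for c in labels)
    report = {"accuracy": correct / total if total else 0.0}
    for c in labels:
        tp = cells[(c, c)]
        predicted_c = sum(cells[(a, c)] for a in labels)  # column sum
        actual_c = sum(cells[(c, p)] for p in labels)     # row sum
        report[c] = {
            "precision": tp / predicted_c if predicted_c else 0.0,
            "recall": tp / actual_c if actual_c else 0.0,
        }
    return report


# Made-up labels for illustration only; not data from the study.
y_true = ["healthy", "healthy", "healthy", "sick", "sick", "sick", "sick", "sick"]
y_pred = ["healthy", "healthy", "sick", "sick", "sick", "sick", "healthy", "healthy"]
report = per_class_metrics(y_true, y_pred)
print(report)
```

Precision for a class divides its true positives by everything predicted as that class, while recall divides by everything that actually belongs to it, which is why the two values can differ noticeably between the healthy and sick classes.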

Copyrights © 2022

Journal Info

Abbrev

dinda

Subject

Computer Science & IT

Description

Journal of Dinda : Data Science, Information Technology, and Data Analytics is a publication medium for research results in the fields of Data Science, Information Technology, and Data Analytics, but is not limited to them. It is published twice a year, in February and August. The journal is managed by ...