Muhammad Ibnu Choldun Rachmatullah
Politeknik Pos Indonesia

Published: 2 Documents
Articles


Proposed Modification of K-Means Clustering Algorithm with Distance Calculation Based on Correlation
Muhammad Ibnu Choldun Rachmatullah
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika, Vol 8, No 1 (2022): March
Publisher : Universitas Ahmad Dahlan

DOI: 10.26555/jiteki.v8i1.23696

Abstract

Clustering is a data mining technique that groups a set of data into clusters of similar data. In general, there are two clustering approaches, namely hierarchical methods and partition methods. One of the most commonly used partition clustering methods is K-Means, which has been applied in many fields and for many purposes. Much research has been carried out to improve the performance of K-Means, for example by modifying how the initial centroids are chosen or how the appropriate number of clusters is determined. In this research, the K-Means algorithm is modified in its distance calculation by considering the correlation values between attributes. Attributes with high correlation are assumed to share similar characteristics and therefore influence which cluster a data point is assigned to. The steps of the proposed method are: calculating the correlation values between attributes, determining the cluster centroids, calculating the distance while taking the correlation values into account, and assigning each data point to a cluster. The first contribution of this research is a new distance calculation technique for the K-Means algorithm that considers correlation; the second is the application of the proposed algorithm to a specific dataset, namely the Iris dataset. The performance of the modified algorithm was also evaluated. In the experiments on the Iris dataset, the proposed modification of the K-Means algorithm converges in fewer iterations than the original K-Means method and therefore requires less processing time: the original K-Means method requires 8 iterations, while the proposed method requires only 6. The proposed method also achieves a higher accuracy of 89.33%, compared with 82.67% for the original K-Means method.
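
The abstract does not give the paper's exact distance formula, so the sketch below is only a hypothetical illustration of a correlation-aware K-Means in Python: each attribute's contribution to the squared Euclidean distance is scaled by that attribute's mean absolute correlation with the other attributes. The weighting scheme, the correlation_weights helper, and the random centroid initialization are assumptions for illustration, not the method described in the paper.

```python
import numpy as np

def correlation_weights(X):
    # Pearson correlation matrix between attributes (columns of X).
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    # Assumed weighting: mean absolute correlation of each attribute with the others.
    return np.abs(corr).mean(axis=1)

def correlation_weighted_kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = correlation_weights(X)                        # one weight per attribute
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Weighted squared Euclidean distance from every point to every centroid.
        diff = X[:, None, :] - centroids[None, :, :]
        dist = (w * diff ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):     # converged
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on the Iris dataset (scikit-learn assumed available):
# from sklearn.datasets import load_iris
# labels, centroids = correlation_weighted_kmeans(load_iris().data, k=3)
```

Only the assignment step changes in this sketch; the centroid update and convergence test are the standard K-Means steps.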
The Application of Repeated SMOTE for Multi Class Classification on Imbalanced Data
Muhammad Ibnu Choldun Rachmatullah
MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, Vol 22, No 1 (2022)
Publisher : LPPM Universitas Bumigora

DOI: 10.30812/matrik.v22i1.1803

Abstract

One of the problems often faced by classification algorithms is imbalanced data. A recommended remedy at the data level is to balance the number of samples across classes by generating additional samples for the minority class (oversampling); one such technique is the Synthetic Minority Oversampling Technique (SMOTE). SMOTE is commonly used to balance data consisting of two classes. In this research, SMOTE was used to balance multi-class data by applying it repeatedly. This iterative process is needed when more than one class is imbalanced, because a single SMOTE pass is only suitable for binary classification or for cases where only one class is imbalanced. To assess the performance of repeated SMOTE, the resampled datasets were classified using a neural network, k-NN, Naïve Bayes, and Random Forest, and performance was measured in terms of accuracy, sensitivity, and specificity. The experiment used the Glass Identification dataset, which has six classes, and the SMOTE process was repeated five times. The best performance was achieved by the Random Forest classifier, with accuracy = 86.27%, sensitivity = 86.18%, and specificity = 95.82%. The experimental results show that repeated SMOTE can increase classification performance.
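
The abstract does not specify exactly how the repeated SMOTE passes are organized, so the sketch below is only a hypothetical illustration using the imbalanced-learn package: SMOTE is applied once per minority class, each pass oversampling a single class up to the size of the majority class. The repeated_smote helper and the smallest-class-first ordering are assumptions for illustration, not necessarily the procedure used in the paper.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

def repeated_smote(X, y, random_state=0):
    counts = Counter(y)
    majority_size = max(counts.values())
    # One SMOTE pass per minority class, smallest class first (assumed ordering):
    # each pass oversamples a single class up to the majority class size.
    # Note: SMOTE's default k_neighbors=5 requires at least 6 samples per class.
    for cls, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if n == majority_size:
            continue
        sm = SMOTE(sampling_strategy={cls: majority_size},
                   random_state=random_state)
        X, y = sm.fit_resample(X, y)
    return X, y
```

For a six-class dataset such as Glass Identification, this loop would run SMOTE five times (once per minority class), which matches the number of repetitions reported in the abstract.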