This research analyzed 176 theses written by students of public universities in Palembang and published in 2020. The texts were preprocessed and TF-IDF features were extracted under two scenarios: with feature reduction by the SVD method and without it. In each scenario, three distance metrics (cosine, Euclidean, and Manhattan) were used, giving six scenarios in total. The best cluster quality, as measured by the silhouette coefficient, came from the cosine metric with SVD-reduced features, with a silhouette coefficient of 0.88382763, an intracluster value of 0.08688583, and an intercluster value of 0.74671096. The cluster quality of the reduced features is therefore the best among all metrics. In addition, the DBSCAN method showed a positive correlation between epsilon and the intracluster value (0.97669) and a negative correlation between epsilon and the silhouette coefficient, with a reported value of 0.9789.
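As an illustration of the quality measure named above, the following is a minimal sketch of the silhouette coefficient under the three metrics, written in plain Python. It is not the authors' implementation: the function names, the toy vectors, and the labels are hypothetical, and the sketch assumes at least two clusters and that DBSCAN noise points (if any) have already been removed, since the abstract does not state how noise was handled.

```python
import math


def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


METRICS = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}


def silhouette(points, labels, metric):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    dist = METRICS[metric]
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        sums, counts = {}, {}
        # Accumulate distances from point i to every other point, per cluster.
        for j, (q, other) in enumerate(zip(points, labels)):
            if i == j:
                continue
            sums[other] = sums.get(other, 0.0) + dist(p, q)
            counts[other] = counts.get(other, 0) + 1
        if lab not in counts:
            # Singleton cluster: silhouette is 0 by the usual convention.
            scores.append(0.0)
            continue
        a = sums[lab] / counts[lab]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in sums if c != lab)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy vectors standing in for (reduced) TF-IDF features.
points = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels = [0, 0, 1, 1]

for name in METRICS:
    print(name, round(silhouette(points, labels, name), 4))
```

Values close to 1 indicate compact, well-separated clusters, which is the sense in which the scenarios above are compared.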
Copyrights © 2022