TEPIAN
Vol 2 No 2 (2021): June 2021

Android App Rating Classification on Google Play Store Using Random Forest Algorithm with SQL Server Preprocessing

Raissa Maringka (Amikom University)
Aulia Khoirunnita (Amikom University)
Rodney Maringka (Amikom University)
Erna Utami (Amikom University)
Kusnawi (Amikom University)



Article Info

Publish Date
01 Jun 2021

Abstract

The growing number of Android applications on the Google Play Store, together with the benefits developers can gain from them, has attracted the attention of many Android application developers. One way to benefit from developing Android apps is to understand the characteristics of highly rated apps on the Google Play Store. This research investigates the size, installs, reviews, type (free/paid), rating, category, content rating, and price of applications on the Google Play Store to determine the characteristics of highly rated applications. At the preprocessing stage, the data are cleaned and reduced using SQL Server. The study then uses the feature importance of the Random Forest algorithm to identify the attributes that most influence whether an Android app on the Google Play Store is highly rated. To classify highly rated applications, the authors apply the Random Forest algorithm with 8-fold cross-validation and achieve an accuracy of 83%, better than the Gradient Boosting, K-NN, and Decision Tree algorithms. Random Forest also outperforms the algorithm recommended in the conclusions of previous research, with a 0.8% increase in accuracy.
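The abstract reports accuracy obtained with 8-fold cross-validation. As a hedged illustration of how such a figure is typically computed (not the authors' code, which the abstract does not show), the stdlib-only Python sketch below splits the rows into k folds, trains on k−1 folds, scores on the held-out fold, and averages the per-fold accuracies. The names `kfold_indices`, `cross_val_accuracy`, and `majority_fit` are hypothetical; `majority_fit` is a placeholder baseline standing in for where a Random Forest, Gradient Boosting, K-NN, or Decision Tree model would be plugged in.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def kfold_indices(n: int, k: int = 8, seed: Optional[int] = None) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering rows 0..n-1.

    Fold sizes differ by at most one. If `seed` is given, rows are shuffled
    before splitting (an assumption; the abstract does not say how folds were built).
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)
    base, extra = divmod(n, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more row
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(X: Sequence, y: Sequence, fit: Callable, k: int = 8,
                       seed: Optional[int] = None) -> float:
    """Mean accuracy over k folds; `fit(X_train, y_train)` must return a predict(x) function."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_fit(X_train: Sequence, y_train: Sequence) -> Callable:
    """Hypothetical baseline: always predict the most common training label."""
    labels = sorted(set(y_train))  # sorted so ties resolve deterministically
    top = max(labels, key=list(y_train).count)
    return lambda x: top


if __name__ == "__main__":
    # 16 hypothetical apps: 12 labelled "high", 4 labelled "low"; features are dummies.
    X = list(range(16))
    y = ["high"] * 12 + ["low"] * 4
    print(cross_val_accuracy(X, y, majority_fit))  # 0.75
```

With contiguous folds of two rows each, the baseline is fully correct on the six all-"high" folds and fully wrong on the two all-"low" folds, giving 6/8 = 0.75. Passing a `seed` shuffles rows before splitting, which is common practice; whether the study shuffled or stratified its folds is not stated in the abstract.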

Copyright © 2021






Journal Info

Abbrev

tepian

Subject

Computer Science & IT

Description

The purpose of TEPIAN is to publish original research studies directly relevant to computer science. TEPIAN encompasses the full spectrum of information technology and computer science, including information systems, hardware technology, intelligent systems, and multimedia applications. TEPIAN ...