Journal of Applied Data Sciences
Vol 5, No 1: JANUARY 2024

Deciphering Digital Social Dynamics: A Comparative Study of Logistic Regression and Random Forest in Predicting E-Commerce Customer Behavior

Po Abas Sunarya (University of Raharja, Tangerang)
Untung Rahardja (University of Raharja, Tangerang)
Shih Chih Chen (National Kaohsiung University of Science and Technology)
Yung-Ming Lic (National Yang Ming Chiao Tung University)
Marviola Hardini (University of Raharja, Tangerang)



Article Info

Publish Date
29 Jan 2024

Abstract

This study compares Logistic Regression and Random Forest in predicting e-commerce customer churn. Utilizing the E-commerce Customer dataset, it navigates the complexities of customer interactions and behaviors, offering a rich context for analysis. The methodology focuses on meticulous data preprocessing to ensure data integrity, setting the stage for applying and evaluating Logistic Regression and Random Forest. Both models were assessed using accuracy, precision, recall, F1-Score, and AUC-ROC. Logistic Regression showed an accuracy of 90%, precision of 91% for class 0 and 82% for class 1, recall of 98% for class 0 and 50% for class 1, F1-Score of 94% for class 0 and 62% for class 1, and AUC-ROC of 0.88. Random Forest, with its ability to handle complex patterns, demonstrated higher overall performance with an accuracy of 95%, precision of 95% for class 0 and 93% for class 1, recall of 99% for class 0 and 74% for class 1, F1-Score of 97% for class 0 and 82% for class 1, and an AUC-ROC of 0.97. This comparative analysis offers insights into each model's strengths and suitability for predicting customer churn. The findings contribute to a deeper understanding of machine learning applications in e-commerce, guiding stakeholders in enhancing customer retention strategies. This research provides a foundation for further exploration into the digital social dynamics that shape customer behavior in the evolving digital marketplace.

Copyrights © 2024






Journal Info

Abbrev

JADS

Publisher

Subject

Computer Science & IT Control & Systems Engineering Decision Sciences, Operations Research & Management

Description

One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes ...