Journal : SENTRI: Jurnal Riset Ilmiah

CLASSIFICATION OF SMS SPAM WITH N-GRAM AND PEARSON CORRELATION BASED USING MACHINE LEARNING TECHNIQUES
Nova Tri Romadloni; Nisa Dwi Septiyanti; Cucut Hariz Pratomo; Wakhid Kurniawan; Rauhulloh Ayatulloh Khomeini Noor Bintang
SENTRI: Jurnal Riset Ilmiah, Vol. 3 No. 2 (2024), February 2024
Publisher : LPPM Institut Pendidikan Nusantara Global

DOI: 10.55681/sentri.v3i2.2252

Abstract

The Short Message Service (SMS) is widely popular because of its simplicity, reliability, and ubiquitous accessibility. This study aims to improve SMS spam classification by reducing feature dimensionality and eliminating irrelevant attributes. The text is preprocessed and represented with N-Gram features, and features are then selected using Pearson Correlation. Five classification algorithms are evaluated. The findings indicate that the best results come from combining N-Gram features with Pearson Correlation feature selection. Among the classifiers, the Support Vector Machine stands out, with a reported accuracy of 91.41% without feature selection, 91.96% with N-Gram features, and 70.80% after the inclusion of weighted correlation. The model's generalizability is limited, however, primarily because of the relatively small dataset. Although Pearson Correlation and N-Gram-based feature selection reduce data dimensionality and improve processing efficiency, some relevant features may have been overlooked, and the selected attributes may not be optimal for specific classification tasks.
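The feature pipeline described in the abstract (N-Gram representation followed by Pearson Correlation feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `word_ngrams`, `pearson`, and `select_features`, the use of raw counts, the `top_k` cutoff, and the four toy messages are all assumptions made for illustration, and the classifier step (for example a Support Vector Machine) is omitted.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased message."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(messages, labels, n_max=2, top_k=5):
    """Rank n-gram features by |correlation with the label| and keep top_k."""
    counts = [Counter(word_ngrams(m, n_max)) for m in messages]
    vocabulary = sorted(set().union(*counts))
    scores = {}
    for gram in vocabulary:
        # One column of the document-term matrix: count of gram per message.
        column = [c[gram] for c in counts]
        scores[gram] = pearson(column, labels)
    # Sort by absolute correlation (descending), breaking ties alphabetically.
    ranked = sorted(vocabulary, key=lambda g: (-abs(scores[g]), g))
    return [(g, scores[g]) for g in ranked[:top_k]]


# Toy data (hypothetical): label 1 = spam, 0 = legitimate.
messages = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "see you at lunch today",
]
labels = [1, 1, 0, 0]

selected = select_features(messages, labels, top_k=3)
for gram, score in selected:
    print(f"{gram!r}: {score:+.2f}")
```

Ranking by absolute correlation keeps features that signal either class; the selected columns would then form the reduced input to whichever classifier is being evaluated.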