Journal of Data Science and Its Applications
Vol 2 No 2 (2019): Journal of Data Science and Its Applications

Sentiment Analysis of Movie Review using Naïve Bayes Method with Gini Index Feature Selection

Riko Bintang Purnomoputra (Telkom University)
Adiwijaya Adiwijaya (Unknown)
Untari Novia Wisesty (Unknown)



Article Info

Publish Date
15 Nov 2019

Abstract

Movie reviews contain information that indicates whether a movie is good or bad. Sentiment analysis processes this information to determine the polarity of a sentence. Because reviews are unstructured and have a large number of attributes, classification requires considerable time and computational resources. Feature selection addresses this problem by reducing dimensionality, which accelerates classification and reduces misclassification. Gini Index Text (GIT) feature selection was originally applied to document classification, where it improved classifier performance. Multinomial Naïve Bayes (MNNB) is a popular classifier for document classification; the open question is whether GIT feature selection can also improve MNNB classification performance. This study therefore uses GIT for text feature selection with an MNNB classifier to classify movie reviews into positive and negative classes. The data is the IMDB dataset, which contains reviews written in English; it is divided into training data (90%) and testing data (10%). The test results show that the Gini index as a feature selection method increases accuracy: accuracy is 56% without feature selection and 59.54% with feature selection, an increase of 3.54 percentage points.
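The pipeline described in the abstract (score terms with a Gini index, keep the top-ranked terms, then classify with Multinomial Naïve Bayes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact GIT formula, so the sketch assumes one common formulation from the literature, Gini(w) = Σ_c P(w|c)² · P(c|w)², estimated from document frequencies. The toy reviews, the whitespace tokenizer, the Laplace smoothing, and all function names are assumptions made for illustration; the IMDB data and the 90/10 split are omitted.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not stated.
    return text.lower().split()

def gini_text_scores(docs, labels):
    """Score each term w with sum_c P(w|c)^2 * P(c|w)^2 (assumed GIT formulation)."""
    class_docs = Counter(labels)       # number of documents per class
    df = defaultdict(Counter)          # df[w][c] = documents of class c containing w
    for text, c in zip(docs, labels):
        for w in set(tokenize(text)):
            df[w][c] += 1
    scores = {}
    for w, per_class in df.items():
        total = sum(per_class.values())  # documents containing w, over all classes
        scores[w] = sum(
            (n / class_docs[c]) ** 2 * (n / total) ** 2
            for c, n in per_class.items()
        )
    return scores

def select_features(scores, k):
    # Keep the k highest-scoring terms; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:k])

def train_mnb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing over the selected vocab."""
    class_docs = Counter(labels)
    counts = defaultdict(Counter)      # counts[c][w] = occurrences of w in class c
    for text, c in zip(docs, labels):
        counts[c].update(w for w in tokenize(text) if w in vocab)
    model = {}
    for c in class_docs:
        total = sum(counts[c].values())
        log_prior = math.log(class_docs[c] / len(labels))
        log_like = {
            w: math.log((counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
        model[c] = (log_prior, log_like)
    return model

def predict(model, text, vocab):
    # Terms outside the selected vocabulary are ignored.
    tokens = [w for w in tokenize(text) if w in vocab]
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[w] for w in tokens)
    return max(sorted(model), key=score)

# Hypothetical toy data standing in for the IMDB reviews.
train_docs = [
    "a great and moving film",
    "great acting and a great story",
    "a boring and dull film",
    "dull plot and boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

scores = gini_text_scores(train_docs, train_labels)
vocab = select_features(scores, k=3)
model = train_mnb(train_docs, train_labels, vocab)
prediction = predict(model, "a great film", vocab)
```

On this toy data, terms that occur in only one class and in every document of that class ("great", "boring", "dull") receive the highest score, while terms spread evenly across both classes ("and", "film") rank lower, which is the behaviour that lets feature selection shrink the vocabulary before classification.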

Copyrights © 2019






Journal Info

Abbrev

jdsa

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

JDSA welcomes all topics that are relevant to data science, computational linguistics, and information sciences. The listed topics of interest are as follows: Big Data Analytics; Computational Linguistics; Data Clustering and Classifications; Data Mining and Data Analytics; Data Visualization; ...