Sentiment analysis or opinion mining, which is one of the application of Natural Language Processing (NLP), aims to find a method to facilitate human in communicating with a computer using their common language. To simplify the process of understanding human language, there are three important stages that must be carried out by a computer, which are tokenizing, stemming and filtering. The tokenizing that breaks down the sentence into a single word will make the computer assume all words (token) are the same. If there is a phrase formed from one of unimportant words, which is happened to be in the stoplist, the phrase will be deleted. Solution for the aforementioned problem is tokenizing based on phrase detection using Hidden Markov Model (HMM) POS-Tagger to improve classification performance using Support Vector Machine (SVM).With this approach, computer will be able to distinguish a phrase from others, then store the phrase into a single entity. There is an increase in accuracy by approximately 6% on Dataset I and 3% on Dataset II in the classification process using phrase detection, due to reduction of missing features that usually occurs in the filtering process. In addition, the detection of the phrase-based approach also produces the most optimal classification model, as seen from the ROC value that reaches 0.897.
Copyrights © 2016