Arif Ridho Lubis
Universitas Sumatera Utara

Published: 2 documents

The feature extraction for classifying words on social media with the Naïve Bayes algorithm
Arif Ridho Lubis; Mahyuddin Khairuddin Matyuso Nasution; Opim Salim Sitompul; Elviawaty Muisa Zamzami
IAES International Journal of Artificial Intelligence (IJ-AI) Vol 11, No 3: September 2022
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijai.v11.i3.pp1041-1048

Abstract

Classification with the Naïve Bayes classifier (NBC) requires prior pre-processing and feature extraction. Generally, pre-processing eliminates unnecessary words, while feature extraction processes the words that remain. This paper focuses on feature extraction, using word2vec for calculation and search and term frequency-inverse document frequency (TF-IDF) for frequency weighting. Words on Twitter are classified using 1734 tweets, each treated as a document and weighted with TF-IDF: the more often a word appears in tweets, the lower its TF-IDF value, and vice versa. Once the word weights are obtained, classification is carried out using Naïve Bayes with 1734 test data, yielding an accuracy of 88.8% for tweets in the Slack word category and 78.79% for tweets in the verb category. It can be concluded that words available on Twitter can be classified into Slack words and verbs with a fairly good level of accuracy, reflecting the habits of Twitter users.
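
The TF-IDF weighting described in the abstract can be illustrated with a minimal sketch. It assumes the common unsmoothed form, tf = count / document length and idf = log(N / df); the abstract does not state which variant the authors used. The function name `tf_idf` and the toy tweets are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy tweets, already tokenized (illustration only).
tweets = [
    ["gue", "lagi", "makan"],
    ["gue", "mau", "tidur"],
    ["gue", "lagi", "belajar"],
]
weights = tf_idf(tweets)

# "gue" occurs in every tweet, "lagi" in two, "makan" in one,
# so within the first tweet the weights increase in that order.
for term in ("gue", "lagi", "makan"):
    print(term, round(weights[0][term], 3))
```

In this sketch a word that appears in every tweet gets a weight of zero, which matches the abstract's remark that frequently occurring words receive lower TF-IDF values.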

The effect of the TF-IDF algorithm in times series in forecasting word on social media
Arif Ridho Lubis; Mahyuddin K. M. Nasution; Opim Salim Sitompul; Elviawaty Muisa Zamzami
Indonesian Journal of Electrical Engineering and Computer Science Vol 22, No 2: May 2021
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v22.i2.pp976-984

Abstract

Forecasting is one of the main topics in data mining and machine learning, in which the data used have a class label or target. Thus, many algorithms for solving forecasting problems are categorized as supervised learning, which involves a training process. Here, the label or target data act as a 'supervisor' that guides the training process toward a certain level of accuracy or precision. Time series is a method generally used to forecast based on time, and it can forecast words in social media. This study conducted word forecasting on Twitter with 1734 tweets, which were treated as documents weighted using the TF-IDF algorithm: the more frequently a word appears in tweets, the smaller its TF-IDF value, and vice versa. After the word weights of the tweets were obtained, a time series forecast was performed with test data of 1734 tweets, with 1203 Slack word tweets and 531 verb tweets serving as training data, resulting in good accuracy. The word forecasting was divided into two groups, inactive users and active users. The mean absolute percentage error (MAPE) was 50% for inactive users and 0.1980198% for active users.
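
For readers unfamiliar with the error measure reported above, the following is a minimal sketch of the standard MAPE formula, the mean of |actual - forecast| / |actual| expressed in percent. The function name `mape` and the sample values are hypothetical; they are not the paper's data and do not reproduce its results.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Standard definition: 100 * mean(|a - f| / |a|).
    Undefined when any actual value is zero, so that case is rejected.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical observed and forecast values (illustration only).
observed = [100.0, 200.0, 400.0]
predicted = [90.0, 220.0, 400.0]
error_pct = mape(observed, predicted)
print(f"MAPE: {error_pct:.2f}%")
```

A lower MAPE means the forecasts sit closer to the observed values, which is how the figures for active and inactive users above can be compared.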