Twitter is a social media platform on which users interact through text-based posts (tweets) of up to 140 characters, which may include photos, videos, and hyperlinks. Spam tweets carry harmful messages that are sent repeatedly. Besides being a nuisance, they are dangerous for recipients, a problem exacerbated by bots that spread spam automatically and quickly and can cause data damage. This study aims to detect spam bots using two features: the similarity between an account's tweets, computed with the Smith-Waterman algorithm, and the time interval between its posts. Tweet data (id, text, time, and link) are collected with scraping libraries in Python, based on an available labeled dataset. The text is preprocessed to clean it before the features are calculated. The similarity and posting-interval results are then classified with k-Nearest Neighbor against the previously labeled dataset to predict whether an account is a spam bot or a legitimate one. In classification experiments with several values of k, using the similarity and interval-entropy criteria, the best results were obtained with k = 3 and 10-fold cross-validation: 80% accuracy, 84% precision, and 84% recall.
Copyrights © 2018
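
The two features and the classification step named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation. The scoring parameters (match, mismatch, gap), the normalization by the shorter string, the 60-second interval bins, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    The scoring values are illustrative assumptions, not the study's settings.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b, match=2):
    """Scale the alignment score to [0, 1] by the best score the shorter text could reach."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman(a, b, match=match) / (match * shortest)


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    Regular, bot-like posting yields few distinct gaps and so low entropy.
    The bin width is a hypothetical choice.
    """
    ts = sorted(timestamps)
    gaps = [int((t2 - t1) // bin_seconds) for t1, t2 in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    n = len(gaps)
    return -sum((c / n) * math.log2(c / n) for c in Counter(gaps).values())


def knn_predict(train, query, k=3):
    """Majority vote among the k labeled points nearest to query (Euclidean distance).

    train is a list of ((similarity, entropy), label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, each account would be reduced to a (similarity, entropy) pair, for example the mean pairwise `similarity` of its tweets and the `interval_entropy` of its posting times, and `knn_predict` would then compare that pair against the labeled accounts. In practice the two features would likely need to be scaled to comparable ranges before distances are computed.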