Indonesian Journal of Electrical Engineering and Computer Science
Vol 31, No 2: August 2023

Summarizing Twitter posts regarding COVID-19 based on n-grams

Noralhuda N. Alabid (University of Kufa)
Zahraa Naseer (University of Kufa)



Article Info

Publish Date
01 Aug 2023

Abstract

The COVID-19 pandemic declared by the World Health Organization has disrupted human life on many levels, including the economy, public health, and people's emotions. Social media platforms have accumulated a huge amount of information concerning this pandemic. Twitter is considered one of the most active social media platforms, enabling users to tweet in the conversations they care about. A problem arises when users want to search for a specific topic: they can sort tweets only by recency, not by relevance. As a result, users must read through most of the tweets to understand what was first discussed about the topic. Some strategies have been developed for summarizing tweets, but the summarization of COVID-19 topics is still at an early stage. The current research aims to introduce a technique that presents a short summary of COVID-19 topics so that readers spend little time and effort. The summarization task starts by clustering topics using the latent Dirichlet allocation (LDA) method and K-means clustering, and then selects the important sentences to form the summary. The study also compares bigram-based and unigram-based summarization. Different metrics were used to evaluate the results and experiments at each stage, and the output of the proposed system was evaluated using ROUGE metrics.
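
To illustrate how a unigram-based and a bigram-based comparison can differ under a ROUGE-style metric, the following is a minimal sketch of ROUGE-N recall in plain Python. It is not the authors' implementation: the abstract does not state which ROUGE variants, tokenizer, or preprocessing were used, so the whitespace tokenization, the function names `ngrams` and `rouge_n_recall`, and the two toy sentences are all assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams.

    Assumption: lowercase whitespace tokenization stands in for whatever
    preprocessing a real evaluation would apply.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical toy data, not taken from the paper.
reference = "vaccines reduce severe covid cases"
candidate = "vaccines reduce covid cases"

r1 = rouge_n_recall(candidate, reference, n=1)  # unigram recall
r2 = rouge_n_recall(candidate, reference, n=2)  # bigram recall
print(f"ROUGE-1 recall: {r1:.2f}, ROUGE-2 recall: {r2:.2f}")
```

In this toy case the candidate covers four of the five reference unigrams but only two of the four reference bigrams, which shows why the choice of n-gram size can change how a summary is ranked.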

Copyrights © 2023