Knowledge Engineering and Data Science
Vol 6, No 1 (2023)

Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts

Gunawan Gunawan (Institut Sains dan Teknologi Terpadu Surabaya)
Fitria Fitria (Institut Sains dan Teknologi Terpadu Surabaya)
Esther Irawati Setiawan (Institut Sains dan Teknologi Terpadu Surabaya)
Kimiya Fujisawa (Tokyo University of Technology)



Article Info

Publish Date
01 Aug 2023

Abstract

Automatic summarization uses a computer program to reduce a text document to a summary that retains the essential parts of the original. It is needed to cope with information overload as the amount of data keeps growing. A summary presents the main contents of an article in concise form and tells the reader the essence of its central idea. The basic concept is to take the essential parts of the article's full contents and present them back in summary form. In this research, the user first selects or searches for the text documents to be summarized, with the keywords in the abstract serving as the query. The proposed approach preprocesses each document through sentence breaking, case folding, word tokenizing, filtering, and stemming. The preprocessed text is weighted by term frequency-inverse document frequency (tf-idf); query relevance is then scored using the vector space model, and sentence similarity using cosine similarity. Maximum marginal relevance is then applied for sentence extraction. The proposed approach provides more comprehensive summarization than another approach. When compared with manual summaries, the test results give an average precision of 88%, recall of 61%, and f-measure of 70%.
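
The pipeline described in the abstract (preprocessing, tf-idf weighting, cosine similarity against a query, then maximum marginal relevance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact weighting formulas, so the smoothed idf, the tiny stopword list, the parameter names `k` and `lam`, and the function names are all assumptions made for illustration, and stemming is omitted for brevity. The selection rule is the standard MMR score, lam * sim(sentence, query) - (1 - lam) * max sim(sentence, already selected).

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's filter list).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "in", "for", "with"}


def preprocess(text):
    """Case folding, word tokenizing, and stopword filtering (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_summarize(sentences, query, k=2, lam=0.7):
    """Pick up to k sentences by MMR; k and lam are hypothetical parameters."""
    token_lists = [preprocess(s) for s in sentences]
    n = len(sentences)

    # Document frequency counted over sentences; smoothed idf is an assumption.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    def vectorize(tokens):
        # Raw term count times idf; terms unseen in the document are dropped.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    sent_vecs = [vectorize(tokens) for tokens in token_lists]
    query_vec = vectorize(preprocess(query))
    relevance = [cosine(v, query_vec) for v in sent_vecs]

    selected = []
    candidates = list(range(n))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


example_sentences = [
    "Text summarization extracts important sentences.",
    "Text summarization extracts the important sentences.",
    "Cosine similarity compares sentence vectors.",
    "The weather was pleasant yesterday.",
]
print(mmr_summarize(example_sentences, "text summarization cosine similarity", k=2, lam=0.5))
```

A lower `lam` penalizes redundancy more strongly, so a near-duplicate of an already selected sentence is passed over even when it matches the query well; a `lam` of 1 reduces the procedure to plain relevance ranking.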

Copyright © 2023

Journal Info

Abbrev

keds

Subject

Computer Science & IT Engineering

Description

Knowledge Engineering and Data Science (2597-4637), KEDS, brings together researchers, industry practitioners, and potential users, to promote collaborations, exchange ideas and practices, discuss new opportunities, and investigate analytics frameworks on data-driven and knowledge base systems. ...