Selamat Subagio
Universitas Al Washliyah, Rantauprapat

BERTopic Modeling of Natural Language Processing Abstracts: Thematic Structure and Trajectory
Samsir Samsir; Reagan Surbakti Saragih; Selamat Subagio; Rahmad Aditiya; Ronal Watrianthos
JURNAL MEDIA INFORMATIKA BUDIDARMA Vol 7, No 3 (2023): Juli 2023
Publisher : Universitas Budi Darma

DOI: 10.30865/mib.v7i3.6426

Abstract

The rapid growth of the academic literature makes it increasingly difficult to identify relevant studies. This research applied unsupervised clustering techniques to 13,027 Scopus abstracts to uncover the structure and themes of natural language processing (NLP) publications. The abstracts were pre-processed using tokenization, lemmatization, and vectorization. The BERTopic algorithm was used for clustering, with the MiniLM-L6-v2 embedding model and a minimum topic size of 50. Quantitative analysis revealed eight main topics, ranging in size from 205 to 4,089 abstracts. The largest topic, language models, contained 4,089 abstracts. Coherence scores for the topics ranged from 0.42 to 0.58, indicating meaningful themes. Keywords and sample documents provided interpretable representations of each topic. The results show that the approach can produce coherent topics and capture connections between NLP studies. Clustering supports focused browsing and the identification of relevant literature. Unlike human-curated classifications, this unsupervised, data-driven approach reduces the influence of subjective bias. Given the need to understand research trends, clustering abstracts enables efficient knowledge discovery from scientific corpora. The methodology can be applied to other datasets and fields to uncover overlooked patterns, and its adjustable parameters allow the analysis to be customized. Overall, unsupervised clustering provides a versatile framework for navigating, summarizing, and analyzing the academic literature as its volume continues to grow exponentially.
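BERTopic's keyword representations come from a class-based TF-IDF (c-TF-IDF) step. In its default configuration the library reduces the document embeddings with UMAP, clusters them with HDBSCAN, and then treats all documents in a cluster as a single "class document" whose terms are scored. The pure-Python sketch below illustrates only that final scoring step, using the weighting W(t, c) = tf(t, c) * log(1 + A / f(t)) described in the BERTopic documentation, where tf(t, c) is the L1-normalized frequency of term t in class c, A is the average number of words per class, and f(t) is the frequency of t across all classes. It is not the authors' code: the helper names `tokenize` and `c_tf_idf` are hypothetical, the topic labels are assumed to be given rather than produced by clustering, and lemmatization and stop-word removal are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; a real pipeline would also lemmatize
    return re.findall(r"[a-z]+", text.lower())


def c_tf_idf(docs, labels, top_n=3):
    """Return the top_n c-TF-IDF keywords for each topic label (hypothetical helper)."""
    # Merge every document of a topic into one bag of words (the "class document")
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokenize(doc))

    # A: average number of words per class
    avg_words = sum(sum(c.values()) for c in class_counts.values()) / len(class_counts)

    # f(t): total frequency of each term across all classes
    term_totals = Counter()
    for counts in class_counts.values():
        term_totals.update(counts)

    keywords = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores = {
            # L1-normalized term frequency times the class-based IDF
            term: (tf / n_words) * math.log(1 + avg_words / term_totals[term])
            for term, tf in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[label] = ranked[:top_n]
    return keywords


# Toy corpus with two pre-assigned topics (illustrative data, not from the study)
docs = [
    "language models generate text",
    "large language models",
    "speech models audio",
    "audio speech signals",
]
labels = [0, 0, 1, 1]
print(c_tf_idf(docs, labels))
# → {0: ['language', 'models', 'generate'], 1: ['audio', 'speech', 'signals']}
```

Because `models` also occurs in the speech topic, its larger f(t) lowers its weight there, so the sketch ranks the topic-specific `signals` above it. This is how c-TF-IDF favors terms that distinguish one topic from the others.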