JURNAL MEDIA INFORMATIKA BUDIDARMA
Vol 7, No 3 (2023): Juli 2023

BERTopic Modeling of Natural Language Processing Abstracts: Thematic Structure and Trajectory

Samsir Samsir (Universitas Al Washliyah, Rantauprapat)
Reagan Surbakti Saragih (Universitas HKBP Nommensen, Pematangsiantar)
Selamat Subagio (Universitas Al Washliyah, Rantauprapat)
Rahmad Aditiya (Universitas Al Washliyah, Rantauprapat)
Ronal Watrianthos (Universitas Al Washliyah, Rantauprapat)



Article Info

Publish Date
31 Jul 2023

Abstract

The rapid growth of the academic literature makes it challenging to identify relevant studies. This research applied unsupervised clustering to 13,027 Scopus abstracts to uncover the structure and themes of natural language processing (NLP) publications. Abstracts were pre-processed with tokenization, lemmatization, and vectorization. Clustering was performed with the BERTopic algorithm, using the MiniLM-L6-v2 embedding model and a minimum topic size of 50. Quantitative analysis revealed eight main topics, ranging in size from 205 to 4,089 abstracts. The most prominent topic, language models, contained 4,089 abstracts. Topic coherence scores ranged from 0.42 to 0.58, indicating meaningful themes. Keywords and sample documents provided interpretable topic representations. The results demonstrate that the approach produces coherent topics and captures connections between NLP studies. Clustering supports focused browsing and the identification of relevant literature. Unlike human-curated classifications, the unsupervised, data-driven approach reduces the bias of manual categorization. Given the need to understand research trends, clustering abstracts enables efficient knowledge discovery from scientific corpora. The methodology can be applied to other datasets and fields to uncover overlooked patterns, and its adjustable parameters allow for customized analysis. Overall, unsupervised clustering provides a versatile framework for navigating, summarizing, and analyzing academic literature as its volume grows exponentially.
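The abstract states that keywords provided interpretable topic representations. In BERTopic, topic keywords come from a class-based TF-IDF (c-TF-IDF) that treats all documents in a cluster as a single document. The minimal Python sketch below follows the weighting described in the BERTopic documentation: the term frequency normalized within each class, multiplied by log(1 + A / f(x)), where f(x) is the frequency of term x across all classes and A is the average number of words per class. It is an illustration, not the authors' code; the regex tokenizer, the small stopword list, the lack of lemmatization, the function names, and the toy clusters are all hypothetical assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical toy stopword list (assumption); a real pipeline would use a fuller list.
STOPWORDS = {"the", "a", "of", "for", "and", "to", "in", "with", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic runs, drop stopwords (no lemmatization in this sketch)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def c_tf_idf(clusters: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Class-based TF-IDF: W(x, c) = (tf(x, c) / words in c) * log(1 + A / f(x))."""
    # Treat every cluster as one concatenated document and count its terms.
    class_counts = {
        c: Counter(t for doc in docs for t in tokenize(doc))
        for c, docs in clusters.items()
    }
    # f(x): frequency of term x summed over all classes.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)
    # A: average number of words per class.
    avg_words = sum(total.values()) / len(class_counts)
    weights = {}
    for c, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[c] = {
            x: (tf / n_words) * math.log(1 + avg_words / total[x])
            for x, tf in counts.items()
        }
    return weights


def top_keywords(weights: dict[str, dict[str, float]], k: int = 3) -> dict[str, list[str]]:
    """Return the k highest-weighted terms per class (ties broken alphabetically)."""
    return {
        c: [x for x, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, w in weights.items()
    }


# Hypothetical toy clusters standing in for BERTopic's document clusters.
clusters = {
    "language_models": [
        "Pretrained language models for text generation",
        "Fine-tuning large language models with transformers",
    ],
    "sentiment": [
        "Sentiment analysis of product reviews",
        "Aspect-based sentiment classification of reviews",
    ],
}
print(top_keywords(c_tf_idf(clusters), k=2))
# → {'language_models': ['language', 'models'], 'sentiment': ['reviews', 'sentiment']}
```

Because f(x) grows when a word appears in many clusters, the log(1 + A / f(x)) factor shrinks for shared vocabulary, which is why c-TF-IDF favours terms that distinguish one topic from the others.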
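The abstract reports coherence scores between 0.42 and 0.58 but does not name the coherence measure. As one hedged illustration of how a topic coherence score can be computed, the Python sketch below averages the normalized pointwise mutual information (NPMI) over all pairs of a topic's top words, estimating probabilities from document co-occurrence. This is an assumption, not necessarily the measure the authors used: common measures such as C_V add sliding-window counts and similarity between NPMI context vectors, so values from this sketch are not comparable to the reported scores. The function name and toy corpus are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(top_words: list[str], documents: list[str]) -> float:
    """Average pairwise NPMI of a topic's top words, estimated from document co-occurrence."""
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    # Boolean document model: each document becomes the set of its whitespace tokens.
    docs = [set(doc.lower().split()) for doc in documents]
    n = len(docs)

    def prob(*words: str) -> float:
        # Fraction of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI by convention
        elif p_ij == 1.0:
            scores.append(1.0)   # both words in every document: maximum NPMI by convention
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


# Hypothetical toy corpus.
docs = [
    "language models generate text",
    "large language models",
    "sentiment of reviews",
    "reviews and sentiment scores",
]
print(npmi_coherence(["language", "models"], docs))               # close to 1.0
print(npmi_coherence(["language", "models", "sentiment"], docs))  # about -0.33
```

Top words that appear together in the same documents push the score towards 1, while words that never co-occur pull it towards -1, so a topic that mixes unrelated themes scores lower than a focused one.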

Copyright © 2023






Journal Info

Abbrev

mib

Subject

Computer Science & IT, Control & Systems Engineering, Electrical & Electronics Engineering

Description

Decision Support System, Expert System, Informatics Technique, Information System, Cryptography, Networking, Security, Computer Science, Image Processing, Artificial Intelligence, Steganography, etc. (related to informatics and computer ...