Jurnal Informatika Polinema (JIP)
Vol. 9 No. 2 (2023)

TOPIC GROUPING BASED ON DESCRIPTION TEXT IN MICROSOFT RESEARCH VIDEO DESCRIPTION CORPUS DATA USING FASTTEXT, PCA AND K-MEANS CLUSTERING

Ahmad Hafidh Ayatullah
Nanik Suciati



Article Info

Publish Date
28 Feb 2023

Abstract

Video data can be retrieved using voice, image, or text data that represents the video content. Text-based video search computes the similarity between the text description provided by the user and the text descriptions of all videos in the database; only videos that reach a certain level of similarity are returned to the user as retrieval results. The similarity of description texts can be determined from the clustering results of their feature representations, which are obtained with a word embedding. This research groups the topics of the Microsoft Research Video Description Corpus (MRVDC) based on the Indonesian-language text descriptions of the dataset. MRVDC is a video dataset developed by Microsoft Research that contains paraphrased descriptions of events in English and other languages. The resulting topic groups show the patterns of similarity and the interrelationships between the text descriptions of different videos, which are useful for topic-based video retrieval. The topic grouping process uses fastText as the word embedding, PCA as the feature reduction method, and K-means as the clustering method. Experiments on 1959 videos with 43753 text descriptions, varying the number of clusters k and running with and without PCA, show that the optimal number of clusters is 180, with a silhouette coefficient of 0.123115. The optimal clustering results of this study can be used in video retrieval systems for the Indonesian-language MRVDC dataset.
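The model-selection step described in the abstract (run K-means for several values of k and keep the k with the highest silhouette coefficient) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy 2-D points stand in for fastText description vectors after PCA. The farthest-first initialization, the helper names (`dist`, `kmeans`, `silhouette`, `choose_k`), and the candidate range of k are all assumptions made for this illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain K-means with a deterministic farthest-first start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest chosen center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster in total
            continue
        a = sum(own) / len(own)                                 # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, candidates):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy stand-in for reduced description vectors: three well-separated groups.
POINTS = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
          (10.0, 0.0), (10.1, 0.2), (10.2, 0.1)]

best_k, scores = choose_k(POINTS, range(2, 6))
print(best_k, scores)
```

On real data the vectors would come from a word-embedding model and a dimensionality-reduction step, and a library implementation of K-means would normally replace the hand-written loop. The sketch only shows how the silhouette coefficient ranks the candidate values of k.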

Copyrights © 2023






Journal Info

Abbrev

jip

Subject

Computer Science & IT; Control & Systems Engineering; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Library & Information Science

Description

The focus and scope of articles published in JIP (Journal of Informatics Polinema) encompass game technology, information systems, computer networks, and computing, covering the following topics: Game Technology, Artificial Intelligence, Intelligent System, Machine Learning, Image Processing, Computer ...