Garuda - Garba Rujukan Digital

p-Index (2019 - 2024): 0.408

This author has published in these journals:

International Journal of Electrical and Computer Engineering
Journal of Data Science and Its Applications

Yohei Murakami

Ritsumeikan University

Author-ID : 2384229

Computer Science & IT; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering

Published : 2 Documents

Articles

Generating similarity cluster of Indonesian languages with semi-supervised clustering
Arbi Haza Nasution; Yohei Murakami; Toru Ishida
International Journal of Electrical and Computer Engineering (IJECE) Vol 9, No 1: February 2019
Publisher : Institute of Advanced Engineering and Science

Lexicostatistic and language similarity clusters are useful for computational linguistics research that depends on language similarity or cognate recognition. Nevertheless, no published lexicostatistic or language similarity clusters of Indonesian ethnic languages are available. We formulate an approach to creating language similarity clusters: we use the ASJP database to generate the language similarity matrix, generate hierarchical clusters with complete linkage and mean linkage clustering, and then extract two stable clusters with high language similarities. We introduce an extended k-means clustering with semi-supervised learning to evaluate the stability level of the hierarchical stable clusters, that is, whether they remain grouped together as the number of clusters changes. The higher the number of trials, the more likely we are to distinctly find the two hierarchical stable clusters in the generated k clusters. However, for all five experiments, the stability level of the two hierarchical stable clusters is highest with 5 clusters. Therefore, we take the 5 clusters as the best clusters of Indonesian ethnic languages. Finally, we plot the generated 5 clusters on a geographical map.
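The hierarchical step described in this abstract (complete linkage and mean linkage over a similarity matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the four language labels and their similarity percentages are made up rather than taken from ASJP, the conversion `distance = 100 - similarity` is an assumption, and the function names are hypothetical. The extended k-means stability evaluation is not covered here.

```python
from itertools import combinations

# Hypothetical similarity percentages between four made-up languages (not ASJP data).
SIMILARITY = {
    ("A", "B"): 80.0,
    ("A", "C"): 30.0,
    ("A", "D"): 25.0,
    ("B", "C"): 35.0,
    ("B", "D"): 20.0,
    ("C", "D"): 70.0,
}


def distance(x, y):
    """Turn a similarity percentage into a distance (assumed: 100 - similarity)."""
    if x == y:
        return 0.0
    return 100.0 - SIMILARITY.get((x, y), SIMILARITY.get((y, x)))


def linkage_distance(c1, c2, method):
    """Distance between two clusters under complete or mean linkage."""
    pair_distances = [distance(x, y) for x in c1 for y in c2]
    if method == "complete":
        return max(pair_distances)  # farthest pair decides
    if method == "mean":
        return sum(pair_distances) / len(pair_distances)  # average over all pairs
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(languages, n_clusters, method="complete"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([lang]) for lang in languages]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], method),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)


complete_clusters = agglomerate(["A", "B", "C", "D"], 2, method="complete")
mean_clusters = agglomerate(["A", "B", "C", "D"], 2, method="mean")
print(complete_clusters)
print(mean_clusters)
```

On this toy data both linkage methods group A with B and C with D. On real data the two methods can disagree, which is presumably why the abstract looks for clusters that stay stable across both.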

Visualizing Language Lexical Similarity Clusters: A Case Study of Indonesian Ethnic Languages
Arbi Haza Nasution; Yohei Murakami
Journal of Data Science and Its Applications Vol 2 No 2 (2019): Journal of Data Science and Its Applications
Publisher : Telkom University

DOI: 10.34818/jdsa.2019.2.23

Language similarity clusters are useful for computational linguistics research that relies on language similarity or cognate recognition. The existing language similarity clustering approach, which uses hierarchical clustering and k-means clustering, has difficulty creating clusters with a middle range of language similarity. Moreover, it lacks an interactive visualization that users can explore. To address these issues, we formalize a graph-based approach to creating and visualizing language lexical similarity clusters: we use the ASJP database to generate the language similarity matrix, then formalize the data as an undirected graph. To create the clusters, we apply a connected components algorithm with a threshold on the language similarity range. Our interactive online tool allows a user to dynamically create new clusters by changing the threshold of the language similarity range and to explore the data based on language similarity range and number of speakers. We provide an example implementation of our approach for 119 Indonesian ethnic languages. The experimental results show that under low system execution burden, the system performance was quite stable. Under high system execution burden, despite fluctuating performance, the response times were still below 25 seconds, which is considered acceptable.
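The graph-based clustering described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' tool: the language labels and similarity values are invented, the function name `clusters_in_range` is hypothetical, and it is assumed that an edge is kept when its similarity falls inside an inclusive `[low, high]` range.

```python
from collections import deque

# Hypothetical pairwise similarity percentages (not ASJP data).
SIMILARITY = {
    ("A", "B"): 80.0,
    ("B", "C"): 55.0,
    ("C", "D"): 75.0,
    ("A", "D"): 10.0,
    ("D", "E"): 15.0,
}


def clusters_in_range(languages, similarity, low, high):
    """Connected components of the undirected graph whose edges satisfy
    low <= similarity <= high (the inclusive range is an assumption)."""
    neighbours = {lang: set() for lang in languages}
    for (x, y), score in similarity.items():
        if low <= score <= high:
            neighbours[x].add(y)
            neighbours[y].add(x)

    seen, components = set(), []
    for start in languages:
        if start in seen:
            continue
        # Breadth-first search collects every language reachable from start.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for other in neighbours[node] - seen:
                seen.add(other)
                queue.append(other)
        components.append(sorted(component))
    return components


LANGUAGES = ["A", "B", "C", "D", "E"]
high_similarity = clusters_in_range(LANGUAGES, SIMILARITY, 70, 100)
mid_similarity = clusters_in_range(LANGUAGES, SIMILARITY, 50, 100)
print(high_similarity)
print(mid_similarity)
```

Lowering the lower bound from 70 to 50 admits the B-C edge and merges two clusters into one, which mirrors how changing the threshold in the interactive tool would regroup languages.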

Co-Authors: Arbi Haza Nasution; Toru Ishida
