INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi
Vol 6 No 2 (2022): August 2022

Similarity Identification Based on Word Trigrams Using Exact String Matching Algorithms

Abdul Fadlil (Universitas Ahmad Dahlan)
Sunardi Sunardi (Universitas Ahmad Dahlan)
Rezki Ramdhani (Universitas Ahmad Dahlan)



Article Info

Publish Date
13 Aug 2022

Abstract

Several well-performing exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. To determine similarity, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research identified similarities with these algorithms based on character n-grams. Identification based on word n-grams, which determines similarity according to linguistic meaning, especially for longer strings, has not yet been covered. Therefore, this study proposes word-level trigrams for identifying similarity with the three algorithms and compares their performance. In terms of precision, recall, and running time, the Rabin-Karp algorithm achieved 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm with the smallest window achieved 100%, 56%, and 0.18 ms; and the Horspool algorithm achieved 100%, 100%, and 0.06 ms. These results indicate that the Horspool Boyer-Moore algorithm performs best overall: it ties for the highest precision and recall and has the shortest running time.
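The ideas named in the abstract (word trigrams, fingerprints, window-based selection, and a bad-character table) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the hash parameters `base` and `mod`, the default `window` size, the tie-breaking rule in `winnow`, and the similarity measure (the fraction of query trigrams found in the document) are all assumptions chosen for illustration.

```python
def word_trigrams(text):
    """Split text on whitespace and return overlapping word-level trigrams."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def fingerprints(trigrams, base=257, mod=1_000_003):
    """Rabin-Karp-style polynomial hash of each trigram (base/mod are assumed)."""
    result = []
    for gram in trigrams:
        h = 0
        for ch in " ".join(gram):
            h = (h * base + ord(ch)) % mod
        result.append(h)
    return result


def winnow(hashes, window=4):
    """Winnowing-style selection: keep the minimum hash of every sliding
    window, taking the rightmost occurrence on ties (an assumed rule)."""
    selected = set()
    for i in range(len(hashes) - window + 1):
        win = hashes[i:i + window]
        smallest = min(win)
        j = window - 1 - win[::-1].index(smallest)  # rightmost minimum
        selected.add((i + j, smallest))
    return selected


def horspool_find(tokens, pattern):
    """Horspool search over word tokens; returns the first match index or -1."""
    pattern = list(pattern)
    m, n = len(pattern), len(tokens)
    if m == 0 or m > n:
        return -1
    # Bad-character table: distance from the last occurrence of each token
    # (excluding the final pattern position) to the end of the pattern.
    shift = {tok: m - 1 - i for i, tok in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if tokens[pos:pos + m] == pattern:
            return pos
        # Shift by the table entry of the token aligned with the pattern's end.
        pos += shift.get(tokens[pos + m - 1], m)
    return -1


def trigram_similarity(query, document):
    """Fraction of the query's word trigrams that occur in the document."""
    doc_tokens = document.lower().split()
    grams = word_trigrams(query)
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if horspool_find(doc_tokens, g) != -1)
    return hits / len(grams)
```

For example, under these assumptions `trigram_similarity("the quick brown fox jumps", "a quick brown fox jumps over the lazy dog")` finds two of the three query trigrams in the document. The fingerprint and winnowing helpers show how hashed trigrams could be compared instead of raw tokens; keeping only one hash per window means some shared trigrams may go unselected.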

Copyrights © 2022






Journal Info

Abbrev

intensif

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

INTENSIF Journal is a publication venue for research in various fields related to information systems. These fields include Information Systems, Software Engineering, Data Mining, Data Warehouse, Computer Networking, Artificial Intelligence, e-Business, e-Government, Big Data, Application ...