Indonesian Journal of Electrical Engineering and Computer Science
Vol 26, No 3: June 2022

An arrangement of the number of K-grams in the performance of Rabin Karp algorithm in text adjustment

Yuli Astuti (Universitas Amikom Yogyakarta)
Irma Rofni Wulandari (Universitas Amikom Yogyakarta)



Article Info

Publish Date
01 Jun 2022

Abstract

The Rabin-Karp algorithm is frequently used to measure the similarity between texts; it uses a hash function to compare the pattern string with substrings of the text. The choice of the k value for the K-gram is often left unrestricted, and trying candidate k values one by one when cutting terms is time-consuming. This research performs a word-cutting test on a script using K-grams 0 to 8 and reports the effect of each k value on the resulting similarity percentage. The aim is to determine the effect of the number of K-grams on the performance of Rabin-Karp in text matching. The test was run on 20 sentences, 10 times, using the Dice coefficient to measure text similarity. The research concludes that K-grams 0 to 2 should not be used, because of the basic principle of the K-gram, which is character cutting: a cut of 0, 1, or 2 characters does not yet carry meaning, and so it yields a high similarity percentage. Based on trials that sampled K-grams 0 to 8 on 10 test data sets, K-gram 3 is the best among K-grams 0 to 8.
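
To make the role of k concrete, the following is a minimal Python sketch of K-gram hashing in the Rabin-Karp style combined with the Dice coefficient. It is not the authors' implementation: the function names `rolling_hashes` and `dice_similarity`, the hash parameters `base` and `mod`, the use of sets of unique hashes, and the handling of k < 1 (no K-grams, similarity 0.0) are all assumptions made for illustration.

```python
def rolling_hashes(text, k, base=256, mod=1_000_003):
    """Return the rolling hash of every overlapping k-gram in text.

    base and mod are illustrative choices, not values from the paper.
    """
    if k < 1 or len(text) < k:
        return []  # assumption: no k-grams exist for k < 1 or short text
    high = pow(base, k - 1, mod)  # weight of the leading character
    h = 0
    for ch in text[:k]:  # hash of the first window
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Slide the window: drop the leading character, append the next one.
        h = (h - ord(text[i - k]) * high) % mod
        h = (h * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def dice_similarity(a, b, k):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over unique k-gram hashes."""
    ha = set(rolling_hashes(a, k))
    hb = set(rolling_hashes(b, k))
    if not ha and not hb:
        return 0.0  # assumption: nothing to compare counts as no similarity
    return 2 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    for k in (1, 2):
        print(k, dice_similarity("night", "nacht", k))
```

With these assumptions, the pair "night" and "nacht" scores 0.6 at k = 1 (shared characters n, h, t) but only 0.25 at k = 2 (shared bigram "ht"), which is consistent with the abstract's observation that very small k values produce high similarity percentages. Because hashes are compared rather than the K-grams themselves, a hash collision could in principle raise the score; a fuller implementation would verify matching K-grams character by character.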

Copyrights © 2022