JAIS (Journal of Applied Intelligent System)
Vol 8, No 1 (2023): Journal of Applied Intelligent System

Comparison of String Similarity Algorithm in post-processing OCR

Al Birr Karim Susanto (Teknik Informatika, Universitas Dian Nuswantoro)
Nuraziz Muliadi (Teknik Informatika, Universitas Dian Nuswantoro)
Bagus Nugroho (Teknik Informatika, Universitas Dian Nuswantoro)
Muljono Muljono (Teknik Informatika, Universitas Dian Nuswantoro, Semarang)



Article Info

Publish Date
17 Feb 2023

Abstract

A common problem in Optical Character Recognition (OCR) is that the input image contains noise that partially covers letters in a word. This can cause misspellings during word recognition or detection in the image. After the OCR process, post-processing is therefore needed to correct the recognized words, and here the words are corrected using a string similarity algorithm. Which algorithm works best? We compared the Levenshtein distance, Hamming distance, Jaro-Winkler, and the Sørensen-Dice coefficient. In our tests, the most effective algorithm was the Sørensen-Dice coefficient, which achieved a value of 0.88 for precision, recall, and F1 score.
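To illustrate the kind of post-processing the abstract describes, the following is a minimal Python sketch of three of the compared measures (Levenshtein distance, Hamming distance, and a bigram-based Sørensen-Dice coefficient), together with a simple dictionary lookup that picks the closest word. It is not the authors' implementation: the function names, the choice of character bigrams, the lowercasing step, and the sample lexicon are all assumptions made for illustration. Jaro-Winkler is omitted to keep the sketch short.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs (an assumed tokenization)."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character bigrams, in [0, 1]."""
    if a == b:
        return 1.0
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2.0 * overlap / total


def correct(word: str, lexicon: list[str]) -> str:
    """Hypothetical helper: return the lexicon entry most similar to the OCR output."""
    return max(lexicon, key=lambda cand: dice(word.lower(), cand.lower()))


if __name__ == "__main__":
    # Illustrative lexicon and OCR error, not data from the paper.
    words = ["recognition", "reception", "cognition"]
    print(correct("rec0gnition", words))
```

Note that the Hamming distance only applies when the OCR output and the candidate have the same length, whereas the Levenshtein distance and the Sørensen-Dice coefficient also handle inserted or dropped characters.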

Copyrights © 2023






Journal Info

Abbrev

JAIS

Description

Journal of Applied Intelligent System (JAIS) is published by LPPM Universitas Dian Nuswantoro Semarang in collaboration with CORIS and IndoCEISS, and focuses on research in intelligent systems. Topics of interest include, but are not limited to: biometrics, image processing, computer vision, ...