
Indonesian Journal of Electrical Engineering and Computer Science

Vol 21, No 1: January 2021

Ahmed Hussain Aliwy (University of Kufa)
Basheer Al-Sadawi (University of Kufa)

Publish Date
01 Jan 2021

Optical character recognition (OCR) is the process of converting images of text documents into editable and searchable text. OCR poses several challenges, particularly for Arabic, where it produces a high percentage of errors. This paper proposes a method for improving the output of Arabic optical character recognition (AOCR) systems, based on a statistical language model built from large available corpora. The method detects and corrects both non-word and real-word errors according to the context of each word in the sentence. The results show that the corrected AOCR output reaches an accuracy of up to 98%.
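The abstract gives no implementation details. The Python sketch below illustrates the general idea of corpus-based, context-sensitive correction under several assumptions that do not come from the paper: whitespace tokenization, a bigram language model with add-one smoothing, candidate corrections limited to corpus words one edit away, and a hypothetical `keep_prob` prior that favours leaving a recognised word unchanged. A word missing from the corpus is treated as a non-word error. A word found in the corpus is still compared against its neighbours, which is how a real-word error can be caught. The class `BigramCorrector` and its parameters are illustrative names, not the authors' code. Because the alphabet is taken from the corpus itself, the same code applies to Arabic text.

```python
import math
from collections import Counter


class BigramCorrector:
    """Illustrative context-sensitive corrector (assumed design, not the paper's).

    - Language model: bigrams with add-one smoothing over <s> ... </s> padding.
    - Candidates: corpus words within one edit (delete, transpose, replace, insert).
    - keep_prob: hypothetical prior weight for keeping an already-valid word.
    """

    def __init__(self, corpus_sentences, keep_prob=0.9):
        if not 0.0 < keep_prob < 1.0:
            raise ValueError("keep_prob must be in (0, 1)")
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus_sentences:
            padded = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        # Real words only; sentence markers are never proposed as corrections.
        self.vocab = {w for w in self.unigrams if w not in ("<s>", "</s>")}
        # Alphabet is derived from the corpus, so it works for any script.
        self.alphabet = sorted({ch for w in self.vocab for ch in w})
        self.v = len(self.unigrams)  # vocabulary size used in smoothing
        self.log_keep = math.log(keep_prob)
        self.log_change = math.log(1.0 - keep_prob)

    def _logp(self, prev, word):
        # Add-one smoothed log P(word | prev).
        return math.log((self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.v))

    def _edits1(self, word):
        # All strings one edit away from `word` (Norvig-style generation).
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
        replaces = [l + c + r[1:] for l, r in splits if r for c in self.alphabet]
        inserts = [l + c + r for l, r in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def correct(self, sentence):
        words = sentence.split()
        out = []
        for i, w in enumerate(words):
            prev = out[-1] if out else "<s>"  # left context: already corrected
            nxt = words[i + 1] if i + 1 < len(words) else "</s>"  # right: as observed
            cands = {c for c in self._edits1(w) if c in self.vocab}
            if w in self.vocab:
                cands.add(w)  # valid word: may still be a real-word error
            if not cands:
                out.append(w)  # non-word with no close corpus match: keep it
                continue

            def score(c):
                prior = self.log_keep if c == w else self.log_change
                return prior + self._logp(prev, c) + self._logp(c, nxt)

            # Sorting makes tie-breaking deterministic.
            out.append(max(sorted(cands), key=score))
        return " ".join(out)
```

With a toy corpus, a non-word ("cst") and a real-word error ("mat" in a position where the context strongly prefers "cat") are both corrected:

```python
corrector = BigramCorrector(["the cat sat on the mat"] * 10)
print(corrector.correct("the cst sat on the mat"))  # -> the cat sat on the mat
print(corrector.correct("the mat sat on the mat"))  # -> the cat sat on the mat
```

The `keep_prob` prior is the main design trade-off here. Set it too high and real-word errors are never corrected; set it too low and valid words get over-corrected. A real system would estimate this weight, and the smoothing method, from held-out AOCR output.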

Journal Info

Abbrev

IJEECS

Publisher

Institute of Advanced Engineering and Science

Article Info

Title

Corpus-based technique for improving Arabic OCR system