Survey of Urdu OCR: An Offline Approach

Abstract

Optical Character Recognition (OCR) is the process of converting printed, typed, and handwritten text into its equivalent machine-readable form. Scanning and comparison techniques are used to recognize printed text or numerical data. Once a scanned document has been converted into machine-readable form, the text can be used in other applications like any other machine-readable text, which saves the time otherwise spent retyping already printed material for data entry. OCR software attempts to identify characters by comparing their shapes with those stored in the software's library. The discipline of OCR is an offspring of Pattern Recognition, Artificial Intelligence, and Computer Vision. The Arabic script, in which characters are connected cursively, makes the recognition of Urdu text more difficult than that of a language such as English, whose characters remain isolated when forming a word. This paper analyzes eight years of research papers on Urdu OCR (2002 to 2009) to show the efforts made toward the development of offline Urdu OCR, covering both its history and future work.

Cite this article

Naila Fareen, Attash Durrani, Mohammad Abid Khan. (2012) Survey of Urdu OCR: An Offline Approach, Conference on Language and Technology 2012.

Publisher

Center for Language Engineering

Country

Pakistan

City

Lahore

From

09-11-2012

To

10-11-2012