Abstract
Optical Character Recognition (OCR) is the process of converting printed, handwritten, or typed text into an equivalent machine-readable form. Scanning and comparison techniques are used to recognize printed text or numerical data. Once a scanned document has been converted into machine-readable form, the text can be used in other applications like any other machine-readable text. This saves time by removing the need to retype already printed material for data entry. OCR software attempts to identify characters by comparing their shapes with those stored in the software's library. The discipline of OCR is an offspring of Pattern Recognition, Artificial Intelligence, and Computer Vision. Urdu is written in Arabic script, whose characters are joined cursively, which makes Urdu text harder to recognize than text in a language such as English, where the characters within a word remain isolated. This paper analyzes eight years of research papers on Urdu OCR (2002 to 2009) to show the efforts made toward developing offline Urdu OCR, covering both its history and future work.

Naila Fareen, Attash Durrani, Mohammad Abid Khan. (2012) Survey of Urdu OCR: An Offline Approach, Conference on Language and Technology 2012.
Publisher: Center for Language Engineering
Country: Pakistan
City: Lahore
From: 09-11-2012
To: 10-11-2012