Abstract
This work presents a language-independent, keyword-based document indexing and retrieval
system that uses a support vector machine (SVM) as its classifier. Word spotting offers an attractive alternative to traditional
Optical Character Recognition (OCR) systems: instead of converting the image into text,
retrieval is based on matching the images of words using pattern classification techniques. The
proposed technique relies on extracting words from images of handwritten documents and converting
each word image into a shape represented by its contour. A set of features is then extracted
from each word image, and instances of the same word are grouped into clusters. These clusters are used
to train a multi-class SVM, which learns the different word classes. The documents to be indexed are
segmented into words and the closest cluster for each word is determined using the SVM. An index file
is maintained for each word containing the word locations within each document. A query word
presented to the system is matched with the clusters in the database and the documents containing
occurrences of the query word are retrieved. The system achieved promising precision and recall rates
on the IAM database of handwritten documents.
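The indexing and retrieval steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid rule stands in for the trained multi-class SVM, and the centroids, two-dimensional feature vectors, document ids, and function names (`classify`, `build_index`, `retrieve`) are all hypothetical values chosen for illustration.

```python
from collections import defaultdict
from math import dist

# Hypothetical cluster centroids: word class -> feature vector.
# In the described system these classes would be learned by the SVM.
CENTROIDS = {
    "the": (0.0, 0.0),
    "letter": (5.0, 5.0),
    "house": (10.0, 0.0),
}

def classify(features):
    """Stand-in for the trained SVM: return the nearest word class."""
    return min(CENTROIDS, key=lambda label: dist(CENTROIDS[label], features))

def build_index(documents):
    """Map each word class to the (document id, word position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, word_features in documents.items():
        for position, features in enumerate(word_features):
            index[classify(features)].append((doc_id, position))
    return index

def retrieve(index, query_features):
    """Return the sorted ids of documents containing the query's word class."""
    label = classify(query_features)
    return sorted({doc_id for doc_id, _ in index.get(label, [])})

# Hypothetical documents, each a list of per-word feature vectors.
documents = {
    "doc1": [(0.2, -0.1), (4.8, 5.3)],
    "doc2": [(9.7, 0.4), (0.1, 0.3)],
    "doc3": [(5.1, 4.9)],
}
index = build_index(documents)
hits = retrieve(index, (5.2, 5.1))  # query features closest to the "letter" class
```

Because the query is classified the same way as the indexed words, retrieval reduces to a dictionary lookup on the predicted word class, followed by collecting the distinct document ids.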
Muhammad Rashid Hussain, Asif Masood, Haris Ahmad Khan, Imran Siddiqi, Khurram Khurshid (2016). Language Independent Keyword Based Information Retrieval System of Handwritten Documents using SVM Classifier and Converting Words into Shapes. Pakistan Journal of Engineering and Applied Sciences, Volume 19, Issue 1.