Abstract
Named Entity Recognition (NER) is the process of identifying the names of persons, organizations, and locations,
along with other miscellaneous information such as numbers, dates, and measures, in a given text. In this paper,
we describe the development of an NER system for the Urdu language using a Hidden Markov Model (HMM). First, we
compare the IOB2 and IOE2 tagging schemes. Second, we describe the preprocessing applied to Urdu text before the
data is fed to the HMM for training under the IOE2 tagging scheme. Finally, we use Part of Speech (POS)
information, gazetteers, and rules to improve the accuracy of the system. Our system yields 66.71% precision,
71.70% recall, and an F-measure of 69.12%. This system will help improve the results of Urdu Information
Retrieval, Machine Translation, and Question Answering systems.
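
To make the difference between the two tagging schemes concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard definitions: IOB2 marks the first token of every entity with `B-`, while IOE2 marks the last token of every entity with `E-`. The function name `tag_spans`, the English example sentence, and the `PER`/`LOC` labels are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (start index, end index exclusive, entity type); hypothetical span format
Span = Tuple[int, int, str]


def tag_spans(n_tokens: int, spans: List[Span], scheme: str = "IOB2") -> List[str]:
    """Assign one tag per token under the IOB2 or IOE2 scheme."""
    if scheme not in ("IOB2", "IOE2"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens  # tokens outside any entity
    for start, end, label in spans:
        # Every token inside the entity starts as "inside" ...
        for i in range(start, end):
            tags[i] = f"I-{label}"
        # ... then the boundary token is overwritten.
        if scheme == "IOB2":
            tags[start] = f"B-{label}"    # first token of the entity
        else:
            tags[end - 1] = f"E-{label}"  # last token of the entity
    return tags


tokens = ["Syed", "Mansoor", "Sarwar", "visited", "Lahore"]
spans = [(0, 3, "PER"), (4, 5, "LOC")]

iob2 = tag_spans(len(tokens), spans, "IOB2")
ioe2 = tag_spans(len(tokens), spans, "IOE2")
for token, b, e in zip(tokens, iob2, ioe2):
    print(f"{token:10s} {b:8s} {e:8s}")
```

Under these definitions a single-token entity such as "Lahore" is tagged `B-LOC` in IOB2 and `E-LOC` in IOE2, so the schemes differ only in which entity boundary they mark explicitly.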
Muhammad Kamran Malik, Syed Mansoor Sarwar. (2017) Urdu Named Entity Recognition System using Hidden Markov Model, Pakistan Journal of Engineering and Applied Sciences, Volume 21, Issue 1.