Center for Language Engineering
Publisher
Pakistan
Country
Karachi
City
13-11-2014
Start date
15-11-2014
End date
Abstract
The paper describes a two-pass POS-tagging system for extracting the first name and surname from a Pakistani full-name string. Full names in Pakistan do not follow a single fixed pattern: the order of the components is flexible, and the simple pattern of first name, middle name, last name does not apply. There are many peculiarities; for example, in the absence of a family name, the middle name serves as the surname. To extract the first name and surname, two sets of POS tags are designed. The first tagset consists of personal-name, family-name, religious-middle-name, particle and title. The second tagset consists of first-name, surname, title and middle-name. The output of the first POS-tagging subsystem is fed to the second subsystem. In the evaluation, the POS-tagging approach achieves over 90% accuracy.
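The two-pass flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny lexicon, the function names, and the hand-written rules are hypothetical, and a dictionary lookup plus rules stands in for the POS taggers the paper actually uses. Only the two tagsets and the "middle name serves as surname" case come from the abstract.

```python
# First tagset (lexical): personal-name, family-name, religious-middle-name,
# particle, title. The lexicon entries below are illustrative assumptions,
# not the paper's resources.
FIRST_PASS_LEXICON = {
    "dr": "title",
    "muhammad": "religious-middle-name",
    "ul": "particle",
    "khan": "family-name",
}


def first_pass(tokens):
    """Assign a lexical tag to each token; unknown tokens default to personal-name."""
    return [
        (tok, FIRST_PASS_LEXICON.get(tok.lower().rstrip("."), "personal-name"))
        for tok in tokens
    ]


def second_pass(tagged):
    """Map lexical tags to the second tagset: first-name, surname, title, middle-name.

    The rules here are a hypothetical stand-in for the second tagging subsystem.
    """
    functional = []
    first_name_seen = False
    for tok, tag in tagged:
        if tag == "title":
            role = "title"
        elif tag == "family-name":
            role = "surname"
        elif tag == "personal-name" and not first_name_seen:
            role = "first-name"
            first_name_seen = True
        else:
            role = "middle-name"
        functional.append([tok, role])

    # Peculiarity from the abstract: with no family name present,
    # a middle name serves as the surname (here: the last one).
    if not any(role == "surname" for _, role in functional):
        for item in reversed(functional):
            if item[1] == "middle-name":
                item[1] = "surname"
                break
    return [tuple(item) for item in functional]


def tag_name(full_name):
    """Feed the output of the first pass into the second pass."""
    return second_pass(first_pass(full_name.split()))
```

For instance, `tag_name("Ali Ahmed")` has no family-name token after the first pass, so the second pass promotes the remaining middle name to surname.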
Tafseer Ahmed, Naila Ata. (2014) What's in a name? Automatic extraction of lexical and functional units of Pakistani names, Conference on Language and Technology 2014.