Abstract
Annotated text corpora are an important resource for Natural Language Processing (NLP) applications such as machine translation, information retrieval, and text mining. Corpus annotation relies on part-of-speech (POS) tags, which mark parts of a text with grammatical annotations in order to identify the linguistic properties of a word, sentence, or discourse. Text items are marked on the basis of two main features: 1) grammatical category and 2) the context of the text (word, sentence, or discourse), i.e. its relationship with adjacent and related text. Saraiki, despite being one of the oldest languages, remains resource-scarce both in recorded literature and in computational contexts. According to our study, no tagset has yet been defined for the Saraiki language. This work presents the first hierarchical POS (MPOST) tagset for the Saraiki language, designed for use in morphological, syntactic, and lexical annotation of Saraiki language corpora.

Farrukh Javed Saleemi, Muhammad Nabeel Asghar, Sajid Iqbal, Muhammad Umar Chaudhry, Muhammad Yasir, Sibghat Ullah Bazai, Muhammad Qasim Khan. (2021) A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language, Journal of Applied and Emerging Sciences, Volume-11, Issue-1.
