Center for Language Engineering
Publisher
Pakistan
Country
Karachi
City
13-11-2014
Start date
15-11-2014
End date


Abstract
The paper presents the development of the first publicly available Urdu N-grams, extracted from books. To represent N-grams better, a large Urdu corpus is collected from books covering different domains. The automatic cleaning of the 37 million Urdu books corpus is discussed. Domain-wise N-grams are extracted, which can be used in different Natural Language Processing and Information Retrieval applications.

Farah Adeeba, Qurat-Ul-Ain Akram, Hina Khalid, Sarmad Hussain. (2014) Urdu Books N-grams, Conference on Language and Technology 2014.