Abstract
In this research, we present the results of a study on the applicability of document clustering
techniques to an Urdu-language corpus. This study, the first of its kind, employs a fully
probabilistic Bayesian method, Latent Dirichlet Allocation, to cluster the corpus using features
collected from the documents. The results are compared with those of a simple classification
technique. Analysis shows that both supervised and unsupervised techniques for grouping documents
perform reasonably well on this corpus. The results further indicate that the clustering technique
outperforms the classification technique in some cases, with an accuracy above 90%.
Toqeer Ehsan, H. M. Shahzad Asif (2018). Finding Topics in Urdu: A Study of Applicability of Document Clustering on Urdu Language. Pakistan Journal of Engineering and Applied Sciences, Volume 23, Issue 1.