Abstract
Uyghur is an agglutinative language with complex morphology, and word stemming is an essential step in Uyghur information processing. However, the accuracy of Uyghur stem segmentation still leaves considerable room for improvement. In this study, Uyghur words were stemmed using an affix-occurrence probability feature, which yielded a stemming accuracy of 88.59% for the baseline system. The stemmer's performance was further improved by combining the proposed method with a parameter ‘α’.
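The abstract does not say how the affix-occurrence probability is computed or what role ‘α’ plays, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that each candidate split of a word into stem and affix is scored by a weighted sum of a smoothed stem probability and a smoothed affix probability, with a hypothetical weight `alpha` standing in for ‘α’. The toy training pairs, the function names, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter

# Toy (stem, affix) pairs in Latin transliteration; illustrative data only,
# not taken from the paper's corpus.
TOY_SEGMENTED = [
    ("kitab", "lar"),
    ("kitab", "im"),
    ("kitab", ""),
    ("mektep", "ler"),
    ("mektep", ""),
]


def train(segmented):
    """Count how often each stem and each affix occurs in segmented data."""
    pairs = list(segmented)  # allow generators to be passed in
    stem_counts = Counter(stem for stem, _ in pairs)
    affix_counts = Counter(affix for _, affix in pairs)
    return stem_counts, affix_counts


def prob(counts, key):
    """Add-one smoothed relative frequency (an assumed smoothing scheme)."""
    total = sum(counts.values())
    return (counts[key] + 1) / (total + len(counts) + 1)


def stem(word, stem_counts, affix_counts, alpha=0.5):
    """Return the prefix of `word` whose split scores highest.

    score = alpha * P(stem) + (1 - alpha) * P(affix)
    `alpha` is a hypothetical weighting; the paper's use of its parameter
    may differ.
    """
    best_stem, best_score = word, float("-inf")
    for i in range(1, len(word) + 1):
        candidate_stem, candidate_affix = word[:i], word[i:]
        score = (alpha * prob(stem_counts, candidate_stem)
                 + (1 - alpha) * prob(affix_counts, candidate_affix))
        if score > best_score:
            best_stem, best_score = candidate_stem, score
    return best_stem


stem_counts, affix_counts = train(TOY_SEGMENTED)
print(stem("kitablar", stem_counts, affix_counts))   # kitab
print(stem("mektepler", stem_counts, affix_counts))  # mektep
```

In this toy setup, setting `alpha=0.0` (affix evidence only) leaves "kitablar" unsegmented, because the empty affix is the most frequent one in the toy data. That illustrates why a weighting between stem and affix evidence could affect accuracy.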
Acknowledgement
This study was supported by the Doctoral Graduate Students’ Innovative Projects of Xinjiang University (Grant No. XJUBSCX-2013011); the National Natural Science Foundation of Xinjiang University (Grant No. XY110103); the Social Science Foundation of the Ministry of Education (Grant No. 10YJA740027); the National Natural Science Foundation of China (Grant No. 61462087); and the New Century Excellent Talent Support Plan of the Ministry of Education (Grant No. NCET-10-0969).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yilahun, H., Enwer, S., Hamdulla, A. (2018). Uyghur Word Stemming Based on Stem and Affix Features. In: Tao, J., Zheng, T., Bao, C., Wang, D., Li, Y. (eds) Man-Machine Speech Communication. NCMMSC 2017. Communications in Computer and Information Science, vol 807. Springer, Singapore. https://doi.org/10.1007/978-981-10-8111-8_1
DOI: https://doi.org/10.1007/978-981-10-8111-8_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8110-1
Online ISBN: 978-981-10-8111-8
eBook Packages: Computer Science, Computer Science (R0)