Abstract
To obtain information from Uyghur words effectively, we present a novel method for Uyghur word segmentation based on character tagging. We propose five labels for the characters in a Uyghur word: Su, Bu, Iu, Eu, and Au. With these labels, we treat the segmentation of Uyghur words as a sequence labeling task, using Conditional Random Fields (CRFs) as the underlying labeling model. Experiments show that our method captures more features of Uyghur words and therefore significantly outperforms several traditionally used word segmentation models.
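As a rough illustration of segmentation cast as character tagging, the sketch below converts a segmented word into per-character labels and back. The abstract does not define the five labels Su, Bu, Iu, Eu, and Au, so the sketch uses a generic four-tag B/I/E/S scheme (begin, inside, end, single) as a stand-in; this scheme, the function names, and the transliterated example word with its split are assumptions for illustration, not the paper's definitions. Training a CRF on such label sequences is outside the scope of this sketch.

```python
from typing import List, Tuple


def tag_characters(units: List[str]) -> List[Tuple[str, str]]:
    """Turn a list of segmented units into (character, tag) pairs.

    Uses a generic B/I/E/S scheme (an assumption, not the paper's labels):
    B = first character of a unit, I = inside, E = last, S = one-character unit.
    """
    tagged: List[Tuple[str, str]] = []
    for unit in units:
        if not unit:
            continue  # skip empty units
        if len(unit) == 1:
            tagged.append((unit, "S"))
            continue
        tagged.append((unit[0], "B"))
        for ch in unit[1:-1]:
            tagged.append((ch, "I"))
        tagged.append((unit[-1], "E"))
    return tagged


def units_from_tags(tagged: List[Tuple[str, str]]) -> List[str]:
    """Rebuild the segmented units from (character, tag) pairs."""
    units: List[str] = []
    current = ""
    for ch, tag in tagged:
        current += ch
        if tag in ("E", "S"):  # a unit closes on E or S
            units.append(current)
            current = ""
    if current:  # keep any trailing, unterminated unit
        units.append(current)
    return units


# Hypothetical example: a Latin-transliterated word split into three units.
example = ["kitab", "lar", "da"]
tagged = tag_characters(example)
restored = units_from_tags(tagged)
```

In this framing, a sequence labeler only has to predict one tag per character; the segmentation is then read off the predicted tags, as `units_from_tags` does.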
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Y., Mi, C., Ma, B., Dong, R., Wang, L., Li, X. (2014). Character Tagging-Based Word Segmentation for Uyghur. In: Shi, X., Chen, Y. (eds) Machine Translation. CWMT 2014. Communications in Computer and Information Science, vol 493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45701-6_6
DOI: https://doi.org/10.1007/978-3-662-45701-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45700-9
Online ISBN: 978-3-662-45701-6
eBook Packages: Computer Science, Computer Science (R0)