Character Tagging-Based Word Segmentation for Uyghur

  • Conference paper
Machine Translation (CWMT 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 493)

Abstract

To obtain the information contained in Uyghur words effectively, we present a novel method for Uyghur word segmentation based on character tagging. We propose five labels for the characters in a Uyghur word: Su, Bu, Iu, Eu and Au. With these labels, we treat Uyghur word segmentation as a sequence labeling task and use Conditional Random Fields (CRFs) as the base labeling model. Experiments show that our method captures more features of Uyghur words and therefore significantly outperforms several traditional word segmentation models.
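
The idea of casting segmentation as character-level sequence labeling can be illustrated with a minimal sketch. The abstract does not define the five labels, so the code below does not use them; it assumes a generic begin/inside/end/single scheme (`B`, `I`, `E`, `S`) of the kind commonly used for character tagging, and it leaves out the CRF itself. The function names and the Latin-script example word are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumed generic tag set (NOT the paper's Su/Bu/Iu/Eu/Au labels):
#   B = first character of a unit, I = inside a unit,
#   E = last character of a unit,  S = single-character unit.


def units_to_tags(units: List[str]) -> List[str]:
    """Turn a gold segmentation into one tag per character (training targets)."""
    tags: List[str] = []
    for unit in units:
        if len(unit) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(unit) - 2) + ["E"])
    return tags


def tags_to_units(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from predicted tags (decoding step)."""
    units: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a unit closes on E or S
            units.append(current)
            current = ""
    if current:  # keep any unterminated trailing characters as a unit
        units.append(current)
    return units


# Illustrative example: a stem plus a plural suffix in Latin transliteration.
gold_units = ["kitab", "lar"]
tags = units_to_tags(gold_units)
restored = tags_to_units("".join(gold_units), tags)
```

In a setup like this, a sequence model such as a CRF would be trained to predict the tag sequence from per-character features, and `tags_to_units` would convert its predictions back into segments.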

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Y., Mi, C., Ma, B., Dong, R., Wang, L., Li, X. (2014). Character Tagging-Based Word Segmentation for Uyghur. In: Shi, X., Chen, Y. (eds) Machine Translation. CWMT 2014. Communications in Computer and Information Science, vol 493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45701-6_6

  • DOI: https://doi.org/10.1007/978-3-662-45701-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45700-9

  • Online ISBN: 978-3-662-45701-6

  • eBook Packages: Computer Science, Computer Science (R0)
