Abstract
Uyghur is an agglutinative language in which words are built from many morphemes, so processing Uyghur text requires segmenting words into their morphemes, a task known as morphological segmentation. Previous work treats morphological segmentation as a character-level tagging task, assigning each character one of four labels, \(\{b,m,e,s\}\) (begin, middle, end, single). However, these labels are not independent of each other, which makes such models prone to overfitting. We propose a new method for the segmentation task: instead of these labels, we model only the segmentation points. The model used in our method is more robust and easier to train than previous methods. Applied to Uyghur morphological segmentation, it achieves high accuracy as well as higher recall and F1 score than previous models.
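To make the contrast between the two labelling schemes concrete, here is a minimal Python sketch (not the authors' implementation) that encodes one segmented word both ways. The word mekteplerde (roughly "in schools") and its split mektep + ler + de are an illustrative Latin-transliterated example chosen here, not taken from the paper, and all function names are hypothetical. The `bmes_is_valid` check shows the dependency the abstract refers to: a tag such as b constrains which tags may follow it, whereas a list of segmentation points carries no such constraint.

```python
from typing import List

# Hypothetical helper: which BMES tag may follow which (b/m continue a morpheme,
# e/s close one, so the next tag must open a new morpheme).
ALLOWED_NEXT = {"b": {"m", "e"}, "m": {"m", "e"}, "e": {"b", "s"}, "s": {"b", "s"}}


def to_bmes(morphemes: List[str]) -> List[str]:
    """Character-level tags: b = begin, m = middle, e = end, s = single-char morpheme."""
    tags: List[str] = []
    for m in morphemes:
        if len(m) == 1:
            tags.append("s")
        else:
            tags.extend(["b"] + ["m"] * (len(m) - 2) + ["e"])
    return tags


def to_points(morphemes: List[str]) -> List[int]:
    """Segmentation points: character offsets where a new morpheme starts (0 excluded)."""
    points: List[int] = []
    offset = 0
    for m in morphemes[:-1]:
        offset += len(m)
        points.append(offset)
    return points


def from_points(word: str, points: List[int]) -> List[str]:
    """Split a word at the given offsets; any set of distinct interior offsets is valid."""
    bounds = [0] + sorted(points) + [len(word)]
    return [word[i:j] for i, j in zip(bounds, bounds[1:])]


def bmes_is_valid(tags: List[str]) -> bool:
    """A BMES sequence is well formed only if every adjacent tag pair is allowed."""
    if not tags:
        return True
    if tags[0] not in {"b", "s"} or tags[-1] not in {"e", "s"}:
        return False
    return all(nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(tags, tags[1:]))


if __name__ == "__main__":
    morphemes = ["mektep", "ler", "de"]
    print(to_bmes(morphemes))    # 11 tags, one per character
    print(to_points(morphemes))  # [6, 9]
    print(from_points("mekteplerde", [6, 9]))
    print(bmes_is_valid(["b", "b", "e"]))  # False: b cannot follow b
```

Under this view, a model that outputs positions (as a pointer network does when it selects input positions) only needs to produce offsets such as 6 and 9, and any set of distinct interior offsets decodes to a well-formed segmentation, whereas an independently predicted tag sequence can violate the transition constraints.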
Change history
13 October 2019
The original version of this chapter contained an error in the second author’s name. The spelling of Shuqin Li’s name was incorrect in the header of the paper. The author’s name has been corrected.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772075 and 61772081), the Scientific Research Project of Beijing Educational Committee (Grant No. KM201711232022), the Beijing Municipal Education Committee (Grant No. SZ20171123228), and the Beijing Institute of Computer Technology and Application (Extensible Knowledge Graph Construction Technique Project).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Li, S., Zhang, Y., Zhang, HP. (2019). Point the Point: Uyghur Morphological Segmentation Using PointerNetwork with GRU. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_30
DOI: https://doi.org/10.1007/978-3-030-32381-3_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3
eBook Packages: Computer Science, Computer Science (R0)