
Neural Processing Letters, Volume 50, Issue 3, pp 2647–2664

Learning Morpheme Representation for Mongolian Named Entity Recognition

  • Weihua Wang
  • Feilong Bao
  • Guanglai Gao
Article

Abstract

Traditional approaches to Mongolian named entity recognition rely heavily on feature engineering. Worse, the complex morphological structure of Mongolian words makes the data even sparser. To alleviate the need for feature engineering and the data sparsity in Mongolian named entity recognition, we propose a framework of recurrent neural networks with morpheme representations, and we study this framework in depth through different model variants. More specifically, the morpheme representation exploits the characteristics of the classical Mongolian script and can be learned from an unlabeled corpus. The model is further augmented with different character representations and auxiliary language-model losses, which extract context knowledge from scratch. By decoding jointly with a Conditional Random Field layer, the model can learn the dependencies between different labels. Experimental results show that feeding morpheme representations into the neural network outperforms word representations, and that the additional character representation and the morpheme language-model loss further improve performance.
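The core idea of composing word representations from morphemes can be illustrated with a minimal sketch (not the authors' implementation; the segmentation, morphemes, and embeddings below are hypothetical): a word vector is built from the embeddings of its stem and suffixes, so rare inflected forms share parameters with other words derived from the same morphemes, which mitigates data sparsity.

```python
import numpy as np

# Hypothetical morpheme inventory with small random embeddings.
rng = np.random.default_rng(0)
EMB_DIM = 8
morpheme_emb = {m: rng.normal(size=EMB_DIM)
                for m in ["bagsi", "-yin", "ger", "-tei"]}

def word_representation(morphemes, emb=morpheme_emb, dim=EMB_DIM):
    """Compose a word vector by averaging its morpheme embeddings.

    Morphemes unseen in training fall back to a zero vector; because
    stem and suffix embeddings are shared across word forms, inflected
    variants of a word get related representations."""
    vecs = [emb.get(m, np.zeros(dim)) for m in morphemes]
    return np.mean(vecs, axis=0)

# Compose a representation for a (hypothetical) segmented word.
vec = word_representation(["ger", "-tei"])
```

Averaging is only the simplest composition; the framework described in the abstract instead feeds morpheme embeddings into a recurrent network, which can learn order-sensitive compositions.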

Keywords

Named entity recognition · Mongolian · Morpheme representation · Language model auxiliary loss


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer Science, Inner Mongolia University, Hohhot, China
