Named Entity Recognition with Gated Convolutional Neural Networks

  • Chunqi WangEmail author
  • Wei Chen
  • Bo Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)


Most state-of-the-art models for named entity recognition (NER) rely on recurrent neural networks (RNNs), in particular long short-term memory (LSTM). Those models learn local and global features automatically by RNNs so that hand-craft features can be discarded, totally or partly. Recently, convolutional neural networks (CNNs) have achieved great success on computer vision. However, for NER problems, they are not well studied. In this work, we propose a novel architecture for NER problems based on GCNN — CNN with gating mechanism. Compared with RNN based NER models, our proposed model has a remarkable advantage on training efficiency. We evaluate the proposed model on three data sets in two significantly different languages — SIGHAN bakeoff 2006 MSRA portion for simplified Chinese NER and CityU portion for traditional Chinese NER, CoNLL 2003 shared task English portion for English NER. Our model obtains state-of-the-art performance on these three data sets.



This work is supported by the National Key Research & Development Plan of China (No.2013CB329302). Thanks anonymous reviewers for their valuable suggestions. Thanks Wang Geng, Zhen Yang and Yuanyuan Zhao for their useful discussions.


  1. 1.
    Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint (2016). arXiv:1603.04467
  2. 2.
    Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. Computer Science (2014)Google Scholar
  3. 3.
    Chen, A., Peng, F., Shan, R., Sun, G.: Chinese named entity recognition with conditional probabilistic models, pp. 173–176 (2006)Google Scholar
  4. 4.
    Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional lstm-cnns. Computer Science (2015)Google Scholar
  5. 5.
    Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint (2014). arXiv:1412.3555
  6. 6.
    Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12(1), 2493–2537 (2011)zbMATHGoogle Scholar
  7. 7.
    Dauphin, Y.N., Fan, A., Auli, M., Grangier, D.: Language modeling with gated convolutional networks (2016)Google Scholar
  8. 8.
    Gillick, D., Brunk, C., Vinyals, O., Subramanya, A.: Multilingual language processing from bytes. arXiv preprint (2015). arXiv:1512.00103
  9. 9.
    Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. Comput. Sci. 3(4), 212–223 (2012)Google Scholar
  10. 10.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  11. 11.
    Huang, Z., Xu, W., Yu, K.: Bidirectional lstm-crf models for sequence tagging. arXiv preprint (2015). arXiv:1508.01991
  12. 12.
    Junsheng, Z., Liang, H., Xinyu, D., Jiajun, C.: Chinese named entity recognition with a multi-phase model. In: COLING ACL 2006, p. 213 (2006)Google Scholar
  13. 13.
    Kingma, D., Ba, J.: Adam: A method for stochastic optimization. Computer Science (2014)Google Scholar
  14. 14.
    Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the eighteenth international conference on machine learning, ICML, vol. 1, pp. 282–289 (2001)Google Scholar
  15. 15.
    Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition (2016)Google Scholar
  16. 16.
    LeCun, Y., Boser, B., Denker, J., Henderson, D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)CrossRefGoogle Scholar
  17. 17.
    Levow, G.A.: The third international chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117 (2006)Google Scholar
  18. 18.
    Luo, G., Huang, X., Lin, C.Y., Nie, Z.: Joint entity recognition and disambiguation. In: Conference on Empirical Methods in Natural Language Processing, pp. 879–888 (2015)Google Scholar
  19. 19.
    Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional lstm-cnns-crf (2016)Google Scholar
  20. 20.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 26, 3111–3119 (2013)Google Scholar
  21. 21.
    Passos, A., Kumar, V., Mccallum, A.: Lexicon infused phrase embeddings for named entity resolution. Computer Science (2014)Google Scholar
  22. 22.
    Pennington, J., Socher, R., Manning, C.: Glove: Global vectors for word representation. In: Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543 (2014)Google Scholar
  23. 23.
    Pham, N.Q., Kruszewski, G., Boleda, G.: Convolutional neural network language models. In: Proceedings of EMNLP (2016)Google Scholar
  24. 24.
    Ratinov, L., Roth, D.: Conll 09 design challenges and misconceptions in named entity recognition. In: CoNLL 2009: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 147–155 (2009)Google Scholar
  25. 25.
    Salimans, T., Kingma, D.P.: Weight normalization: A simple reparameterization to accelerate training of deep neural networks (2016)Google Scholar
  26. 26.
    dos Santos, C., Guimaraes, V., Niterói, R., de Janeiro, R.: Boosting named entity recognition with neural character embeddings. In: Proceedings of NEWS 2015 The Fifth Named Entities Workshop, p. 25 (2015)Google Scholar
  27. 27.
    dos Santos, C.N., Zadrozny, B.: Learning character-level representations for part-of-speech tagging. In: ICML, pp. 1818–1826 (2014)Google Scholar
  28. 28.
    Sundermeyer, M., Schlüter, R., Ney, H.: Lstm neural networks for language modeling. In: Interspeech, pp. 194–197 (2012)Google Scholar
  29. 29.
    Sutskever, I., Vinyals, O., Le, Q.V., Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 4, 3104–3112 (2014)Google Scholar
  30. 30.
    Sang, E.F.T.K., De Meulder, F.: Introduction to the conll-2003 shared task: language-independent named entity recognition. Comput. Sci. 21(08), 142–147 (2003)Google Scholar
  31. 31.
    Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization. arXiv preprint (2014). arXiv:1409.2329
  32. 32.
    Zhao, H., Kit, C.: Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition. In: IJCNLP, pp. 106–111. Citeseer (2008)Google Scholar
  33. 33.
    Zhou, J., Qu, W., Zhang, F.: Chinese named entity recognition via joint identification and categorization. Chin. J. Electron. 22(2), 225–230 (2013)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.University of Chinese Academy of SciencesBeijingChina
  2. 2.Institute of Automation, Chinese Academy of SciencesBeijingChina

Personalised recommendations