Abstract
Lexical analysis and parsing model deep properties of words and the relationships between them. Commonly used techniques include word segmentation, part-of-speech tagging, and parsing. A defining characteristic of these tasks is that their outputs are structured. Two families of methods are typically used to solve such structured prediction tasks: graph-based methods and transition-based methods. Graph-based methods score candidate output structures directly based on their characteristics, while transition-based methods recast output construction as a state transition process and score sequences of transition actions. Neural network models have been applied successfully to both graph-based and transition-based structured prediction. In this chapter, we review the application of deep learning to lexical analysis and parsing, and compare it with traditional statistical methods.
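The transition-based view described above can be made concrete with a minimal sketch of an arc-standard shift-reduce dependency parser: a tree is built incrementally by applying a sequence of SHIFT / LEFT-ARC / RIGHT-ARC actions to a stack and a buffer. The example sentence and gold action sequence below are illustrative, not taken from the chapter; a real parser would predict each action with a classifier rather than read it from a list.

```python
def parse(words, actions):
    """Apply a transition sequence (arc-standard system) to a sentence.

    Returns a dict mapping each dependent's word index to its head's index.
    """
    stack = []
    buffer = list(range(len(words)))  # word indices waiting to be processed
    heads = {}
    for act in actions:
        if act == "SHIFT":
            # Move the next buffer word onto the stack.
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":
            # Second-topmost stack word becomes a dependent of the top word.
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif act == "RIGHT-ARC":
            # Topmost stack word becomes a dependent of the word below it.
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads

# "He ate fish": both "He" and "fish" attach to the verb "ate" (the root).
heads = parse(["He", "ate", "fish"],
              ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"])
assert heads == {0: 1, 2: 1}
```

In this framing, the learning problem becomes differentiating good transition sequences from bad ones, rather than scoring whole trees directly as graph-based methods do.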
Notes
- 2. From Joakim Nivre’s tutorial at COLING-ACL, Sydney 2006.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this chapter
Che, W., Zhang, Y. (2018). Deep Learning in Lexical Analysis and Parsing. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5208-8
Online ISBN: 978-981-10-5209-5