A Joint Introduction to Natural Language Processing and to Deep Learning

A chapter in the book Deep Learning in Natural Language Processing

Abstract

In this chapter, we set up the fundamental framework for the book. We first provide an introduction to the basics of natural language processing (NLP) as an integral part of artificial intelligence. We then survey the historical development of NLP, spanning more than five decades, in terms of three waves. The first two waves arose as rationalism and empiricism, paving the way for the current deep learning wave. The key pillars underlying the deep learning revolution for NLP consist of (1) distributed representations of linguistic entities via embeddings, (2) semantic generalization enabled by these embeddings, (3) long-span deep sequence modeling of natural language, (4) hierarchical networks effective for representing linguistic levels from low to high, and (5) end-to-end deep learning methods that jointly solve many NLP tasks. After the survey, several key limitations of current deep learning technology for NLP are analyzed. This analysis leads to five research directions for future advances in NLP.
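
To make the first two pillars concrete for readers who have not worked with embeddings, the following is a minimal, self-contained sketch, not taken from the chapter: the toy vocabulary, vector values, and dimensionality are invented for illustration. It shows how a distributed representation assigns each word a dense vector and how vector similarity captures the kind of semantic generalization the abstract refers to.

    # Toy illustration of distributed word representations.
    # Values are made up; real embeddings are learned from corpora
    # and typically use hundreds of dimensions.
    import numpy as np

    embeddings = {
        "king":  np.array([0.80, 0.10, 0.70, 0.20]),
        "queen": np.array([0.78, 0.12, 0.65, 0.25]),
        "apple": np.array([0.05, 0.90, 0.10, 0.60]),
    }

    def cosine_similarity(u, v):
        # Cosine of the angle between two word vectors: close to 1 for
        # semantically related words, closer to 0 for unrelated ones.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower

In neural NLP models such vectors form the input layer and are trained jointly with the rest of the network, which is what allows knowledge learned about one word to transfer to semantically similar words.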

Author information

Correspondence to Li Deng.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Deng, L., Liu, Y. (2018). A Joint Introduction to Natural Language Processing and to Deep Learning. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_1

  • DOI: https://doi.org/10.1007/978-981-10-5209-5_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5208-8

  • Online ISBN: 978-981-10-5209-5

  • eBook Packages: Computer Science (R0)
