A Joint Introduction to Natural Language Processing and to Deep Learning

A chapter in the book Deep Learning in Natural Language Processing

Abstract

In this chapter, we set up the fundamental framework for the book. We first provide an introduction to the basics of natural language processing (NLP) as an integral part of artificial intelligence. We then survey the historical development of NLP, spanning more than five decades, in terms of three waves. The first two waves arose as rationalism and empiricism, paving the way for the current deep learning wave. The key pillars underlying the deep learning revolution for NLP consist of (1) distributed representations of linguistic entities via embeddings, (2) semantic generalization enabled by these embeddings, (3) long-span deep sequence modeling of natural language, (4) hierarchical networks effective for representing linguistic levels from low to high, and (5) end-to-end deep learning methods that jointly solve many NLP tasks. After the survey, several key limitations of current deep learning technology for NLP are analyzed. This analysis leads to five research directions for future advances in NLP.
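
To make the first two pillars concrete for readers who have not worked with embeddings, the following is a minimal, self-contained sketch, not taken from the chapter: the toy vocabulary, vector values, and dimensionality are invented for illustration. It shows how a distributed representation assigns each word a dense vector and how vector similarity captures the kind of semantic generalization the abstract refers to.

    # Toy illustration of distributed word representations.
    # Values are made up; real embeddings are learned from corpora
    # and typically use hundreds of dimensions.
    import numpy as np

    embeddings = {
        "king":  np.array([0.80, 0.10, 0.70, 0.20]),
        "queen": np.array([0.78, 0.12, 0.65, 0.25]),
        "apple": np.array([0.05, 0.90, 0.10, 0.60]),
    }

    def cosine_similarity(u, v):
        # Cosine of the angle between two word vectors: close to 1 for
        # semantically related words, closer to 0 for unrelated ones.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower

In neural NLP models such vectors form the input layer and are trained jointly with the rest of the network, which is what allows knowledge learned about one word to transfer to semantically similar words.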

Author information

Correspondence to Li Deng.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Deng, L., Liu, Y. (2018). A Joint Introduction to Natural Language Processing and to Deep Learning. In: Deng, L., Liu, Y. (eds) Deep Learning in Natural Language Processing. Springer, Singapore. https://doi.org/10.1007/978-981-10-5209-5_1

  • DOI: https://doi.org/10.1007/978-981-10-5209-5_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5208-8

  • Online ISBN: 978-981-10-5209-5

  • eBook Packages: Computer Science (R0)
