An Overview of Deep Learning

  • Zhaoqiang Xia


In the last decade, deep learning has attracted much attention and become a dominant technology in the artificial intelligence community. This chapter reviews the concepts, methods, and latest applications of deep learning. First, the basic concepts and development history of deep learning are briefly revisited. Then, five basic types of deep learning methods are introduced: stacked autoencoders, deep belief networks, convolutional neural networks, recurrent neural networks, and generative adversarial networks. Applications of deep learning in other domains are then briefly illustrated according to the type of data involved, such as acoustic data, image data, and textual data. Finally, several issues facing deep learning are discussed and the trends in the field are summarized.


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China
