Deep Learning

  • Ke-Lin Du
  • M. N. S. Swamy
Chapter

Abstract

The advent of deep learning has dramatically improved the state of the art in artificial intelligence (AI). Owing to its deep, layered structure, deep learning is widely regarded as the AI model closest to the human brain. It has been successfully applied to pattern understanding and recognition problems that are traditionally hard to solve. This chapter introduces deep learning and deep learning networks.


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  2. Xonlink Inc., Hangzhou, China
