Advertisement

Theoretical Characterization of Deep Neural Networks

  • Piyush KaulEmail author
  • Brejesh Lall
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 866)

Abstract

Deep neural networks are poorly understood mathematically, however there has been a lot of recent work focusing on analyzing and understanding their success in a variety of pattern recognition tasks. We describe some of the mathematical techniques used for characterization of neural networks in terms of complexity of classification or regression task assigned, or based on functions learned, and try to relate this to architecture choices for neural networks. We explain some of the measurable quantifiers that can been used for defining expressivity of neural network including using homological complexity and curvature. We also describe neural networks from the viewpoints of scattering transforms and share some of the mathematical and intuitive justifications for those. We finally share a technique for visualizing and analyzing neural networks based on concept of Riemann curvature.

Keywords

Deep neural networks Machine learning Expressivity Algebraic topology Betti numbers Curvature Riemannian geometry Scattering transform 

References

  1. 1.
    Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Cheng, Q., Chen, G., et al. Deep speech 2: end-to-end speech recognition in English and mandarin. In: International Conference on Machine Learning, pp. 173–182 (2016)Google Scholar
  2. 2.
    Andén, J., Mallat, S.: Deep scattering spectrum. IEEE Trans. Signal Process. 62(16), 4114–4128 (2014)MathSciNetCrossRefGoogle Scholar
  3. 3.
    Bartlett, P.L., Maass, W.: Vapnik-Chervonenkis dimension of neural nets. In: The Handbook of Brain Theory and Neural Networks, pp. 1188–1192 (2003)Google Scholar
  4. 4.
    Bengio, Y., Delalleau, O.: On the expressive power of deep architectures. In: International Conference on Algorithmic Learning Theory, pp. 18–36. Springer, Berlin (2011)Google Scholar
  5. 5.
    Bianchini, M., Scarselli, F.: On the complexity of shallow and deep neural network classifiers. In: ESANN (2014)Google Scholar
  6. 6.
    Bredon, G.E.: Topology and Geometry, vol. 139. Springer Science & Business Media (2013)Google Scholar
  7. 7.
    Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34(4), 18–42 (2017)CrossRefGoogle Scholar
  8. 8.
    Bruna, J.: Geometric stability in Euclidean domains: the scattering transform and beyond. https://joanbruna.github.io/MathsDL-spring18/ (2018)
  9. 9.
    Bruna, J., Mallat, S.: Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1872–1886 (2013)CrossRefGoogle Scholar
  10. 10.
    Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014). arXiv preprint arXiv:1406.1078
  11. 11.
    Choquet-Bruhat, Cécile, Y., DeWitt-Morette, C., Dillard-Bleick, M.: Analysis, Manifolds, and Physics. Gulf Professional Publishing (1982)Google Scholar
  12. 12.
    Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley (2012)Google Scholar
  13. 13.
    Friedman, J., Hastie, T. and Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer Series in Statistics. Springer, New York (2001)Google Scholar
  14. 14.
    Gilmore, R.: Lie Groups, Lie Algebras, and Some of Their Applications. Courier Corporation (2012)Google Scholar
  15. 15.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)Google Scholar
  16. 16.
    Goodfellow, I., Bengio, Y. and Courville, A.: Deep Learning. MIT Press (2016)Google Scholar
  17. 17.
    Guss, W.H., Salakhutdinov, R.: On characterizing the capacity of neural networks using algebraic topology (2018). arXiv preprint arXiv:1802.04443
  18. 18.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
  19. 19.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  20. 20.
    Kaul, P., Lall, B.: Riemannian curvature of deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2019).  https://doi.org/10.1109/TNNLS.2019.2919705CrossRefGoogle Scholar
  21. 21.
    Kearns, M.J., Vazirani, U.V., Vazirani, U.: An Introduction to Computational Learning Theory. MIT Press (1994)Google Scholar
  22. 22.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  23. 23.
    LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)CrossRefGoogle Scholar
  24. 24.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRefGoogle Scholar
  25. 25.
    LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010)Google Scholar
  26. 26.
    Lee, J.M.: Riemannian Manifolds: An Introduction to Curvature, vol. 176. Springer, New York (1997)Google Scholar
  27. 27.
    Lee, J.M.: Introduction to Smooth Manifolds, vol. 218. Springer, New York (2013)Google Scholar
  28. 28.
    Mallat, S.: Group invariant scattering. Commun. Pure Appl. Math. 65(10), 1331–1398 (2012)MathSciNetCrossRefGoogle Scholar
  29. 29.
    Mallat, S.: Understanding deep convolutional networks. Phil. Trans. R. Soc. A 374(2065), 20150203 (2016)CrossRefGoogle Scholar
  30. 30.
    Mathworks. im2col. https://in.mathworks.com/help/images/ref/im2col.html. Accessed 10 Feb 2019
  31. 31.
    Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., Bronstein, M.M.: Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of CVPR, vol. 1, p. 3 (2017)Google Scholar
  32. 32.
    Munkres, J.R.: Topology. Prentice Hall (2000)Google Scholar
  33. 33.
    Nakahara, M.: Geometry, Topology and Physics. CRC Press (2003)Google Scholar
  34. 34.
    Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J., Ganguli, S.: Exponential expressivity in deep neural networks through transient chaos. In: Advances in Neural Information Processing Systems, pp. 3360–3368 (2016)Google Scholar
  35. 35.
    Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., Sohl-Dickstein, J.: Survey of expressivity in deep neural networks (2016). arXiv preprint arXiv:1611.08083
  36. 36.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)Google Scholar
  37. 37.
    Saxe, A.M., McClelland, J.L., Ganguli, S.:Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (2013). arXiv preprint arXiv:1312.6120
  38. 38.
    Schutz, B.: A First Course in General Relativity. Cambridge University Press (2009)Google Scholar
  39. 39.
    Sermanet, P., LeCun, Y.: Traffic sign recognition with multi-scale convolutional networks. In: Neural Networks (IJCNN), pp. 2809–2813. IEEE (2011)Google Scholar
  40. 40.
    Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)CrossRefGoogle Scholar
  41. 41.
    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)Google Scholar
  42. 42.
    Topaz, C., Ziegelmeier, L., Halverson, T.: Topological data analysis of biological aggregation models. PloS one. 10. https://doi.org/10.1371/journal.pone.0126383CrossRefGoogle Scholar
  43. 43.
    Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Measures of Complexity, pp. 11–30. Springer, Cham (2015)CrossRefGoogle Scholar
  44. 44.
    Wiatowski, T., Bölcskei, H.: A mathematical theory of deep convolutional neural networks for feature extraction (2015). arXiv preprint arXiv:1512.06293
  45. 45.
    Zomorodian, A., Carlsson, G.: Computing persistent homology. Discret. Comput. Geom. 33(2), 249–274 (2005)MathSciNetCrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Electrical Engineering DepartmentIndian Institute of TechnologyDelhiIndia

Personalised recommendations