
Forward Stability of ResNet and Its Variants

  • Linan Zhang
  • Hayden Schaeffer

Abstract

The residual neural network (ResNet) is a popular deep network architecture that achieves high-accuracy results on several image processing problems. To analyze the behavior and structure of ResNet, recent work has focused on establishing connections between ResNets and continuous-time optimal control problems. In this work, we show that the post-activation ResNet is related to an optimal control problem with differential inclusions, and we provide continuous-time stability results for the differential inclusion associated with ResNet. Motivated by the stability conditions, we show that altering either the architecture or the optimization problem yields variants of ResNet which improve the theoretical stability bounds. In addition, we establish stability bounds for the full (discrete) network associated with two variants of ResNet; in particular, we bound the growth of the features and a measure of the sensitivity of the features with respect to perturbations. These results also help to clarify the relationship between depth, regularization, and the stability of the feature space. Computational experiments on the proposed variants show that the accuracy of ResNet is preserved and that the accuracy appears to be monotone with respect to the depth and to various corruptions.
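
For context, a minimal sketch of the connection in question, written in standard notation rather than the paper's exact formulation: a post-activation ResNet block updates the features via
\[
x_{k+1} = x_k + h\,\sigma\!\left(W_k x_k + b_k\right),
\]
which is a forward Euler step of size \(h\) for the differential equation \(\dot{x}(t) = \sigma\big(W(t)\,x(t) + b(t)\big)\). The differential-inclusion viewpoint replaces the right-hand side with a set-valued map, and forward stability asks how the size of the features \(x(t)\), and their sensitivity to perturbations of the initial data, grows with depth, i.e., with time.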

Keywords

Deep feedforward neural networks · Residual neural networks · Stability · Differential inclusions · Optimal control problems

Notes

Acknowledgements

The authors acknowledge the support of AFOSR Grant FA9550-17-1-0125 and NSF CAREER Grant #1752116.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, USA
