Generalization Error in Deep Learning

Chapter in Compressed Sensing and Its Applications

Abstract

Deep learning models have recently shown impressive performance in a variety of fields such as computer vision, speech recognition, speech translation, and natural language processing. However, despite this state-of-the-art performance, the source of their generalization ability is still generally unclear. An important question is therefore what enables deep neural networks to generalize well from the training set to new data. In this chapter, we provide an overview of the existing theory and bounds for characterizing the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.
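For concreteness, the quantity at the center of the chapter can be written in a standard form. In the notation below, the hypothesis \(h\), loss \(\ell\), data distribution \(\mathcal{D}\), and \(m\)-sample training set \(S\) are illustrative placeholders rather than the chapter's own notation: the generalization error is the gap between the expected risk and the empirical risk,

\[
\mathrm{GE}(h) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(h(x),y\big)\big] \;-\; \frac{1}{m}\sum_{i=1}^{m}\ell\big(h(x_i),y_i\big), \qquad S=\{(x_i,y_i)\}_{i=1}^{m}.
\]

Bounding this gap in terms of the sample size \(m\) and properties of the trained network is the goal of the theory and bounds surveyed in the chapter.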

Notes

  1. \(\tilde{\mathcal{O}}\) denotes an upper bound on the complexity up to a logarithmic factor in the same term (see the illustration below).
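To make this convention concrete, here is a standard reading of the soft-O notation, stated for illustration rather than quoted from the chapter: \(f \in \tilde{\mathcal{O}}(g)\) means that \(f\) is bounded by \(g\) up to polylogarithmic factors in \(g\),

\[
f(n) \in \tilde{\mathcal{O}}\big(g(n)\big) \quad\Longleftrightarrow\quad f(n) \in \mathcal{O}\big(g(n)\,\log^{k} g(n)\big) \ \text{for some constant } k \ge 0 .
\]

For instance, \(n \log n \in \tilde{\mathcal{O}}(n)\), since it is within a single logarithmic factor of \(n\).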

Author information

Correspondence to Miguel R. D. Rodrigues.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Jakubovitz, D., Giryes, R., Rodrigues, M.R.D. (2019). Generalization Error in Deep Learning. In: Boche, H., Caire, G., Calderbank, R., Kutyniok, G., Mathar, R., Petersen, P. (eds) Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-73074-5_5
