Abstract
Deep learning models have recently shown strong performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, despite this state-of-the-art performance, the source of their ability to generalize is still not well understood. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this chapter, we provide an overview of the existing theory and bounds for characterizing the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results.
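For reference, the generalization error discussed above is conventionally defined as the gap between the expected risk and the empirical risk. The notation below is the standard textbook formulation, given here as a sketch for orientation rather than as the chapter's exact definition:

\[
\operatorname{gen}(h) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(h(x),y)\big] \;-\; \frac{1}{m}\sum_{i=1}^{m}\ell\big(h(x_i),y_i\big),
\]

where \(h\) is the learned hypothesis, \(\ell\) is a loss function, \(\mathcal{D}\) is the underlying data distribution, and \((x_1,y_1),\dots,(x_m,y_m)\) are the \(m\) training samples. The bounds surveyed in the chapter control this gap in terms of properties of the network and of the training procedure.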
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jakubovitz, D., Giryes, R., Rodrigues, M.R.D. (2019). Generalization Error in Deep Learning. In: Boche, H., Caire, G., Calderbank, R., Kutyniok, G., Mathar, R., Petersen, P. (eds) Compressed Sensing and Its Applications. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-73074-5_5
DOI: https://doi.org/10.1007/978-3-319-73074-5_5
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-73073-8
Online ISBN: 978-3-319-73074-5
eBook Packages: Mathematics and Statistics (R0)