Oracle Inequalities for Local and Global Empirical Risk Minimizers

  • Andreas Elsener
  • Sara van de Geer
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)


The aim of this chapter is to provide an overview of general frameworks used to derive (sharp) oracle inequalities. Two extensions of a general theory for convex norm penalized empirical risk minimizers are summarized. The first one is for convex nondifferentiable loss functions. The second is for nonconvex differentiable loss functions. Theoretical understanding is required for the growing number of algorithms in statistics, machine learning, and, more recently, deep learning that are based on (combinations of) these types of loss functions. To motivate the importance of oracle inequalities, the problem of model misspecification in the linear model is first discussed. Then, the sharp oracle inequalities are stated. Finally, we show how to apply the general theory to problems from regression, classification, and dimension reduction.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ETH Zürich, Seminar für Statistik, Zürich, Switzerland
