Agnostic Learning Nonconvex Function Classes

  • Shahar Mendelson
  • Robert C. Williamson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2375)

Abstract

We consider the sample complexity of agnostic learning with respect to squared loss. It is known that if the function class F used for learning is convex, then one can obtain better sample complexity bounds than in the general case. It has been claimed that a lower bound shows there is an essential gap in the rate. In this paper we show that the proof of this lower bound has a gap, although we do not provide a definitive answer as to its validity. More positively, we show that one can obtain “fast” sample complexity bounds for nonconvex F for “most” target conditional expectations. The new bounds depend on the detailed geometry of F, in particular on the distance, in a certain sense, of the target’s conditional expectation from the set of nonuniqueness points of the class F.
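
As a rough illustration of the setting the abstract describes, the sketch below writes out the excess squared loss and what a nonuniqueness point is. The notation (the pair (X, Y), the measure \mu, the learner output \hat f, the target f^*, the accuracy \varepsilon) is assumed here for illustration and is not taken from the paper, and the rates mentioned in the comments are the ones usually quoted for this problem rather than statements from the abstract.

```latex
% Assumed notation: (X, Y) is a random pair, \mu is the law of X,
% F is the function class, \hat f \in F is the learner's output.
% Agnostic learning with squared loss asks that, with high probability,
\[
  \mathbb{E}\bigl(\hat f(X) - Y\bigr)^2
  \;-\; \inf_{f \in F} \mathbb{E}\bigl(f(X) - Y\bigr)^2 \;\le\; \varepsilon .
\]
% With f^*(x) = \mathbb{E}[\,Y \mid X = x\,] the conditional expectation,
% the cross term vanishes and, for every f,
\[
  \mathbb{E}\bigl(f(X) - Y\bigr)^2
  \;=\; \|f - f^*\|_{L_2(\mu)}^2 \;+\; \mathbb{E}\bigl(f^*(X) - Y\bigr)^2 ,
\]
% so minimizing the risk over F means finding a nearest point to f^* in F.
% A nonuniqueness point of F is a target with more than one nearest point:
\[
  g \text{ is a nonuniqueness point of } F
  \iff
  \exists\, f_1 \neq f_2 \in F :\;
  \|g - f_1\| = \|g - f_2\| = \inf_{f \in F} \|g - f\| .
\]
% Example (hypothetical): F = \{f_1, f_2\} is nonconvex, and every g
% equidistant from f_1 and f_2 is a nonuniqueness point.
% "Fast" is usually read as a sample complexity of order 1/\varepsilon
% rather than 1/\varepsilon^2, up to logarithmic factors (assumption).
```

A closed convex class in a Hilbert space has no nonuniqueness points, which is one way to read why the convex case is the benign one.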

Keywords

Sample complexity; Normed linear space; Computational learning theory; Random construction; Hidden layer neural network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Shahar Mendelson (1)
  • Robert C. Williamson (1)
  1. Research School of Information Sciences and Engineering, Australian National University, Canberra, Australia
