Agnostic Learning Nonconvex Function Classes
We consider the sample complexity of agnostic learning with respect to squared loss. It is known that if the function class F used for learning is convex, then one can obtain sample complexity bounds better than the usual ones. It has been claimed that a lower bound shows an essential gap in the rate between the convex and nonconvex cases. In this paper we show that the proof of that lower bound has a gap in it, although we do not provide a definitive answer to the bound's validity. More positively, we show that one can obtain “fast” sample complexity bounds for nonconvex F for “most” target conditional expectations. The new bounds depend on the detailed geometry of F; in particular, on the distance, in a suitable sense, of the target’s conditional expectation from the set of nonuniqueness points of the class F.
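To make the rate gap concrete, here is a minimal sketch of the standard agnostic squared-loss setting; the notation below is assumed for illustration and is not taken from the paper itself:

```latex
% A minimal sketch of the standard agnostic squared-loss setup; the
% notation (R, f^*, \hat{f}_n) is assumed for illustration, not taken
% from the paper.  Risk of a hypothesis f under the joint law of (X, Y):
\[
  R(f) = \mathbb{E}\bigl[(f(X) - Y)^2\bigr],
  \qquad
  f^{\ast} \in \operatorname*{arg\,min}_{f \in F} R(f).
\]
% Given n i.i.d. samples, a learner outputs \hat{f}_n; the rates at
% issue concern the excess risk relative to the best predictor in F:
\[
  R(\hat{f}_n) - \inf_{f \in F} R(f) =
  \begin{cases}
    O\bigl(n^{-1/2}\bigr) & \text{for general } F \ (\text{``slow'' rate}),\\[2pt]
    O\bigl(n^{-1}\bigr)   & \text{for convex } F \ (\text{``fast'' rate}).
  \end{cases}
\]
```

In this language, the paper's positive claim is that the fast rate can survive for nonconvex F whenever the target's conditional expectation E[Y|X] is, in an appropriate sense, far from the points of nonuniqueness, i.e. the points admitting more than one best approximation in F.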
Keywords: Sample Complexity, Normed Linear Space, Computational Learning Theory, Random Construction, Hidden Layer Neural Network