Optimal Oracle Inequality for Aggregation of Classifiers Under Low Noise Condition

  • Conference paper
Learning Theory (COLT 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4005)

Abstract

We consider the problem of optimality, in a minimax sense, and of adaptivity to the margin and to regularity in binary classification. We prove an oracle inequality, under the margin assumption (low noise condition), satisfied by an aggregation procedure that uses exponential weights. This oracle inequality has an optimal residual term, ((log M)/n)^{κ/(2κ−1)}, where κ is the margin parameter, M the number of classifiers to aggregate, and n the number of observations. We use this inequality first to construct minimax classifiers under margin and regularity assumptions, and second to aggregate them into a classifier that is adaptive both to the margin and to the regularity. Moreover, by aggregating plug-in classifiers (only log n of them), we provide an easily implementable classifier that is adaptive both to the margin and to the regularity.
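The aggregation procedure described in the abstract weights the M candidate classifiers by exponentiating their empirical performance on the sample. Below is a minimal sketch of that idea, assuming labels in {−1, +1}; the temperature parameter beta, the function names, and the data layout are illustrative assumptions, not the paper's exact construction or tuning.

    import numpy as np

    def exponential_weights(predictions, labels, beta=1.0):
        """Exponential weights over M classifiers from their empirical risks.

        predictions : (M, n) array of {-1, +1} predictions on the sample
        labels      : (n,) array of {-1, +1} true labels
        beta        : temperature (a free parameter in this sketch; the
                      paper derives its value from the margin assumption)
        """
        n = labels.shape[0]
        # Empirical 0-1 risk of each of the M classifiers.
        risks = np.mean(predictions != labels, axis=1)
        # Exponential weighting of the empirical risks; subtracting the
        # minimum risk before exponentiating avoids numerical underflow.
        w = np.exp(-beta * n * (risks - risks.min()))
        return w / w.sum()

    def aggregate_predict(weights, new_predictions):
        """Sign of the weighted vote: a convex combination of the classifiers."""
        scores = weights @ new_predictions  # (M,) @ (M, n_new) -> (n_new,)
        return np.where(scores >= 0.0, 1, -1)

For example, with P of shape (M, n) holding each candidate's predictions on a held-out split and y the corresponding labels, w = exponential_weights(P, y) followed by aggregate_predict(w, P_new) returns the aggregated labels on new data.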




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lecué, G. (2006). Optimal Oracle Inequality for Aggregation of Classifiers Under Low Noise Condition. In: Lugosi, G., Simon, H.U. (eds.) Learning Theory. COLT 2006. Lecture Notes in Computer Science (LNAI), vol. 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_28

  • DOI: https://doi.org/10.1007/11776420_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35294-5

  • Online ISBN: 978-3-540-35296-9

  • eBook Packages: Computer Science, Computer Science (R0)
