Simplified PAC-Bayesian Margin Bounds

  • Conference paper
Learning Theory and Kernel Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Such unit-norm margin bounds have previously been proved using fat-shattering arguments and Rademacher complexity. Recently, Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form, making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the resulting PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.
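To make the shape of such a bound concrete: in Langford-Seeger-style PAC-Bayesian theorems, the true error of a posterior Q is controlled by inverting the binary KL divergence against a complexity term involving KL(Q‖P). The sketch below is not the paper's own derivation; it is a minimal numeric illustration, assuming the common bound form kl(err̂(Q)‖err(Q)) ≤ (KL(Q‖P) + ln((m+1)/δ))/m, with all function names my own. The KL inverse has no closed form, so it is computed by bisection.

```python
import math

def binary_kl(p, q):
    """kl(p||q) between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse(p_hat, bound):
    """Largest q >= p_hat with binary_kl(p_hat, q) <= bound, found by bisection."""
    lo, hi = p_hat, 1.0 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if binary_kl(p_hat, mid) <= bound:
            lo = mid  # mid still satisfies the bound; move the lower end up
        else:
            hi = mid
    return lo

def pac_bayes_error_bound(emp_err, kl_qp, m, delta):
    """Upper bound on true error of posterior Q, assuming the
    Langford-Seeger form: kl(emp_err || true_err) <= (KL(Q||P) + ln((m+1)/delta)) / m."""
    rhs = (kl_qp + math.log((m + 1) / delta)) / m
    return kl_inverse(emp_err, rhs)
```

For example, with empirical error 0.05 on m = 1000 samples, the bound degrades monotonically as KL(Q‖P) grows, which is why margin bounds hinge on finding a posterior close to the prior that still classifies with the observed margin.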


References

  1. Bartlett, P.: Personal communication (2003)

  2. Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge (1998)

  3. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory (March 1998)

  4. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research 3, 463–482 (2002)

  5. Catoni, O.: Gibbs estimators. To appear in Probability Theory and Related Fields

  6. Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In: Advances in Neural Information Processing Systems, NIPS (2001)

  7. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13–30 (1963)

  8. Koltchinskii, V., Panchenko, D.: Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics 30 (2002)

  9. Langford, J., Seeger, M., Megiddo, N.: An improved predictive accuracy bound for averaging classifiers. In: ICML 2001 (2001)

  10. Langford, J., Seeger, M.: Bounds for averaging classifiers. CMU Technical Report CMU-CS-01-102 (2002)

  11. Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Neural Information Processing Systems (NIPS) (2002)

  12. McAllester, D.: PAC-Bayesian stochastic model selection. Machine Learning 5, 5–21 (2003); A short version appeared as PAC-Bayesian Model Averaging. In: COLT 1999 (1999)

  13. McAllester, D., Ortiz, L.: Concentration inequalities for the missing mass and for histogram rule error. In: Neural Information Processing Systems (NIPS) (2002)

  14. Ruben, H.: A new asymptotic expansion for the normal probability integral and Mills' ratio. Journal of the Royal Statistical Society. Series B (Methodological) 24(1), 177–179 (1962)

  15. Schapire, R., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Machine Learning: Proceedings of the 14th International Conference (1997)

  16. Seeger, M.: PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research 3, 233–269 (2002)

  17. Shawe-Taylor, J., Bartlett, P., Williamson, R., Anthony, M.: A framework for structural risk minimization. IEEE Transactions on Information Theory 44(5), 1926–1940 (1998)


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McAllester, D. (2003). Simplified PAC-Bayesian Margin Bounds. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_16

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive
