Simplified PAC-Bayesian Margin Bounds

  • Conference paper
Learning Theory and Kernel Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Such unit-norm margin bounds have previously been proved using fat-shattering arguments and Rademacher complexity. Recently, Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form, making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the resulting PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.
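To make the shape of such a bound concrete: in Langford-Seeger-style PAC-Bayesian theorems, the true error of a posterior Q is controlled by inverting the binary KL divergence against a complexity term involving KL(Q‖P). The sketch below is not the paper's own derivation; it is a minimal numeric illustration, assuming the common bound form kl(err̂(Q)‖err(Q)) ≤ (KL(Q‖P) + ln((m+1)/δ))/m, with all function names my own. The KL inverse has no closed form, so it is computed by bisection.

```python
import math

def binary_kl(p, q):
    """kl(p||q) between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse(p_hat, bound):
    """Largest q >= p_hat with binary_kl(p_hat, q) <= bound, found by bisection."""
    lo, hi = p_hat, 1.0 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if binary_kl(p_hat, mid) <= bound:
            lo = mid  # mid still satisfies the bound; move the lower end up
        else:
            hi = mid
    return lo

def pac_bayes_error_bound(emp_err, kl_qp, m, delta):
    """Upper bound on true error of posterior Q, assuming the
    Langford-Seeger form: kl(emp_err || true_err) <= (KL(Q||P) + ln((m+1)/delta)) / m."""
    rhs = (kl_qp + math.log((m + 1) / delta)) / m
    return kl_inverse(emp_err, rhs)
```

For example, with empirical error 0.05 on m = 1000 samples, the bound degrades monotonically as KL(Q‖P) grows, which is why margin bounds hinge on finding a posterior close to the prior that still classifies with the observed margin.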


References

  1. Bartlett, P.: Personal communication (2003)

  2. Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge (1998)

  3. Bartlett, P.L.: The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory (March 1998)

  4. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research 3, 463–482 (2002)

  5. Catoni, O.: Gibbs estimators. To appear in Probability Theory and Related Fields

  6. Herbrich, R., Graepel, T.: A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In: Advances in Neural Information Processing Systems, NIPS (2001)

  7. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 13–30 (1963)

  8. Koltchinskii, V., Panchenko, D.: Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics 30 (2002)

  9. Langford, J., Seeger, M., Megiddo, N.: An improved predictive accuracy bound for averaging classifiers. In: ICML 2001 (2001)

  10. Langford, J., Seeger, M.: Bounds for averaging classifiers. CMU Technical Report CMU-CS-01-102 (2002)

  11. Langford, J., Shawe-Taylor, J.: PAC-Bayes and margins. In: Neural Information Processing Systems (NIPS) (2002)

  12. McAllester, D.: PAC-Bayesian stochastic model selection. Machine Learning 5, 5–21 (2003); A short version appeared as PAC-Bayesian Model Averaging. In: COLT 1999 (1999)

  13. McAllester, D., Ortiz, L.: Concentration inequalities for the missing mass and for histogram rule error. In: Neural Information Processing Systems (NIPS) (2002)

  14. Ruben, H.: A new asymptotic expansion for the normal probability integral and Mills' ratio. Journal of the Royal Statistical Society. Series B (Methodological) 24(1), 177–179 (1962)

  15. Schapire, R., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Machine Learning: Proceedings of the 14th International Conference (1997)

  16. Seeger, M.: PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research 3, 233–269 (2002)

  17. Shawe-Taylor, J., Bartlett, P., Williamson, R., Anthony, M.: A framework for structural risk minimization. IEEE Transactions on Information Theory 44(5), 1926–1940 (1998)


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McAllester, D. (2003). Simplified PAC-Bayesian Margin Bounds. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science(), vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_16

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive
