
A Stochastic Gradient Descent Algorithm for Structural Risk Minimisation

  • Conference paper
Algorithmic Learning Theory (ALT 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2842)

Abstract

Structural risk minimisation (SRM) is a general complexity-regularisation method which automatically selects the model complexity that approximately minimises the misclassification error probability of the empirical risk minimiser. It does so by adding a complexity penalty term ε(m, k) to the empirical risk of the candidate hypotheses and then, for any fixed sample size m, minimising the sum with respect to the model-complexity variable k.
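To make the selection rule concrete, the following is a minimal, purely illustrative sketch of standard SRM over a discrete set of candidate complexities; the VC-style form of the penalty ε(m, k) used here is an assumption for the example, not the penalty analysed in the paper.

```python
import math

def penalty(m, k, delta=0.05):
    # Assumed VC-style complexity penalty epsilon(m, k); the paper's exact form may differ.
    return math.sqrt((k * math.log(2 * m) + math.log(1.0 / delta)) / m)

def srm_select(empirical_risks, m):
    """Standard SRM: pick the complexity k minimising empirical risk plus penalty.

    empirical_risks maps each candidate complexity k to the empirical risk of the
    empirical risk minimiser in the class of complexity k, computed on a sample of size m.
    """
    best_k, best_value = None, float("inf")
    for k, risk in empirical_risks.items():
        value = risk + penalty(m, k)
        if value < best_value:
            best_k, best_value = k, value
    return best_k, best_value

# Example: richer classes fit better empirically but pay a larger penalty.
risks = {1: 0.30, 2: 0.22, 4: 0.15, 8: 0.13, 16: 0.12}
print(srm_select(risks, m=500))
```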

When learning multicategory classification there are M subsamples of sizes \(m_i\), corresponding to the M pattern classes with a priori probabilities \(p_i\), 1 ≤ i ≤ M. Using the usual representation of a multicategory classifier as M individual Boolean classifiers, the penalty becomes \(\sum_{i=1}^{M}p_{i}\epsilon(m_{i},k_{i})\). If the \(m_i\) are given, then the standard SRM trivially applies by minimising the penalised empirical risk with respect to \(k_i\), i = 1, …, M.
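Continuing the illustrative sketch above (and reusing its penalty function), and assuming the weighted empirical risk decomposes across the M Boolean classifiers, the minimisation over the \(k_i\) decouples class by class once the subsample sizes are fixed:

```python
def srm_select_multiclass(empirical_risks_per_class, m, p):
    """Per-class SRM when the subsample sizes m[i] are fixed (illustrative sketch).

    empirical_risks_per_class[i] maps complexity k to the empirical risk of the
    i-th Boolean classifier on its subsample of size m[i]; p[i] is the a priori
    class probability. Assuming the weighted empirical risk decomposes over the
    M Boolean classifiers, each k_i is chosen independently and the penalised
    terms combine as sum_i p_i * (risk_i + epsilon(m_i, k_i)).
    """
    ks, total = [], 0.0
    for risks_i, m_i, p_i in zip(empirical_risks_per_class, m, p):
        k_i, value_i = min(
            ((k, r + penalty(m_i, k)) for k, r in risks_i.items()),
            key=lambda kv: kv[1],
        )
        ks.append(k_i)
        total += p_i * value_i
    return ks, total
```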

However, in situations where the total sample size \(\sum_{i=1}^{M}m_{i}\) needs to be minimal, one must also minimise the penalised empirical risk with respect to the variables \(m_i\), i = 1, …, M. The obvious problem is that the empirical risk can only be defined once the subsamples (and hence their sizes) are known.

Utilising an on-line stochastic gradient descent approach, this paper overcomes this difficulty and introduces a sample-querying algorithm which extends the standard SRM principle. It minimises the penalised empirical risk not only with respect to the \(k_i\), as standard SRM does, but also with respect to the \(m_i\), i = 1, …, M.

The challenge here is in defining a stochastic empirical criterion which, when minimised, yields a sequence of subsample-size vectors that asymptotically achieves the Bayes-optimal error convergence rate.
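The abstract does not spell out the algorithm itself, so the sketch below is hypothetical: one simple way to allocate queries over the \(m_i\) is a greedy, finite-difference descent on an estimated penalised criterion, querying next from the class whose criterion is estimated to drop most per additional example. The allocation rule and the interface penalised_risk(i, m_i) are assumptions for illustration only, not the paper's construction.

```python
import math

def query_schedule(penalised_risk, M, total_budget, m_init=5):
    """Hypothetical greedy sample-querying loop over the subsample sizes m_i.

    penalised_risk(i, m_i) should return an estimate of class i's contribution to the
    penalised empirical risk if its subsample had size m_i. Each step queries one more
    example from the class with the largest estimated one-step decrease, a crude
    finite-difference stand-in for a stochastic gradient step in the m_i variables.
    """
    m = [m_init] * M                          # small seed subsample per class
    while sum(m) < total_budget:
        # finite-difference estimate of the decrease per extra example, for each class
        gains = [penalised_risk(i, m[i]) - penalised_risk(i, m[i] + 1) for i in range(M)]
        i_star = max(range(M), key=lambda i: gains[i])
        m[i_star] += 1                        # request one more example from class i_star
    return m

# Toy usage with a made-up criterion proportional to 1/sqrt(m_i):
print(query_schedule(lambda i, n: (0.5 + 0.1 * i) / math.sqrt(n), M=3, total_budget=60))
```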

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ratsaby, J. (2003). A Stochastic Gradient Descent Algorithm for Structural Risk Minimisation. In: Gavaldà, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science, vol. 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_17

  • DOI: https://doi.org/10.1007/978-3-540-39624-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20291-2

  • Online ISBN: 978-3-540-39624-6
