Abstract
Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
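To give a concrete sense of the class of methods the abstract describes, here is a minimal sketch of Sutton's IDBD algorithm for the linear case — the starting point that the paper extends to nonlinear systems. This is an illustration of the general idea (per-weight step sizes adapted online by meta-level gradient descent on their logarithms), not the paper's own nonlinear algorithm; all names and constants below are illustrative choices.

```python
import numpy as np

def idbd(X, y, theta=0.01, beta0=np.log(0.05)):
    """Sutton's IDBD for a linear unit: each weight carries its own
    step size alpha_i = exp(beta_i), and the beta_i are themselves
    adapted online by gradient descent (meta step size theta)."""
    n = X.shape[1]
    w = np.zeros(n)           # weights
    beta = np.full(n, beta0)  # log local step sizes
    h = np.zeros(n)           # trace of recent weight updates
    sq_errors = []
    for x, target in zip(X, y):
        delta = target - w @ x            # prediction error
        beta += theta * delta * x * h     # meta-level update of log step sizes
        alpha = np.exp(beta)              # current local step sizes
        w += alpha * delta * x            # ordinary delta-rule update
        # decay the trace, clipped at zero, then add the new update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        sq_errors.append(delta * delta)
    return w, sq_errors

# Toy run: track a fixed linear target from noisy online samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=2000)
w, sq_errors = idbd(X, y)
```

Because the step sizes live on a logarithmic scale, the multiplicative exp-update keeps them positive and lets them range over orders of magnitude — one reason the abstract's algorithms need no arbitrary smoothing parameter, unlike sign-based heuristics such as delta-bar-delta.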
References
S. Becker and Y. LeCun, “Improving the convergence of back-propagation learning with second order methods”, in Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., Pittsburgh 1988, 1989, pp. 29–37, Morgan Kaufmann, San Mateo.
N. N. Schraudolph and T. J. Sejnowski, “Tempering backpropagation networks: Not all weights are created equal”, in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. 1996, vol. 8, pp. 563–569, The MIT Press, Cambridge, MA, ftp://ftp.idsia.ch/pub/nic/nips95.ps.gz.
R. Neuneier and H. G. Zimmermann, “How to train neural networks”, in Neural Networks: Tricks of the Trade, vol. 1524 of Lecture Notes in Computer Science, pp. 373–423. Springer Verlag, Berlin, 1998.
R. S. Sutton, “Gain adaptation beats least squares?”, in Proc. 7th Yale Workshop on Adaptive and Learning Systems, 1992, pp. 161–166, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92b.ps.gz.
Y. LeCun, P. Y. Simard, and B. Pearlmutter, “Automatic learning rate maximization in large adaptive machines”, in Advances in Neural Information Processing Systems, S. J. Hanson, J. D. Cowan, and C. L. Giles, Eds. 1993, vol. 5, pp. 156–163, Morgan Kaufmann, San Mateo, CA.
N. Murata, K.-R. Müller, A. Ziehe, and S.-I. Amari, “Adaptive on-line learning in changing environments”, in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. 1997, vol. 9, pp. 599–605, The MIT Press, Cambridge, MA.
L.-W. Chan and F. Fallside, “An adaptive training algorithm for back propagation networks”, Computer Speech and Language, 2:205–218, 1987.
R. Jacobs, “Increased rates of convergence through learning rate adaptation”, Neural Networks, 1:295–307, 1988.
T. Tollenaere, “SuperSAB: fast adaptive back propagation with good scaling properties”, Neural Networks, 3:561–573, 1990.
F. M. Silva and L. B. Almeida, “Speeding up back-propagation”, in Advanced Neural Computers, R. Eckmiller, Ed., Amsterdam, 1990, pp. 151–158, Elsevier.
M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm”, in Proc. International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586–591, IEEE, New York.
L. B. Almeida, T. Langlois, J. D. Amaral, and A. Plakhov, “Parameter adaptation in stochastic optimization”, in On-Line Learning in Neural Networks, D. Saad, Ed., Publications of the Newton Institute, chapter 6. Cambridge University Press, 1999, ftp://146.193.2.131/pub/lba/papers/adsteps.ps.gz.
R. S. Sutton, “Adapting bias by gradient descent: an incremental version of delta-bar-delta”, in Proc. 10th National Conference on Artificial Intelligence. 1992, pp. 171–176, The MIT Press, Cambridge, MA, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92a.ps.gz.
J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction”, in Proc. 27th Annual ACM Symposium on Theory of Computing, New York, NY, May 1995, pp. 209–218, The Association for Computing Machinery.
N. N. Schraudolph, “A fast, compact approximation of the exponential function”, Neural Computation, 11(4):853–862, 1999, ftp://ftp.idsia.ch/pub/nic/exp.ps.gz.
B. A. Pearlmutter, “Fast exact multiplication by the Hessian”, Neural Computation, 6(1):147–160, 1994.
N. N. Schraudolph, “Online local gain adaptation for multi-layer perceptrons”, Tech. Rep. IDSIA-09-98, Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Corso Elvezia 36, 6900 Lugano, Switzerland, 1998, ftp://ftp.idsia.ch/pub/nic/olga.ps.gz.
S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter”, in Advances in Neural Information Processing Systems. Proceedings of the 1988 Conference, D. S. Touretzky, Ed., San Mateo, CA, 1989, pp. 133–140, Morgan Kaufmann.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, second edition, 1992.
Copyright information
© 1999 Springer-Verlag London Limited
Cite this paper
Schraudolph, N.N. (1999). Online Learning with Adaptive Local Step Sizes. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN Vietri-99. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0877-1_13
DOI: https://doi.org/10.1007/978-1-4471-0877-1_13
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1226-6
Online ISBN: 978-1-4471-0877-1