
Online Learning with Adaptive Local Step Sizes

Conference paper in: Neural Nets WIRN Vietri-99

Part of the book series: Perspectives in Neural Computing (PERSPECT.NEURAL)

Abstract

Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
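To make the abstract's description concrete, the following is a minimal sketch, in Python/NumPy, of Sutton's IDBD rule for a single linear unit (reference 13), which is the starting point that the paper extends to general nonlinear systems. The function name idbd_update, the meta_rate parameter, and all variable names are illustrative choices for this sketch, not notation taken from the paper.

```python
import numpy as np

def idbd_update(w, beta, h, x, target, meta_rate=0.01):
    """One online step of Sutton's IDBD rule for a linear unit.

    w      -- weight vector
    beta   -- per-weight log step sizes (local learning rates are exp(beta))
    h      -- per-weight traces of how past step-size changes affected w
    x      -- current input vector
    target -- desired output for x
    """
    error = target - np.dot(w, x)                # prediction error of the linear unit
    beta = beta + meta_rate * error * x * h      # meta-gradient step on the log step sizes
    alpha = np.exp(beta)                         # local (per-weight) step sizes, always positive
    w = w + alpha * error * x                    # LMS-style weight update with local rates
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * error * x
    return w, beta, h

# Toy usage: track a fixed linear target function online.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
beta = np.full(3, np.log(0.05))                  # initial local step sizes of 0.05
h = np.zeros(3)
for _ in range(2000):
    x = rng.normal(size=3)
    w, beta, h = idbd_update(w, beta, h, x, float(np.dot(w_true, x)))
```

In this sketch the exponential parameterisation alpha = exp(beta) keeps every local step size positive without a projection step, and the trace h carries the history needed to adapt beta, so no separate smoothing constant has to be tuned.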

References

  1. S. Becker and Y. LeCun, “Improving the convergence of back-propagation learning with second order methods”, in Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., Pittsburgh 1988, pp. 29–37, Morgan Kaufmann, San Mateo, 1989.

  2. N. N. Schraudolph and T. J. Sejnowski, “Tempering backpropagation networks: Not all weights are created equal”, in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. 1996, vol. 8, pp. 563–569, The MIT Press, Cambridge, MA, ftp://ftp.idsia.ch/pub/nic/nips95.ps.gz.

  3. R. Neuneier and H. G. Zimmermann, “How to train neural networks”, in Neural Networks: Tricks of the Trade, vol. 1524 of Lecture Notes in Computer Science, pp. 373–423. Springer Verlag, Berlin, 1998.

  4. R. S. Sutton, “Gain adaptation beats least squares?”, in Proc. 7th Yale Workshop on Adaptive and Learning Systems, 1992, pp. 161–166, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92b.ps.gz.

  5. Y. LeCun, P. Y. Simard, and B. Pearlmutter, “Automatic learning rate maximization in large adaptive machines”, in Advances in Neural Information Processing Systems, S. J. Hanson, J. D. Cowan, and C. L. Giles, Eds. 1993, vol. 5, pp. 156–163, Morgan Kaufmann, San Mateo, CA.

  6. N. Murata, K.-R. Müller, A. Ziehe, and S.-I. Amari, “Adaptive on-line learning in changing environments”, in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. 1997, vol. 9, pp. 599–605, The MIT Press, Cambridge, MA.

  7. L.-W. Chan and F. Fallside, “An adaptive training algorithm for back propagation networks”, Computer Speech and Language, 2:205–218, 1987.

  8. R. Jacobs, “Increased rates of convergence through learning rate adaptation”, Neural Networks, 1:295–307, 1988.

  9. T. Tollenaere, “SuperSAB: fast adaptive back propagation with good scaling properties”, Neural Networks, 3:561–573, 1990.

  10. F. M. Silva and L. B. Almeida, “Speeding up back-propagation”, in Advanced Neural Computers, R. Eckmiller, Ed., Amsterdam, 1990, pp. 151–158, Elsevier.

  11. M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm”, in Proc. International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586–591, IEEE, New York.

  12. L. B. Almeida, T. Langlois, J. D. Amaral, and A. Plakhov, “Parameter adaptation in stochastic optimization”, in On-Line Learning in Neural Networks, D. Saad, Ed., Publications of the Newton Institute, chapter 6. Cambridge University Press, 1999, ftp://146.193.2.131/pub/lba/papers/adsteps.ps.gz.

  13. R. S. Sutton, “Adapting bias by gradient descent: an incremental version of delta-bar-delta”, in Proc. 10th National Conference on Artificial Intelligence. 1992, pp. 171–176, The MIT Press, Cambridge, MA, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92a.ps.gz.

  14. J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction”, in Proc. 27th Annual ACM Symposium on Theory of Computing, New York, NY, May 1995, pp. 209–218, The Association for Computing Machinery.

  15. N. N. Schraudolph, “A fast, compact approximation of the exponential function”, Neural Computation, 11(4):853–862, 1999, ftp://ftp.idsia.ch/pub/nic/exp.ps.gz.

  16. B. A. Pearlmutter, “Fast exact multiplication by the Hessian”, Neural Computation, 6(1):147–160, 1994.

  17. N. N. Schraudolph, “Online local gain adaptation for multi-layer perceptrons”, Tech. Rep. IDSIA-09-98, Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Corso Elvezia 36, 6900 Lugano, Switzerland, 1998, ftp://ftp.idsia.ch/pub/nic/olga.ps.gz.

  18. S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter”, in Advances in Neural Information Processing Systems. Proceedings of the 1988 Conference, D. S. Touretzky, Ed., San Mateo, CA, 1989, pp. 133–140, Morgan Kaufmann.

  19. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, second edition, 1992.


Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Schraudolph, N.N. (1999). Online Learning with Adaptive Local Step Sizes. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN Vietri-99. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0877-1_13

  • DOI: https://doi.org/10.1007/978-1-4471-0877-1_13

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-1226-6

  • Online ISBN: 978-1-4471-0877-1

  • eBook Packages: Springer Book Archive
