Abstract
Almeida et al. have recently proposed online algorithms for local step size adaptation in nonlinear systems trained by gradient descent. Here we develop an alternative to their approach by extending Sutton’s work on linear systems to the general, nonlinear case. The resulting algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods as well as stochastic gradient descent with fixed learning rate and momentum.
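To give a concrete sense of the class of methods the abstract describes, here is a minimal sketch of Sutton's IDBD algorithm for the linear case — the starting point that the paper extends to nonlinear systems. This is an illustration of the general idea (per-weight step sizes adapted online by meta-level gradient descent on their logarithms), not the paper's own nonlinear algorithm; all names and constants below are illustrative choices.

```python
import numpy as np

def idbd(X, y, theta=0.01, beta0=np.log(0.05)):
    """Sutton's IDBD for a linear unit: each weight carries its own
    step size alpha_i = exp(beta_i), and the beta_i are themselves
    adapted online by gradient descent (meta step size theta)."""
    n = X.shape[1]
    w = np.zeros(n)           # weights
    beta = np.full(n, beta0)  # log local step sizes
    h = np.zeros(n)           # trace of recent weight updates
    sq_errors = []
    for x, target in zip(X, y):
        delta = target - w @ x            # prediction error
        beta += theta * delta * x * h     # meta-level update of log step sizes
        alpha = np.exp(beta)              # current local step sizes
        w += alpha * delta * x            # ordinary delta-rule update
        # decay the trace, clipped at zero, then add the new update
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        sq_errors.append(delta * delta)
    return w, sq_errors

# Toy run: track a fixed linear target from noisy online samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=2000)
w, sq_errors = idbd(X, y)
```

Because the step sizes live on a logarithmic scale, the multiplicative exp-update keeps them positive and lets them range over orders of magnitude — one reason the abstract's algorithms need no arbitrary smoothing parameter, unlike sign-based heuristics such as delta-bar-delta.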
References
S. Becker and Y. LeCun, “Improving the convergence of back-propagation learning with second order methods”, in Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., Pittsburgh 1988, 1989, pp. 29–37, Morgan Kaufmann, San Mateo.
N. N. Schraudolph and T. J. Sejnowski, “Tempering backpropagation networks: Not all weights are created equal”, in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. 1996, vol. 8, pp. 563–569, The MIT Press, Cambridge, MA, ftp://ftp.idsia.ch/pub/nic/nips95.ps.gz.
R. Neuneier and H. G. Zimmermann, “How to train neural networks”, in Neural Networks: Tricks of the Trade, vol. 1524 of Lecture Notes in Computer Science, pp. 373–423. Springer Verlag, Berlin, 1998.
R. S. Sutton, “Gain adaptation beats least squares?”, in Proc. 7th Yale Workshop on Adaptive and Learning Systems, 1992, pp. 161–166, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92b.ps.gz.
Y. LeCun, P. Y. Simard, and B. Pearlmutter, “Automatic learning rate maximization in large adaptive machines”, in Advances in Neural Information Processing Systems, S. J. Hanson, J. D. Cowan, and C. L. Giles, Eds. 1993, vol. 5, pp. 156–163, Morgan Kaufmann, San Mateo, CA.
N. Murata, K.-R. Müller, A. Ziehe, and S.-I. Amari, “Adaptive on-line learning in changing environments”, in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. 1997, vol. 9, pp. 599–605, The MIT Press, Cambridge, MA.
L.-W. Chan and F. Fallside, “An adaptive training algorithm for back propagation networks”, Computer Speech and Language, 2:205–218, 1987.
R. Jacobs, “Increased rates of convergence through learning rate adaptation”, Neural Networks, 1:295–307, 1988.
T. Tollenaere, “SuperSAB: fast adaptive back propagation with good scaling properties”, Neural Networks, 3:561–573, 1990.
F. M. Silva and L. B. Almeida, “Speeding up back-propagation”, in Advanced Neural Computers, R. Eckmiller, Ed., Amsterdam, 1990, pp. 151–158, Elsevier.
M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The RPROP algorithm”, in Proc. International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586–591, IEEE, New York.
L. B. Almeida, T. Langlois, J. D. Amaral, and A. Plakhov, “Parameter adaptation in stochastic optimization”, in On-Line Learning in Neural Networks, D. Saad, Ed., Publications of the Newton Institute, chapter 6. Cambridge University Press, 1999, ftp://146.193.2.131/pub/lba/papers/adsteps.ps.gz.
R. S. Sutton, “Adapting bias by gradient descent: an incremental version of delta-bar-delta”, in Proc. 10th National Conference on Artificial Intelligence. 1992, pp. 171–176, The MIT Press, Cambridge, MA, ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-92a.ps.gz.
J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction”, in Proc. 27th Annual ACM Symposium on Theory of Computing, New York, NY, May 1995, pp. 209–218, The Association for Computing Machinery.
N. N. Schraudolph, “A fast, compact approximation of the exponential function”, Neural Computation, 11(4):853–862, 1999, ftp://ftp.idsia.ch/pub/nic/exp.ps.gz.
B. A. Pearlmutter, “Fast exact multiplication by the Hessian”, Neural Computation, 6(1):147–160, 1994.
N. N. Schraudolph, “Online local gain adaptation for multi-layer perceptrons”, Tech. Rep. IDSIA-09-98, Istituto Dalle Molle di Studi sull’Intelligenza Artificiale, Corso Elvezia 36, 6900 Lugano, Switzerland, 1998, ftp://ftp.idsia.ch/pub/nic/olga.ps.gz.
S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter”, in Advances in Neural Information Processing Systems. Proceedings of the 1988 Conference, D. S. Touretzky, Ed., San Mateo, CA, 1989, pp. 133–140, Morgan Kaufmann.
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, second edition, 1992.
Copyright information
© 1999 Springer-Verlag London Limited
Cite this paper
Schraudolph, N.N. (1999). Online Learning with Adaptive Local Step Sizes. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN Vietri-99. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0877-1_13
DOI: https://doi.org/10.1007/978-1-4471-0877-1_13
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1226-6
Online ISBN: 978-1-4471-0877-1