Abstract
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective aids the development of effective training algorithms, because the minimization of a function is a well-studied problem in numerical analysis. Deterministic minimization methods are typically employed; in several cases, however, stochastic minimization methods achieve significantly faster training and alleviate the local minima problem. In this paper a method for adapting the learning rate in stochastic gradient descent is presented. The main feature of the proposed learning rate adaptation scheme is that it exploits gradient-related information from the current as well as the two previous pattern presentations. This appears to stabilize the value of the learning rate and helps stochastic gradient descent to exhibit fast convergence and a high rate of success. Tests on various problems validate these characteristics of the new algorithm.
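The idea sketched in the abstract lends itself to a compact illustration. The Python sketch below is a minimal, hypothetical rendering of the general approach, not the paper's exact update rule: after every pattern presentation it adjusts the learning rate using inner products of the current stochastic gradient with the gradients of the two previous presentations, then clips the result to a safe interval. The names grad_fn, gamma1, gamma2, and the clipping bounds are assumptions introduced here for illustration.

```python
import numpy as np

def sgd_adaptive_lr(grad_fn, w, patterns, lr=0.1, gamma1=1e-3, gamma2=1e-4,
                    lr_min=1e-6, lr_max=1.0, epochs=10):
    """On-line gradient descent with a meta-level learning rate update.

    A sketch only: after each pattern presentation, the learning rate is
    nudged by the inner products of the current stochastic gradient with
    the gradients of the two previous presentations. grad_fn(w, x) is
    assumed to return the gradient of the per-pattern error at weights w.
    """
    g_prev1 = np.zeros_like(w)  # gradient from the previous presentation
    g_prev2 = np.zeros_like(w)  # gradient from two presentations back
    for _ in range(epochs):
        for x in patterns:
            g = grad_fn(w, x)
            # Meta-update: gamma1 and gamma2 are hypothetical meta step
            # sizes weighting agreement with the two most recent gradients.
            lr += gamma1 * np.dot(g, g_prev1) + gamma2 * np.dot(g, g_prev2)
            lr = float(np.clip(lr, lr_min, lr_max))  # keep the step bounded
            w = w - lr * g  # ordinary stochastic gradient step
            g_prev2, g_prev1 = g_prev1, g
    return w, lr
```

Intuitively, successive gradients that point in the same direction signal that a larger step is safe, while sign changes signal overshooting and shrink the step; blending alignment over two past presentations, rather than one, is what provides the stabilizing effect claimed in the abstract.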
Copyright information
© 2001 Kluwer Academic Publishers
Cite this chapter
Plagianakos, V.P., Magoulas, G.D., Vrahatis, M.N. (2001). Learning Rate Adaptation in Stochastic Gradient Descent. In: Hadjisavvas, N., Pardalos, P.M. (eds) Advances in Convex Analysis and Global Optimization. Nonconvex Optimization and Its Applications, vol 54. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0279-7_27
DOI: https://doi.org/10.1007/978-1-4613-0279-7_27
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-6942-4
Online ISBN: 978-1-4613-0279-7