Abstract
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective aids the development of effective training algorithms, because the minimization of a function is a well-studied problem in numerical analysis. Deterministic minimization methods are typically employed; in several cases, however, stochastic minimization methods achieve significantly faster training and alleviate the local minima problem. In this paper a method for adapting the learning rate in stochastic gradient descent is presented. The main feature of the proposed learning rate adaptation scheme is that it exploits gradient-related information from the current as well as the two previous pattern presentations. This appears to stabilize the value of the learning rate and helps stochastic gradient descent to exhibit fast convergence and a high rate of success. Tests on various problems validate these characteristics of the new algorithm.
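The idea sketched in the abstract lends itself to a compact illustration. The Python sketch below is a minimal, hypothetical rendering of the general approach, not the paper's exact update rule: after every pattern presentation it adjusts the learning rate using inner products of the current stochastic gradient with the gradients of the two previous presentations, then clips the result to a safe interval. The names grad_fn, gamma1, gamma2, and the clipping bounds are assumptions introduced here for illustration.

```python
import numpy as np

def sgd_adaptive_lr(grad_fn, w, patterns, lr=0.1, gamma1=1e-3, gamma2=1e-4,
                    lr_min=1e-6, lr_max=1.0, epochs=10):
    """On-line gradient descent with a meta-level learning rate update.

    A sketch only: after each pattern presentation, the learning rate is
    nudged by the inner products of the current stochastic gradient with
    the gradients of the two previous presentations. grad_fn(w, x) is
    assumed to return the gradient of the per-pattern error at weights w.
    """
    g_prev1 = np.zeros_like(w)  # gradient from the previous presentation
    g_prev2 = np.zeros_like(w)  # gradient from two presentations back
    for _ in range(epochs):
        for x in patterns:
            g = grad_fn(w, x)
            # Meta-update: gamma1 and gamma2 are hypothetical meta step
            # sizes weighting agreement with the two most recent gradients.
            lr += gamma1 * np.dot(g, g_prev1) + gamma2 * np.dot(g, g_prev2)
            lr = float(np.clip(lr, lr_min, lr_max))  # keep the step bounded
            w = w - lr * g  # ordinary stochastic gradient step
            g_prev2, g_prev1 = g_prev1, g
    return w, lr
```

Intuitively, successive gradients that point in the same direction signal that a larger step is safe, while sign changes signal overshooting and shrink the step; blending alignment over two past presentations, rather than one, is what provides the stabilizing effect claimed in the abstract.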
Copyright information
© 2001 Kluwer Academic Publishers
Cite this chapter
Plagianakos, V.P., Magoulas, G.D., Vrahatis, M.N. (2001). Learning Rate Adaptation in Stochastic Gradient Descent. In: Hadjisavvas, N., Pardalos, P.M. (eds) Advances in Convex Analysis and Global Optimization. Nonconvex Optimization and Its Applications, vol 54. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0279-7_27
DOI: https://doi.org/10.1007/978-1-4613-0279-7_27
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-6942-4
Online ISBN: 978-1-4613-0279-7