
Learning Rate Adaptation in Stochastic Gradient Descent

Chapter in: Advances in Convex Analysis and Global Optimization

Part of the book series: Nonconvex Optimization and Its Applications (NOIA, volume 54)

Abstract

The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective is advantageous for the development of effective training algorithms, because the problem of minimizing a function is well studied in the field of numerical analysis. Typically, deterministic minimization methods are employed; however, in several cases significant gains in training speed and alleviation of the local minima problem can be achieved when stochastic minimization methods are used. In this paper a method for adapting the learning rate in stochastic gradient descent is presented. The main feature of the proposed learning rate adaptation scheme is that it exploits gradient-related information from the current as well as the two previous pattern presentations. This appears to stabilize the value of the learning rate and helps stochastic gradient descent to exhibit fast convergence and a high rate of success. Tests on various problems validate these characteristics of the new algorithm.
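The abstract does not give the exact update rule, so the following Python sketch only illustrates the general idea: a pattern-by-pattern (stochastic) gradient descent loop whose learning rate is adjusted multiplicatively from the inner products between the current per-pattern gradient and the gradients of the two previous pattern presentations. The function name, the increase/decrease factors, and the bounds are illustrative assumptions, not the chapter's algorithm.

    # Minimal sketch, assuming a multiplicative gain-adaptation rule driven by
    # gradient information from the current and the two previous pattern
    # presentations. This is NOT the chapter's exact scheme, only an
    # illustration of the general mechanism described in the abstract.
    import numpy as np

    def sgd_with_adaptive_lr(grad_fn, w, patterns, lr=0.01,
                             increase=1.05, decrease=0.7,
                             lr_min=1e-6, lr_max=1.0, epochs=10):
        """grad_fn(w, pattern) must return the gradient of the per-pattern error."""
        g_prev1 = np.zeros_like(w)   # gradient from the previous presentation
        g_prev2 = np.zeros_like(w)   # gradient from two presentations back
        for _ in range(epochs):
            for p in patterns:
                g = grad_fn(w, p)
                # Compare the current gradient with the two stored ones; using
                # both terms damps the noise of a single stochastic comparison.
                agreement = 0.5 * (np.dot(g, g_prev1) + np.dot(g, g_prev2))
                if agreement > 0:        # directions broadly agree: grow the step
                    lr = min(lr * increase, lr_max)
                elif agreement < 0:      # directions conflict: shrink the step
                    lr = max(lr * decrease, lr_min)
                w = w - lr * g           # plain stochastic gradient step
                g_prev2, g_prev1 = g_prev1, g
        return w, lr

Averaging the two inner products, rather than reacting to a single stochastic comparison, is one plausible way to obtain the kind of stabilization of the learning rate that the abstract describes.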




Copyright information

© 2001 Kluwer Academic Publishers

About this chapter

Cite this chapter

Plagianakos, V.P., Magoulas, G.D., Vrahatis, M.N. (2001). Learning Rate Adaptation in Stochastic Gradient Descent. In: Hadjisavvas, N., Pardalos, P.M. (eds) Advances in Convex Analysis and Global Optimization. Nonconvex Optimization and Its Applications, vol 54. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0279-7_27


  • DOI: https://doi.org/10.1007/978-1-4613-0279-7_27

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-6942-4

  • Online ISBN: 978-1-4613-0279-7

