Theoretical Analysis of Function of Derivative Term in On-Line Gradient Descent Learning

  • Kazuyuki Hara
  • Kentaro Katahira
  • Kazuo Okanoya
  • Masato Okada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)

Abstract

In on-line gradient descent learning, the local property of the derivative term of the output can slow convergence. Improving the derivative term, such as by using the natural gradient, has been proposed to speed up convergence. Besides this sophisticated method, a "simple method" that replaces the derivative term with a constant has been proposed and shown to greatly increase convergence speed. Although this phenomenon has been analyzed empirically, theoretical analysis is required to show its generality. In this paper, we theoretically analyze the effect of using the simple method. Our results show that, with the simple method, the generalization error decreases faster than with the true gradient descent method when the learning step is smaller than the optimum value η_opt. When it is larger than η_opt, the generalization error decreases more slowly with the simple method, and the residual error is larger than with the true gradient descent method. Moreover, when there is output noise, η_opt is no longer optimal; thus, the simple method is not robust in noisy circumstances.
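To make the two update rules concrete, the following is a minimal sketch, not the paper's code: a simple perceptron student with output g(u) = erf(u/√2) learns on-line from a teacher of the same form, once with the true gradient (keeping the derivative term g'(u)) and once with the "simple method" (replacing g'(u) by a constant). The dimension N, learning step η, and constant C = 1 are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from math import erf, sqrt, pi, exp

# Illustrative sketch (assumptions, not the paper's setup): compare the true
# on-line gradient descent update with the "simple method" that replaces the
# output-derivative term g'(u) by a constant C.

N = 1000                                    # input dimension (assumed)
eta = 0.5                                   # learning step (assumed)
g = lambda u: erf(u / sqrt(2.0))            # output function
dg = lambda u: sqrt(2.0 / pi) * exp(-u * u / 2.0)   # derivative g'(u)
C = 1.0                                     # constant replacing g'(u) (assumed)

rng = np.random.default_rng(0)
B = rng.standard_normal(N) / sqrt(N)        # teacher weights, |B| ~ 1
J_true = np.zeros(N)                        # student trained with true gradient
J_simple = np.zeros(N)                      # student trained with simple method

for step in range(50000):
    x = rng.standard_normal(N)              # one input example, x_i ~ N(0, 1)
    target = g(B @ x)                       # teacher output
    # true gradient descent: weight change ~ error * g'(u) * x
    u = J_true @ x
    J_true += (eta / N) * (target - g(u)) * dg(u) * x
    # simple method: the local derivative g'(v) is replaced by the constant C
    v = J_simple @ x
    J_simple += (eta / N) * (target - g(v)) * C * x

# The overlap with the teacher direction indicates how far each student has converged.
for name, J in (("true gradient", J_true), ("simple method", J_simple)):
    R = (J @ B) / (np.linalg.norm(B) * max(np.linalg.norm(J), 1e-12))
    print(f"{name}: overlap with teacher = {R:.3f}")
```

In this sketch the only difference between the two rules is the factor multiplying the error: dg(u) for the true gradient, the constant C for the simple method.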



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kazuyuki Hara (1)
  • Kentaro Katahira (2, 3)
  • Kazuo Okanoya (3, 4)
  • Masato Okada (4, 3, 2)
  1. College of Industrial Technology, Nihon University, Narashino, Japan
  2. Center for Evolutionary Cognitive Sciences, The University of Tokyo, Meguro-ku, Japan
  3. Brain Science Institute, RIKEN, Wako, Japan
  4. Graduate School of Frontier Science, The University of Tokyo, Kashiwa, Japan