Natural Conjugate Gradient Training of Multilayer Perceptrons

  • Conference paper
Artificial Neural Networks – ICANN 2006 (ICANN 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4131)

Abstract

For maximum log-likelihood estimation, the Fisher matrix defines a Riemannian metric in weight space and, as shown by Amari and his coworkers, the resulting natural gradient greatly accelerates on-line multilayer perceptron (MLP) training. While its batch gradient descent counterpart also improves on standard gradient descent (it amounts to a Gauss–Newton approximation to mean square error minimization), it may no longer be competitive with more advanced gradient-based function minimization procedures. In this work we show how to introduce natural gradients in a conjugate gradient (CG) setting, and we demonstrate numerically that, when applied to batch MLP learning, they lead to faster convergence to better minima than standard Euclidean CG descent. Since a drawback of the full natural gradient is its larger computational cost, we also consider some cost-simplifying variants and show that one of them, diagonal natural CG, also gives better minima than standard CG at a comparable complexity.
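
For context, the natural gradient replaces the ordinary gradient ∇E(w) by G(w)⁻¹∇E(w), where G(w) is the Fisher information matrix, and the diagonal variant mentioned above keeps only the diagonal of G(w). The sketch below is not the authors' code: the functions grad_fn and per_sample_grads, the damping term eps and the fixed step size eta are illustrative assumptions. It shows one plausible way to assemble a diagonal natural conjugate gradient step in Python/NumPy, estimating diag G(w) from squared per-pattern gradients and combining the resulting natural gradient with a preconditioned Polak–Ribière direction.

    import numpy as np

    def diagonal_natural_cg_step(w, grad_fn, per_sample_grads, state=None,
                                 eta=0.1, eps=1e-4):
        # grad_fn(w): batch gradient of the error E(w), shape (dim,)
        # per_sample_grads(w): per-pattern gradients, shape (n_patterns, dim),
        #   used here to approximate the diagonal of the Fisher matrix G(w)
        g = grad_fn(w)                                   # Euclidean batch gradient
        G_diag = (per_sample_grads(w) ** 2).mean(axis=0) + eps   # diagonal Fisher estimate + damping
        ng = g / G_diag                                  # diagonal natural gradient G^{-1} grad E

        if state is None:
            d = -ng                                      # first iteration: steepest natural descent
        else:
            g_prev, ng_prev, d_prev = state
            # preconditioned Polak-Ribiere coefficient (one plausible choice, restarted at 0)
            beta = max(0.0, g @ (ng - ng_prev) / (g_prev @ ng_prev))
            d = -ng + beta * d_prev

        # the batch method relies on a line search along d; a fixed eta is only a stand-in here
        return w + eta * d, (g, ng, d)

The full natural CG variant would build and invert the complete Fisher matrix instead of its diagonal, which is the source of the higher computational cost mentioned above.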

References

  1. Amari, S.: Natural Gradient Works Efficiently in Learning. Neural Computation 10, 251–276 (1998)

  2. Amari, S., Nagaoka, H.: Methods of Information Geometry. American Mathematical Society (2000)

  3. Amari, S., Park, H., Fukumizu, K.: Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Neural Computation 12, 1399–1409 (2000)

  4. Duda, R., Hart, P., Stork, D.: Pattern classification. Wiley, Chichester (2000)

  5. Heskes, T.: On Natural Learning and Pruning in Multilayered Perceptrons. Neural Computation 12, 1037–1057 (2000)

  6. Igel, C., Toussaint, M., Weishui, W.: Rprop Using the Natural Gradient. In: Trends and Applications in Constructive Approximation. International Series of Numerical Mathematics, vol. 151, Birkhäuser, Basel (2005)

  7. LeCun, Y., Bottou, L., Orr, G., Müller, K.R.: Efficient BackProp. In: Neural Networks: Tricks of the Trade, pp. 9–50. Springer, Heidelberg (1998)

  8. Murray, M., Rice, J.W.: Differential Geometry and Statistics. Chapman & Hall, Boca Raton (1993)

  9. Murphy, P., Aha, D.: UCI Repository of Machine Learning Databases. Tech. Report, University of California, Irvine (1994)

  10. Polak, E.: Computational Methods in Optimization. Academic Press, London (1971)

  11. Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C. Cambridge U. Press, New York (1988)

  12. Rao, C.R.: Information and accuracy attainable in estimation of statistical parameters. Bull. Cal. Math. Soc. 37, 81–91 (1945)

  13. Rattray, M., Saad, D., Amari, S.: Natural gradient descent for on–line learning. Physical Review Letters 81, 5461–5464 (1998)

  14. Yang, H., Amari, S.: Complexity Issues in Natural Gradient Descent Method for Training Multi-Layer Perceptrons. Neural Computation 10, 2137–2157 (1998)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

Cite this paper

González, A., Dorronsoro, J.R. (2006). Natural Conjugate Gradient Training of Multilayer Perceptrons. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_18

  • DOI: https://doi.org/10.1007/11840817_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38625-4

  • Online ISBN: 978-3-540-38627-8
