New Generation Computing, Volume 24, Issue 1, pp 79–95

Tutorial on Brain-Inspired Computing

Part 2: Multilayer Perceptron and Natural Gradient Learning

  • Hyeyong Park


Since the perceptron was developed for learning to classify input patterns, simple and multilayer perceptrons have been studied extensively. Despite wide and active study in both theory and applications, multilayer perceptrons still suffer from unsettled problems such as slow learning and overfitting. A thorough solution to these problems requires consolidating previous studies and finding new directions that strengthen the practical power of multilayer perceptrons. As a first step toward this new stage of study, we give short reviews of two important approaches: the stochastic approach and the geometric approach. We also explain an efficient learning algorithm developed from these statistical and geometrical studies, now well known as the natural gradient learning method.
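The natural gradient method mentioned above preconditions the ordinary gradient with the inverse of the Fisher information matrix, so that the update respects the Riemannian geometry of the parameter space rather than the naive Euclidean one. As a minimal sketch, assuming a hypothetical logistic-regression model on toy data (the data, step sizes, and ridge term below are our own illustrative choices, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (illustrative only)
N = 200
X = rng.normal(size=(N, 2))
X = np.hstack([X, np.ones((N, 1))])        # append a bias column
w_true = np.array([2.0, -1.0, 0.5])
y = (1.0 / (1.0 + np.exp(-X @ w_true)) > rng.uniform(size=N)).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w):
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradient(w):
    # Euclidean gradient of the mean negative log-likelihood
    p = sigmoid(X @ w)
    return X.T @ (p - y) / N

def fisher(w):
    # Fisher information matrix G(w) = E[p(1-p) x x^T] for this model
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X / N

def train(natural, steps=100, eta=0.5):
    w = np.zeros(3)
    for _ in range(steps):
        g = gradient(w)
        if natural:
            # Natural gradient: precondition with the inverse Fisher matrix.
            # A small ridge term keeps G invertible near singular regions.
            g = np.linalg.solve(fisher(w) + 1e-6 * np.eye(3), g)
        w -= eta * g
    return w

w_plain = train(natural=False)
w_nat = train(natural=True)
print("plain   NLL:", neg_log_likelihood(w_plain))
print("natural NLL:", neg_log_likelihood(w_nat))
```

On such small convex problems the Fisher matrix coincides with the Hessian of the log-likelihood, so the natural gradient update typically reaches the optimum in far fewer steps than plain gradient descent; the tutorial's subject is the harder multilayer-perceptron case, where the Fisher matrix must be estimated adaptively.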


Multilayer Perceptrons · Gradient Descent Learning · Backpropagation Learning · Natural Gradient · Singularity · Stochastic Neural Networks · Neuromanifold





Copyright information

© Ohmsha, Ltd. and Springer 2006

Authors and Affiliations

  • Hyeyong Park
  1. Computer Science Dept., Kyungpook National University, Daegu, Korea
