Geometry of Learning in Multilayer Perceptrons

Abstract

Neural networks provide a good model of learning from statistical data. A multilayer perceptron is regarded as a statistical model that realizes a nonlinear input-output relation. The set of multilayer perceptrons forms a statistical manifold in which learning and estimation take place. This is a Riemannian manifold equipped with the Fisher information metric. However, such a hierarchical model includes algebraic singularities at which the Fisher information matrix degenerates, and this causes various difficulties in learning and statistical estimation. The present paper elucidates the structure of these singularities and how they influence the behavior of learning. It also describes a new learning algorithm, the natural gradient method, that overcomes these difficulties. Various statistical problems in singular models are discussed, and the model selection criteria AIC and MDL are studied in this framework.
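For concreteness: on the neuromanifold with coordinates θ, the Fisher information metric is G_ij(θ) = E[∂_i log p(y|x; θ) ∂_j log p(y|x; θ)], and the natural gradient update multiplies the ordinary likelihood gradient by G(θ)^{-1}. The sketch below illustrates this for a one-hidden-layer perceptron with Gaussian output noise. It is a minimal illustration under our own assumptions, not the paper's implementation; all function names (forward, score, natural_gradient_step) are ours, and the empirical Fisher matrix stands in for the exact one. The damping term is one common device for keeping the update finite near singularities, where G degenerates; the adaptive schemes of references [8] and [17] instead maintain a running estimate of G^{-1}.

```python
import numpy as np

# Minimal sketch of natural gradient learning for a one-hidden-layer
# perceptron f(x; theta) = sum_i v_i tanh(w_i . x) with unit-variance
# Gaussian output noise, so log p(y|x; theta) = -(y - f)^2 / 2 + const.
# Illustrative only; not the paper's algorithm.

def unpack(theta, n_hidden, n_in):
    W = theta[: n_hidden * n_in].reshape(n_hidden, n_in)
    v = theta[n_hidden * n_in:]
    return W, v

def forward(theta, x, n_hidden):
    W, v = unpack(theta, n_hidden, x.shape[0])
    return v @ np.tanh(W @ x)

def score(theta, x, y, n_hidden):
    # Score d/dtheta log p(y|x; theta) = (y - f) df/dtheta, with
    # df/dW_ij = v_i (1 - tanh^2(w_i . x)) x_j and df/dv_i = tanh(w_i . x).
    W, v = unpack(theta, n_hidden, x.shape[0])
    h = np.tanh(W @ x)
    resid = y - v @ h
    dW = (v * (1.0 - h ** 2))[:, None] * x[None, :]
    return resid * np.concatenate([dW.ravel(), h])

def natural_gradient_step(theta, xs, ys, n_hidden, lr=0.1, damping=1e-3):
    # Empirical Fisher matrix G = mean of score outer products. The damping
    # term keeps G invertible near singularities, where it degenerates.
    scores = np.array([score(theta, x, y, n_hidden) for x, y in zip(xs, ys)])
    G = scores.T @ scores / len(xs)
    grad = scores.mean(axis=0)  # gradient of the mean log-likelihood
    return theta + lr * np.linalg.solve(G + damping * np.eye(theta.size), grad)

# Toy usage: fit a 2-hidden-unit student to data from a random teacher.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 2
teacher = rng.normal(size=n_hidden * n_in + n_hidden)
xs = rng.normal(size=(200, n_in))
ys = np.array([forward(teacher, x, n_hidden) for x in xs]) + 0.1 * rng.normal(size=200)
theta = rng.normal(size=teacher.size)
for _ in range(100):
    theta = natural_gradient_step(theta, xs, ys, n_hidden)
```

For the model selection discussion, recall the standard regular-model forms AIC = -2 log L̂ + 2k and MDL = -log L̂ + (k/2) log n, where k is the parameter count and n the sample size. At singularities the effective number of parameters is ill-defined, which is why the paper revisits these criteria for singular models.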


References

  1. Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control AC-19, 716–723.

  2. Amari S. (1967). A theory of adaptive pattern classifiers. IEEE Trans. Elect. Comput. EC-16, 299–307.

  3. Amari S. (1998). Natural gradient works efficiently in learning. Neural Computation 10, 251–276.

  4. Amari S., Nagaoka H. (2000). Methods of Information Geometry. AMS and Oxford University Press, New York.

  5. Amari S., Ozeki T. (2001). Differential and algebraic geometry of multilayer perceptrons. IEICE Trans. E84-A, 31–38.

  6. Amari S. (2003). New consideration on criteria of model selection. Neural Networks and Soft Computing (Proceedings of the Sixth International Conference on Neural Networks and Soft Computing), L. Rutkowski and J. Kacprzyk (eds.), 25–30.

  7. Amari S., Ozeki T., Park H. (2003). Learning and inference in hierarchical models with singularities. Systems and Computers in Japan 34(7), 701–708.

  8. Amari S., Park H., Fukumizu K. (2000). Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation 12, 1399–1409.

  9. Amari S., Park H., Ozeki T. (2002). Geometrical singularities in the neuromanifold of multilayer perceptrons. Advances in Neural Information Processing Systems, T.G. Dietterich, S. Becker, and Z. Ghahramani (eds.), 14, 343–350.

  10. Chen A.M., Liu H., Hecht-Nielsen R. (1993). On the geometry of feedforward neural network error surfaces. Neural Computation 5, 910–927.

  11. Fukumizu K. (2003). Likelihood ratio of unidentifiable models and multilayer neural networks. The Annals of Statistics 31(3), 833–851.

  12. Fukumizu K., Amari S. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks 13, 317–327.

  13. Hartigan J.A. (1985). A failure of likelihood asymptotics for normal mixtures. Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer 2, 807–810.

  14. Inoue M., Park H., Okada M. (2003). On-line learning theory of soft committee machines with correlated hidden units: steepest gradient descent and natural gradient descent. J. Phys. Soc. Jpn. 72(4), 805–810.

  15. Kůrková V., Kainen P.C. (1994). Functionally equivalent feedforward neural networks. Neural Computation 6, 543–558.

  16. Liu X., Shao Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics 31(3), 807–832.

  17. Park H., Amari S., Fukumizu K. (2000). Adaptive natural gradient learning algorithms for various stochastic models. Neural Networks 13, 755–764.

  18. Park H., Inoue M., Okada M. (2003). Learning dynamics of multilayer perceptrons with unidentifiable parameters. J. Phys. A: Math. Gen. 36(47), 11753–11764.

  19. Rao C.R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society 37, 81–91.

  20. Rattray M., Saad D., Amari S. (1998). Natural gradient descent for online learning. Physical Review Letters 81, 5461–5464.

  21. Riegler P., Biehl M. (1995). On-line backpropagation in two-layered neural networks. J. Phys. A: Math. Gen. 28, L507–L513.

  22. Rissanen J. (1978). Modelling by shortest data description. Automatica 14, 465–471.

  23. Rumelhart D.E., Hinton G.E., Williams R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group (eds.), Parallel Distributed Processing (Vol. 1, 318–362). Cambridge, MA: MIT Press.

  24. Rüger S.M., Ossen A. (1997). The metric of weight space. Neural Processing Letters 5, 63–72.

  25. Saad D., Solla S.A. (1995). On-line learning in soft committee machines. Phys. Rev. E 52, 4225–4243.

  26. Schwarz G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.

  27. Sussmann H.J. (1992). Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Networks 5, 589–593.

  28. Watanabe S. (2001a). Algebraic analysis for non-identifiable learning machines. Neural Computation 13, 899–933.

  29. Watanabe S. (2001b). Algebraic geometrical methods for hierarchical learning machines. Neural Networks 14(8), 1049–1060.


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amari, Si., Park, H., Ozeki, T. (2004). Geometry of Learning in Multilayer Perceptrons. In: Antoch, J. (eds) COMPSTAT 2004 — Proceedings in Computational Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-2656-2_3


  • DOI: https://doi.org/10.1007/978-3-7908-2656-2_3

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-1554-2

  • Online ISBN: 978-3-7908-2656-2
