Abstract
Neural networks provide a good model of learning from statistical data. The multilayer perceptron can be regarded as a statistical model that realizes a nonlinear input-output relation. The set of multilayer perceptrons forms a statistical manifold in which learning and estimation take place. This is a Riemannian manifold equipped with the Fisher information metric. However, such a hierarchical model includes algebraic singularities at which the Fisher information matrix degenerates, causing various difficulties in learning and statistical estimation. The present paper elucidates the structure of these singularities and how they influence the behavior of learning. It also describes a new learning algorithm, the natural gradient method, which overcomes such difficulties. Various statistical problems in singular models are discussed, and the model selection criteria AIC and MDL are studied in this framework.
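The natural gradient method mentioned above preconditions the ordinary gradient by the inverse Fisher information matrix, so that the update follows the steepest descent direction in the Riemannian geometry of the statistical manifold rather than in raw parameter coordinates. The following is a minimal sketch, not the paper's own implementation: it assumes a toy model (fitting the mean of a Gaussian with a known, anisotropic covariance), for which the Fisher information matrix is simply the inverse covariance. All names (`natural_gradient_descent`, `neg_log_lik_grad`, etc.) are illustrative.

```python
import numpy as np

# Toy setting: estimate the mean mu of a 2-D Gaussian with known,
# ill-conditioned covariance Sigma. For this model the Fisher
# information matrix is G = Sigma^{-1}, so the natural gradient
# G^{-1} * grad rescales the ordinary gradient by the local geometry
# of the statistical manifold.
rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 0.0], [0.0, 0.25]])
Sigma_inv = np.linalg.inv(Sigma)
true_mu = np.array([1.0, -2.0])
X = rng.multivariate_normal(true_mu, Sigma, size=2000)

def neg_log_lik_grad(mu):
    # Gradient of the average negative log-likelihood with respect to mu.
    return Sigma_inv @ (mu - X.mean(axis=0))

def natural_gradient_descent(mu0, lr=0.5, steps=50):
    mu = mu0.copy()
    G_inv = np.linalg.inv(Sigma_inv)  # inverse Fisher information
    for _ in range(steps):
        # Natural gradient step: precondition the gradient by G^{-1}.
        mu = mu - lr * G_inv @ neg_log_lik_grad(mu)
    return mu

mu_hat = natural_gradient_descent(np.zeros(2))
print(mu_hat)  # close to the empirical mean of X
```

In this Gaussian toy model the preconditioning cancels the anisotropy of `Sigma` exactly, so both coordinates converge at the same rate; plain gradient descent would instead crawl along the low-curvature direction. In singular models such as multilayer perceptrons, where the Fisher matrix degenerates, the paper's adaptive variants (e.g. Amari, Park and Fukumizu, 2000) estimate and regularize this matrix online rather than inverting it directly.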
References
Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control AC-19, 716–723.
Amari S. (1967). Theory of adaptive pattern classifiers. IEEE Trans. Elect. Comput. EC-16, 299–307.
Amari S. (1998). Natural gradient works efficiently in learning. Neural Computation 10, 251–276.
Amari S., Nagaoka H. (2000). Information geometry. AMS and Oxford University Press, New York.
Amari S., Ozeki T. (2001). Differential and algebraic geometry of multilayer perceptrons. IEICE Trans., E84-A, 31–38.
Amari S. (2003). New consideration on criteria of model selection. Neural Networks and Soft Computing (Proceedings of the Sixth International Conference on Neural Networks and Soft Computing), L. Rutkowski and J. Kacprzyk (eds.), 25–30.
Amari S., Ozeki T., Park H. (2003). Learning and inference in hierarchical models with singularities. Systems and Computers in Japan 34(7), 701–708.
Amari S., Park H., Fukumizu K. (2000). Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation 12, 1399–1409.
Amari S., Park H., Ozeki T. (2002). Geometrical singularities in the neuromanifold of multilayer perceptrons. Advances in Neural Information Processing Systems, T.G. Dietterich, S. Becker, and Z. Ghahramani (eds.), 14, 343–350.
Chen A.M., Liu H., Hecht-Nielsen R. (1993). On the geometry of feedforward neural network error surfaces. Neural Computation 5, 910–927.
Fukumizu K. (2003). Likelihood ratio of unidentifiable models and multilayer neural networks. The Annals of Statistics 31(3), 833–851.
Fukumizu K., Amari S. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks 13, 317–327.
Hartigan J.A. (1985). A failure of likelihood asymptotics for normal mixtures. Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer 2, 807–810.
Inoue M., Park H., Okada M. (2003). On-line learning theory of soft committee machines with correlated hidden units: steepest gradient descent and natural gradient descent. J. Phys. Soc. Jpn. 72(4), 805–810.
Kůrková V., Kainen P.C. (1994). Functionally equivalent feedforward neural networks. Neural Computation 6, 543–558.
Lin X., Shao Y., (2003). Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics 31(3), 807–832.
Park H., Amari S., Fukumizu K. (2000). Adaptive natural gradient learning algorithms for various stochastic models. Neural Networks 13, 755–764.
Park H., Inoue M., Okada M. (2003). Learning dynamics of multilayer perceptrons with unidentifiable parameters. J. Phys. A: Math. Gen. 36(47), 11753–11764.
Rao C.R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society 37, 81–91.
Rattray M., Saad D., Amari S. (1998). Natural gradient descent for online learning. Physical Review Letters 81, 5461–5464.
Riegler P., Biehl M. (1995). On-line backpropagation in two-layered neural networks. J. Phys. A: Math. Gen. 28, L507–L513.
Rissanen J. (1978). Modelling by shortest data description. Automatica 14, 465–471.
Rumelhart D.E., Hinton G.E., Williams R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group (eds.), Parallel distributed processing (Vol. 1, 318–362), Cambridge, MA: MIT Press.
Rüger S.M., Ossen A. (1995). The metric of weight space. Neural Processing Letters 5, 63–72.
Saad D., Solla S.A. (1995). On-line learning in soft committee machines. Phys. Rev. E 52, 4225–4243.
Schwarz G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Sussmann H.J. (1992). Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Networks 5, 589–593.
Watanabe S. (2001a). Algebraic analysis for non-identifiable learning machines. Neural Computation 13, 899–933.
Watanabe S. (2001b). Algebraic geometrical methods for hierarchical learning machines. Neural Networks 14(8), 1049–1060.
© 2004 Springer-Verlag Berlin Heidelberg
Amari, Si., Park, H., Ozeki, T. (2004). Geometry of Learning in Multilayer Perceptrons. In: Antoch, J. (eds) COMPSTAT 2004 — Proceedings in Computational Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-2656-2_3