Abstract
Neural networks provide a good model of learning from statistical data. The multilayer perceptron can be regarded as a statistical model that realizes a nonlinear input-output relation. The set of multilayer perceptrons forms a statistical manifold in which learning and estimation take place. This is a Riemannian manifold equipped with the Fisher information metric. However, such a hierarchical model includes algebraic singularities at which the Fisher information matrix degenerates, causing various difficulties in learning and statistical estimation. The present paper elucidates the structure of these singularities and how they influence the behavior of learning. It also describes a new learning algorithm, the natural gradient method, which overcomes such difficulties. Various statistical problems in singular models are discussed, and the model selection criteria AIC and MDL are studied in this framework.
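The natural gradient method mentioned above preconditions the ordinary gradient by the inverse Fisher information matrix, so that the update follows the steepest descent direction in the Riemannian geometry of the statistical manifold rather than in raw parameter coordinates. The following is a minimal sketch, not the paper's own implementation: it assumes a toy model (fitting the mean of a Gaussian with a known, anisotropic covariance), for which the Fisher information matrix is simply the inverse covariance. All names (`natural_gradient_descent`, `neg_log_lik_grad`, etc.) are illustrative.

```python
import numpy as np

# Toy setting: estimate the mean mu of a 2-D Gaussian with known,
# ill-conditioned covariance Sigma. For this model the Fisher
# information matrix is G = Sigma^{-1}, so the natural gradient
# G^{-1} * grad rescales the ordinary gradient by the local geometry
# of the statistical manifold.
rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 0.0], [0.0, 0.25]])
Sigma_inv = np.linalg.inv(Sigma)
true_mu = np.array([1.0, -2.0])
X = rng.multivariate_normal(true_mu, Sigma, size=2000)

def neg_log_lik_grad(mu):
    # Gradient of the average negative log-likelihood with respect to mu.
    return Sigma_inv @ (mu - X.mean(axis=0))

def natural_gradient_descent(mu0, lr=0.5, steps=50):
    mu = mu0.copy()
    G_inv = np.linalg.inv(Sigma_inv)  # inverse Fisher information
    for _ in range(steps):
        # Natural gradient step: precondition the gradient by G^{-1}.
        mu = mu - lr * G_inv @ neg_log_lik_grad(mu)
    return mu

mu_hat = natural_gradient_descent(np.zeros(2))
print(mu_hat)  # close to the empirical mean of X
```

In this Gaussian toy model the preconditioning cancels the anisotropy of `Sigma` exactly, so both coordinates converge at the same rate; plain gradient descent would instead crawl along the low-curvature direction. In singular models such as multilayer perceptrons, where the Fisher matrix degenerates, the paper's adaptive variants (e.g. Amari, Park and Fukumizu, 2000) estimate and regularize this matrix online rather than inverting it directly.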
References
Akaike H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control AC-19, 716–723.
Amari S. (1967). Theory of adaptive pattern classifiers. IEEE Trans. Elect. Comput. EC-16, 299–307.
Amari S. (1998). Natural gradient works efficiently in learning. Neural Computation 10, 251–276.
Amari S., Nagaoka H. (2000). Information geometry. AMS and Oxford University Press, New York.
Amari S., Ozeki T. (2001). Differential and algebraic geometry of multilayer perceptrons. IEICE Trans., E84-A, 31–38.
Amari S. (2003). New consideration on criteria of model selection. Neural Networks and Soft Computing (Proceedings of the Sixth International Conference on Neural Networks and Soft Computing), L. Rutkowski and J. Kacprzyk (eds.), 25–30.
Amari S., Ozeki T., Park H. (2003). Learning and inference in hierarchical models with singularities. Systems and Computers in Japan 34(7), 701–708.
Amari S., Park H., Fukumizu K. (2000). Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation 12, 1399–1409.
Amari S., Park H., Ozeki T. (2002). Geometrical singularities in the neuromanifold of multilayer perceptrons. Advances in Neural Information Processing Systems, T.G. Dietterich, S. Becker, and Z. Ghahramani (eds.), 14, 343–350.
Chen A.M., Liu H., Hecht-Nielsen R. (1993). On the geometry of feedforward neural network error surfaces. Neural Computation 5, 910–927.
Fukumizu K. (2003). Likelihood ratio of unidentifiable models and multilayer neural networks. The Annals of Statistics 31(3), 833–851.
Fukumizu K., Amari S. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks 13, 317–327.
Hartigan J.A. (1985). A failure of likelihood asymptotics for normal mixtures. Proc. Berkeley Conf. in Honor of J. Neyman and J. Kiefer 2, 807–810.
Inoue M., Park H., Okada M. (2003). On-line learning theory of soft committee machines with correlated hidden units: steepest gradient descent and natural gradient descent. J. Phys. Soc. Jpn. 72(4), 805–810.
Kůrková V., Kainen P.C. (1994). Functionally equivalent feedforward neural networks. Neural Computation 6, 543–558.
Lin X., Shao Y., (2003). Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics 31(3), 807–832.
Park H., Amari S., Fukumizu K. (2000). Adaptive natural gradient learning algorithms for various stochastic models. Neural Networks 13, 755–764.
Park H., Inoue M., Okada M. (2003). Learning dynamics of multilayer perceptrons with unidentifiable parameters. J. Phys. A: Math. Gen. 36(47), 11753–11764.
Rao C.R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society 37, 81–91.
Rattray M., Saad D., Amari S. (1998). Natural gradient descent for online learning. Physical Review Letters 81, 5461–5464.
Riegler P., Biehl M. (1995). On-line backpropagation in two-layered neural networks. J. Phys. A: Math. Gen. 28, L507–L513.
Rissanen J. (1978). Modelling by shortest data description. Automatica 14, 465–471.
Rumelhart D.E., Hinton G.E., Williams R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart, J.L. McClelland, and the PDP Research Group (eds.), Parallel distributed processing (Vol. 1, 318–362), Cambridge, MA: MIT Press.
Rüger S.M., Ossen A. (1995). The metric of weight space. Neural Processing Letters 5, 63–72.
Saad D., Solla S.A. (1995). On-line learning in soft committee machines. Phys. Rev. E 52, 4225–4243.
Schwarz G. (1978). Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Sussmann H.J. (1992). Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Networks 5, 589–593.
Watanabe S. (2001a). Algebraic analysis for non-identifiable learning machines. Neural Computation 13, 899–933.
Watanabe S. (2001b). Algebraic geometrical methods for hierarchical learning machines. Neural Networks 14(8), 1049–1060.
© 2004 Springer-Verlag Berlin Heidelberg
Amari, Si., Park, H., Ozeki, T. (2004). Geometry of Learning in Multilayer Perceptrons. In: Antoch, J. (eds) COMPSTAT 2004 — Proceedings in Computational Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-2656-2_3