Abstract
A quantitative and practical Bayesian framework is described for learning mappings in feedforward networks. The framework makes possible: (1) objective comparisons between solutions using alternative network architectures; (2) objective stopping rules for network pruning or growing procedures; (3) objective choice of the magnitude and type of weight decay terms or additive regularisers (for penalising large weights, etc.); (4) a measure of the effective number of well-determined parameters in a model; (5) quantified estimates of the error bars on network parameters and on network output; (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian ‘evidence’ automatically embodies ‘Occam’s razor’, penalising over-flexible and over-complex models. The Bayesian approach also helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalisation ability and the Bayesian evidence is obtained.
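The quantities listed above take simple closed forms once the posterior over the weights is approximated by a Gaussian centred on the most probable weights. The following minimal sketch is written in that spirit, using a linear-in-parameters radial basis function model so that the Hessian A is exact rather than approximated; the toy data, the basis centres and width, and the fixed precisions alpha (weight decay) and beta (noise) are illustrative assumptions, not values from the chapter. It shows the effective number of well-determined parameters (item 4), output error bars (item 5), and the log evidence with its ‘Occam factor’ penalty (items 1–3 and 6).

```python
import numpy as np

# Sketch of the Gaussian-approximation 'evidence' quantities, assuming
# a linear-in-parameters RBF regression model. Notation (alpha, beta,
# gamma, A) follows MacKay's usual conventions; all data are synthetic.

rng = np.random.default_rng(0)
N, k = 30, 8                                        # data points, weights
x = np.linspace(-1.0, 1.0, N)
t = np.sin(3 * x) + 0.1 * rng.standard_normal(N)    # noisy targets

centres = np.linspace(-1.0, 1.0, k)                 # assumed basis centres
Phi = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / 0.3) ** 2)

alpha, beta = 1e-2, 100.0     # weight-decay and noise precisions (fixed here)

# Most probable weights minimise M(w) = alpha*E_W + beta*E_D, with
# E_W = |w|^2 / 2 and E_D = |t - Phi w|^2 / 2.
A = beta * Phi.T @ Phi + alpha * np.eye(k)          # Hessian of M at w_MP
w_mp = beta * np.linalg.solve(A, Phi.T @ t)

# (4) Effective number of well-determined parameters:
#     gamma = sum_i lambda_i / (lambda_i + alpha),
#     lambda_i = eigenvalues of the data term beta * Phi^T Phi.
lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)
gamma = np.sum(lam / (lam + alpha))

# (5) Error bars on the output at a new input x*:
#     var(y*) = 1/beta + g^T A^{-1} g, where g = dy/dw (here, the basis vector).
x_star = 0.5
g = np.exp(-0.5 * ((x_star - centres) / 0.3) ** 2)
y_star = g @ w_mp
var_star = 1.0 / beta + g @ np.linalg.solve(A, g)

# (1)-(3), (6) The log evidence scores the whole model (architecture,
# regulariser, alpha, beta); the -1/2 log det A term is the Occam factor
# that penalises over-flexible, over-complex models.
E_D = 0.5 * np.sum((t - Phi @ w_mp) ** 2)
E_W = 0.5 * w_mp @ w_mp
log_evidence = (-alpha * E_W - beta * E_D
                - 0.5 * np.linalg.slogdet(A)[1]
                + 0.5 * k * np.log(alpha) + 0.5 * N * np.log(beta)
                - 0.5 * N * np.log(2 * np.pi))

print(f"gamma = {gamma:.2f} of {k} parameters well determined")
print(f"y({x_star}) = {y_star:.3f} +/- {np.sqrt(var_star):.3f}")
print(f"log evidence = {log_evidence:.1f}")
```

Comparing `log_evidence` across alternative architectures or regularisers (here, across choices of k, the basis width, or alpha) is the objective model comparison the abstract describes; in the full framework alpha and beta are themselves optimised by maximising the evidence rather than fixed by hand as in this sketch.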
Cite this chapter
MacKay, D.J.C. (1992). The Evidence for Neural Networks. In: Smith, C.R., Erickson, G.J., Neudorfer, P.O. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 50. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2219-3_12