The Evidence for Neural Networks


Part of the book series: Fundamental Theories of Physics (FTPH, volume 50)

Abstract

A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible: (1) objective comparisons between solutions using alternative network architectures; (2) objective stopping rules for network pruning or growing procedures; (3) objective choice of magnitude and type of weight decay terms or additive regularisers (for penalising large weights, etc.); (4) a measure of the effective number of well-determined parameters in a model; (5) quantified estimates of the error bars on network parameters and on network output; (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian ‘evidence’ automatically embodies ‘Occam’s razor,’ penalising over-flexible and over-complex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalisation ability and the Bayesian evidence is obtained.
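
To make the abstract's quantities concrete, the following LaTeX sketch gives the standard Gaussian-approximation (Laplace) form of the evidence and of the effective number of well-determined parameters used in evidence-framework treatments; the symbols w_MP, A, k, alpha and lambda_i are notation introduced here for illustration and need not match the chapter's own.

% Evidence for model H_i: the likelihood of the data D with the weights w
% marginalised out.  A Gaussian (Laplace) approximation around the most
% probable weights w_MP factorises it into a best-fit likelihood times an
% 'Occam factor' that penalises over-flexible, over-complex models.
\begin{align}
P(D \mid \mathcal{H}_i)
  &= \int P(D \mid \mathbf{w}, \mathcal{H}_i)\, P(\mathbf{w} \mid \mathcal{H}_i)\, \mathrm{d}\mathbf{w} \notag \\
  &\approx \underbrace{P(D \mid \mathbf{w}_{\mathrm{MP}}, \mathcal{H}_i)}_{\text{best-fit likelihood}}
     \;\times\;
     \underbrace{P(\mathbf{w}_{\mathrm{MP}} \mid \mathcal{H}_i)\,(2\pi)^{k/2}\,(\det\mathbf{A})^{-1/2}}_{\text{Occam factor}}
\end{align}
% Here A = -\nabla\nabla \log P(w | D, H_i), evaluated at w_MP, is the Hessian
% of the log posterior and k is the number of weights.  The effective number
% of well-determined parameters compares the eigenvalues \lambda_i of the
% data-misfit Hessian with the weight-decay constant \alpha:
\begin{equation}
\gamma \;=\; \sum_{i=1}^{k} \frac{\lambda_i}{\lambda_i + \alpha},
\qquad 0 \le \gamma \le k .
\end{equation}

Parameter directions with \lambda_i \gg \alpha are well determined by the data and each contribute roughly one to \gamma; ranking alternative architectures and regularisers by \log P(D \mid \mathcal{H}_i) is what yields the objective comparisons and the Occam's-razor penalty described above.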




Copyright information

© 1992 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

MacKay, D.J.C. (1992). The Evidence for Neural Networks. In: Smith, C.R., Erickson, G.J., Neudorfer, P.O. (eds) Maximum Entropy and Bayesian Methods. Fundamental Theories of Physics, vol 50. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2219-3_12

  • DOI: https://doi.org/10.1007/978-94-017-2219-3_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-4220-0

  • Online ISBN: 978-94-017-2219-3

