Information Theory Based Regularizing Methods

  • Gustavo Deco
  • Dragan Obradovic
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)

Abstract

One of the main requirements in modelling, i.e. in neural network training, is to ensure good generalization of the obtained model. This requirement therefore has to be built into the training mechanism, either by constantly monitoring the behavior of the trained network on an independent data set during learning or by appropriately modifying the cost function. Akaike's and Rissanen's methods for formulating cost functions that naturally include model-complexity terms are presented in Chapter 7, while the problem of generalization over an infinite ensemble of networks is treated in Chapter 8.
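
The abstract names two generic routes to generalization: monitoring an independent validation set during learning (early stopping) and adding a complexity term to the cost function. The following minimal sketch, in Python, illustrates both in their simplest form; the weight-decay penalty, the patience rule, and the names regularized_cost, should_stop and lam are illustrative assumptions, not the chapter's own information-theoretic penalties.

```python
import numpy as np

def regularized_cost(y_true, y_pred, weights, lam=1e-3):
    """Empirical squared error plus a generic complexity penalty.

    Assumed form J(w) = E_data(w) + lam * E_complexity(w); plain weight
    decay stands in for the chapter's information-theoretic penalty terms,
    which follow the same additive pattern.
    """
    data_term = 0.5 * np.mean((y_true - y_pred) ** 2)        # fit to the data
    penalty = 0.5 * sum(np.sum(w ** 2) for w in weights)     # model-complexity term
    return data_term + lam * penalty

def should_stop(val_errors, patience=5):
    """Early-stopping check on a held-out validation set: stop when the
    validation error has not improved for `patience` consecutive epochs."""
    if len(val_errors) <= patience:
        return False
    return min(val_errors[-patience:]) >= min(val_errors[:-patience])
```

In use, the training loop would minimize regularized_cost on the training data while recording the validation error each epoch and halting once should_stop returns True; lam trades off fit against model complexity.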

Keywords

Hidden layer · Mutual information · Hidden neuron · Penalty term · Hidden unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Gustavo Deco (1)
  • Dragan Obradovic (1)
  1. Corporate Research and Development, Siemens AG, Munich, Germany
