Adaptive Regularization in Neural Network Modeling

Chapter

Neural Networks: Tricks of the Trade

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7700)

Abstract

In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done by a simple iterative gradient descent scheme that incurs virtually no programming overhead beyond standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples that illustrate the potential and limitations of the proposed regularization framework.

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
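To make the scheme concrete, here is a minimal sketch (not the authors' code) of the idea in the abstract: train the weights under a regularized cost, then update the regularization parameter by gradient descent on the validation error, with the gradient obtained from the implicit function theorem. It uses a weight-decay (ridge) regression model, where the trained weights have a closed form; all function names and constants are illustrative assumptions.

```python
# Minimal sketch of adaptive regularization (not the authors' code):
# gradient descent on the validation error with respect to the weight-decay
# parameter lambda, for a ridge regression model where the trained weights
# w*(lambda) have a closed form. Names and constants are illustrative.
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize (1/N)||Xw - y||^2 + lam*||w||^2; return w*(lam) and H."""
    N, d = X.shape
    H = X.T @ X / N + lam * np.eye(d)   # half the Hessian of the training cost
    return np.linalg.solve(H, X.T @ y / N), H

def dEval_dlam(w, H, Xv, yv):
    """dE_val/dlam via the implicit function theorem:
    dw*/dlam = -H^{-1} w*, so dE_val/dlam = (dE_val/dw)^T dw*/dlam."""
    g_val = 2.0 * Xv.T @ (Xv @ w - yv) / len(yv)  # validation gradient w.r.t. w
    return g_val @ (-np.linalg.solve(H, w))

rng = np.random.default_rng(0)
d, Ntr, Nval = 20, 40, 40
w_true = rng.normal(size=d)
Xtr, Xv = rng.normal(size=(Ntr, d)), rng.normal(size=(Nval, d))
ytr = Xtr @ w_true + rng.normal(scale=2.0, size=Ntr)
yv = Xv @ w_true + rng.normal(scale=2.0, size=Nval)

log_lam, eta = 0.0, 0.2        # optimize log(lam) so that lam stays positive
for it in range(51):
    lam = np.exp(log_lam)
    w, H = fit_ridge(Xtr, ytr, lam)       # "inner" step: retrain at current lam
    log_lam -= eta * lam * dEval_dlam(w, H, Xv, yv)  # chain rule for log(lam)
    if it % 10 == 0:
        E_val = np.mean((Xv @ w - yv) ** 2)
        print(f"iter {it:2d}  lam = {lam:.4f}  E_val = {E_val:.4f}")
```

For a nonlinear network the closed-form solve is replaced by ordinary training, and the Hessian-inverse-vector product can be approximated, so the outer update adds little work beyond standard training, which is the point made in the abstract.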

References

  1. Akaike, H.: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21, 243–247 (1969)
  2. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Asymptotic Statistical Theory of Overtraining and Cross-Validation. IEEE Transactions on Neural Networks 8(5), 985–996 (1997); also Technical Report METR 95-06 (1995)
  3. Nonboe Andersen, L., Larsen, J., Hansen, L.K., Hintz-Madsen, M.: Adaptive Regularization of Neural Classifiers. In: Principe, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, pp. 24–33. IEEE, Piscataway (1997)
  4. Bishop, C.M.: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4), 882–884 (1993)
  5. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
  6. Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
  7. Drucker, H., Le Cun, Y.: Improving Generalization Performance in Character Recognition. In: Juang, B.H., et al. (eds.) Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, pp. 198–207. IEEE, Piscataway (1991)
  8. Geisser, S.: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association 70, 320–328 (1975)
  9. Geman, S., Bienenstock, E., Doursat, R.: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4, 1–58 (1992)
  10. Girosi, F., Jones, M., Poggio, T.: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2), 219–269 (1995)
  11. Goutte, C., Larsen, J.: Adaptive Regularization of Neural Networks using Conjugate Gradient. In: Proceedings of ICASSP 1998, Seattle, USA, vol. 2, pp. 1201–1204 (1998)
  12. Goutte, C.: Note on Free Lunches and Cross-Validation. Neural Computation 9(6), 1211–1215 (1997)
  13. Goutte, C.: Regularization with a Pruning Prior. Neural Networks (1997) (to appear)
  14. Hansen, L.K., Rasmussen, C.E.: Pruning from Adaptive Regularization. Neural Computation 6, 1223–1232 (1994)
  15. Hansen, L.K., Rasmussen, C.E., Svarer, C., Larsen, J.: Adaptive Regularization. In: Vlontzos, J., Hwang, J.-N., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 78–87. IEEE, Piscataway (1994)
  16. Hansen, L.K., Larsen, J.: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5, 269–280 (1996)
  17. Hertz, J., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City (1991)
  18. Hintz-Madsen, M., With Pedersen, M., Hansen, L.K., Larsen, J.: Design and Evaluation of Neural Classifiers. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 223–232. IEEE, Piscataway (1996)
  19. Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)
  20. Kearns, M.: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5), 1143–1161 (1997)
  21. Larsen, J.: A Generalization Error Estimate for Nonlinear Systems. In: Kung, S.Y., et al. (eds.) Proceedings of the 1992 IEEE-SP Workshop on Neural Networks for Signal Processing, vol. 2, pp. 29–38. IEEE, Piscataway (1992)
  22. Larsen, J.: Design of Neural Network Filters. Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993), ftp://eivind.imm.dtu.dk/dist/PhD_thesis/jlarsen.thesis.ps.Z
  23. Larsen, J., Hansen, L.K.: Generalization Performance of Regularized Neural Network Models. In: Vlontzos, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 42–51. IEEE, Piscataway (1994)
  24. Larsen, J., Hansen, L.K.: Empirical Generalization Assessment of Neural Network Models. In: Girosi, F., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, pp. 30–39. IEEE, Piscataway (1995)
  25. Larsen, J., Hansen, L.K., Svarer, C., Ohlsson, M.: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 62–71. IEEE, Piscataway (1996)
  26. Larsen, J., et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates (in preparation)
  27. Le Cun, Y., Denker, J.S., Solla, S.A.: Optimal Brain Damage. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems, vol. 2, pp. 598–605. Morgan Kaufmann Publishers, San Mateo (1990)
  28. Lowe, D.: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. In: Proc. IEE Conf. on Artificial Neural Networks, pp. 171–175 (1989)
  29. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
  30. MacKay, D.J.C.: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3), 448–472 (1992)
  31. Moody, J.: Prediction Risk and Architecture Selection for Neural Networks. In: Cherkassky, V., et al. (eds.) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NATO ASI Series F, vol. 136. Springer, Berlin (1994)
  32. Moody, J., Rögnvaldsson, T.: Smoothing Regularizers for Projective Basis Function Networks. In: Proceedings of the 1996 Conference on Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)
  33. Murata, N., Yoshizawa, S., Amari, S.: Network Information Criterion — Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6), 865–872 (1994)
  34. Nowlan, S., Hinton, G.: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4), 473–493 (1992)
  35. With Pedersen, M.: Training Recurrent Networks. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII. IEEE, Piscataway (1997)
  36. Peterson, G.E., Barney, H.L.: Control Methods Used in a Study of the Vowels. JASA 24, 175–184 (1952)
  37. Shadafan, R.S., Niranjan, M.: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6), 1202–1222 (1994)
  38. Sjöberg, J.: Non-Linear System Identification with Neural Networks. Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)
  39. Stone, M.: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2), 111–147 (1974)
  40. Svarer, C., Hansen, L.K., Larsen, J., Rasmussen, C.E.: Designer Networks for Time Series Processing. In: Kamm, C.A., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, vol. 3, pp. 78–87. IEEE, Piscataway (1993)
  41. Watrous, R.L.: Current Status of Peterson-Barney Vowel Formant Data. JASA 89, 2459–2460 (1991)
  42. Weigend, A.S., Huberman, B.A., Rumelhart, D.E.: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3), 193–209 (1990)
  43. Williams, P.M.: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1), 117–143 (1995)
  44. Wolpert, D.H., Macready, W.G.: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
  45. Wu, L., Moody, J.: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)
  46. Zhu, H., Rohwer, R.: No Free Lunch for Cross Validation. Neural Computation 8(7), 1421–1426 (1996)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (2012). Adaptive Regularization in Neural Network Modeling. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_8

  • DOI: https://doi.org/10.1007/978-3-642-35289-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35288-1

  • Online ISBN: 978-3-642-35289-8

  • eBook Packages: Computer Science (R0)
