Adaptive Regularization in Neural Network Modeling

Chapter

Neural Networks: Tricks of the Trade

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7700)

Abstract

In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [25]. The idea is to minimize an empirical estimate of the generalization error, such as the cross-validation estimate, with respect to the regularization parameters. This is done by a simple iterative gradient descent scheme that incurs virtually no programming overhead beyond standard training. Experiments with feed-forward neural network models on time series prediction and classification tasks demonstrate the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples that illustrate the potential and limitations of the proposed regularization framework.

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
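To make the scheme concrete, here is a minimal sketch (not the authors' code) of the idea in the abstract: train the weights under a regularized cost, then update the regularization parameter by gradient descent on the validation error, with the gradient obtained from the implicit function theorem. It uses a weight-decay (ridge) regression model, where the trained weights have a closed form; all function names and constants are illustrative assumptions.

```python
# Minimal sketch of adaptive regularization (not the authors' code):
# gradient descent on the validation error with respect to the weight-decay
# parameter lambda, for a ridge regression model where the trained weights
# w*(lambda) have a closed form. Names and constants are illustrative.
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize (1/N)||Xw - y||^2 + lam*||w||^2; return w*(lam) and H."""
    N, d = X.shape
    H = X.T @ X / N + lam * np.eye(d)   # half the Hessian of the training cost
    return np.linalg.solve(H, X.T @ y / N), H

def dEval_dlam(w, H, Xv, yv):
    """dE_val/dlam via the implicit function theorem:
    dw*/dlam = -H^{-1} w*, so dE_val/dlam = (dE_val/dw)^T dw*/dlam."""
    g_val = 2.0 * Xv.T @ (Xv @ w - yv) / len(yv)  # validation gradient w.r.t. w
    return g_val @ (-np.linalg.solve(H, w))

rng = np.random.default_rng(0)
d, Ntr, Nval = 20, 40, 40
w_true = rng.normal(size=d)
Xtr, Xv = rng.normal(size=(Ntr, d)), rng.normal(size=(Nval, d))
ytr = Xtr @ w_true + rng.normal(scale=2.0, size=Ntr)
yv = Xv @ w_true + rng.normal(scale=2.0, size=Nval)

log_lam, eta = 0.0, 0.2        # optimize log(lam) so that lam stays positive
for it in range(51):
    lam = np.exp(log_lam)
    w, H = fit_ridge(Xtr, ytr, lam)       # "inner" step: retrain at current lam
    log_lam -= eta * lam * dEval_dlam(w, H, Xv, yv)  # chain rule for log(lam)
    if it % 10 == 0:
        E_val = np.mean((Xv @ w - yv) ** 2)
        print(f"iter {it:2d}  lam = {lam:.4f}  E_val = {E_val:.4f}")
```

For a nonlinear network the closed-form solve is replaced by ordinary training, and the Hessian-inverse-vector product can be approximated, so the outer update adds little work beyond standard training, which is the point made in the abstract.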

References

  1. Akaike, H.: Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21, 243–247 (1969)
  2. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Asymptotic Statistical Theory of Overtraining and Cross-Validation. IEEE Transactions on Neural Networks 8(5), 985–996 (1997); also Technical Report METR 95-06 (1995)
  3. Nonboe Andersen, L., Larsen, J., Hansen, L.K., Hintz-Madsen, M.: Adaptive Regularization of Neural Classifiers. In: Principe, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, pp. 24–33. IEEE, Piscataway (1997)
  4. Bishop, C.M.: Curvature-Driven Smoothing: A Learning Algorithm for Feedforward Neural Networks. IEEE Transactions on Neural Networks 4(4), 882–884 (1993)
  5. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
  6. Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs (1983)
  7. Drucker, H., Le Cun, Y.: Improving Generalization Performance in Character Recognition. In: Juang, B.H., et al. (eds.) Neural Networks for Signal Processing: Proceedings of the 1991 IEEE-SP Workshop, pp. 198–207. IEEE, Piscataway (1991)
  8. Geisser, S.: The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association 70, 320–328 (1975)
  9. Geman, S., Bienenstock, E., Doursat, R.: Neural Networks and the Bias/Variance Dilemma. Neural Computation 4, 1–58 (1992)
  10. Girosi, F., Jones, M., Poggio, T.: Regularization Theory and Neural Networks Architectures. Neural Computation 7(2), 219–269 (1995)
  11. Goutte, C., Larsen, J.: Adaptive Regularization of Neural Networks using Conjugate Gradient. In: Proceedings of ICASSP 1998, Seattle, USA, vol. 2, pp. 1201–1204 (1998)
  12. Goutte, C.: Note on Free Lunches and Cross-Validation. Neural Computation 9(6), 1211–1215 (1997)
  13. Goutte, C.: Regularization with a Pruning Prior. Neural Networks (1997) (to appear)
  14. Hansen, L.K., Rasmussen, C.E.: Pruning from Adaptive Regularization. Neural Computation 6, 1223–1232 (1994)
  15. Hansen, L.K., Rasmussen, C.E., Svarer, C., Larsen, J.: Adaptive Regularization. In: Vlontzos, J., Hwang, J.-N., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 78–87. IEEE, Piscataway (1994)
  16. Hansen, L.K., Larsen, J.: Linear Unlearning for Cross-Validation. Advances in Computational Mathematics 5, 269–280 (1996)
  17. Hertz, J., Krogh, A., Palmer, R.G.: Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City (1991)
  18. Hintz-Madsen, M., With Pedersen, M., Hansen, L.K., Larsen, J.: Design and Evaluation of Neural Classifiers. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 223–232. IEEE, Piscataway (1996)
  19. Hornik, K.: Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks 4, 251–257 (1991)
  20. Kearns, M.: A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split. Neural Computation 9(5), 1143–1161 (1997)
  21. Larsen, J.: A Generalization Error Estimate for Nonlinear Systems. In: Kung, S.Y., et al. (eds.) Proceedings of the 1992 IEEE-SP Workshop on Neural Networks for Signal Processing, vol. 2, pp. 29–38. IEEE, Piscataway (1992)
  22. Larsen, J.: Design of Neural Network Filters. Ph.D. Thesis, Electronics Institute, Technical University of Denmark (1993), ftp://eivind.imm.dtu.dk/dist/PhD_thesis/jlarsen.thesis.ps.Z
  23. Larsen, J., Hansen, L.K.: Generalization Performance of Regularized Neural Network Models. In: Vlontzos, J., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pp. 42–51. IEEE, Piscataway (1994)
  24. Larsen, J., Hansen, L.K.: Empirical Generalization Assessment of Neural Network Models. In: Girosi, F., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing V, pp. 30–39. IEEE, Piscataway (1995)
  25. Larsen, J., Hansen, L.K., Svarer, C., Ohlsson, M.: Design and Regularization of Neural Networks: The Optimal Use of a Validation Set. In: Usui, S., Tohkura, Y., Katagiri, S., Wilson, E. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VI, pp. 62–71. IEEE, Piscataway (1996)
  26. Larsen, J., et al.: Optimal Data Set Split Ratio for Empirical Generalization Error Estimates (in preparation)
  27. Le Cun, Y., Denker, J.S., Solla, S.A.: Optimal Brain Damage. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems, vol. 2, pp. 598–605. Morgan Kaufmann Publishers, San Mateo (1990)
  28. Lowe, D.: Adaptive Radial Basis Function Nonlinearities and the Problem of Generalisation. In: Proc. IEE Conf. on Artificial Neural Networks, pp. 171–175 (1989)
  29. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
  30. MacKay, D.J.C.: A Practical Bayesian Framework for Backprop Networks. Neural Computation 4(3), 448–472 (1992)
  31. Moody, J.: Prediction Risk and Architecture Selection for Neural Networks. In: Cherkassky, V., et al. (eds.) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NATO ASI Series F, vol. 136. Springer, Berlin (1994)
  32. Moody, J., Rögnvaldsson, T.: Smoothing Regularizers for Projective Basis Function Networks. In: Proceedings of the 1996 Conference on Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)
  33. Murata, N., Yoshizawa, S., Amari, S.: Network Information Criterion — Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks 5(6), 865–872 (1994)
  34. Nowlan, S., Hinton, G.: Simplifying Neural Networks by Soft Weight Sharing. Neural Computation 4(4), 473–493 (1992)
  35. With Pedersen, M.: Training Recurrent Networks. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII. IEEE, Piscataway (1997)
  36. Peterson, G.E., Barney, H.L.: Control Methods Used in a Study of the Vowels. JASA 24, 175–184 (1952)
  37. Shadafan, R.S., Niranjan, M.: A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space. Neural Computation 6(6), 1202–1222 (1994)
  38. Sjöberg, J.: Non-Linear System Identification with Neural Networks. Ph.D. Thesis no. 381, Department of Electrical Engineering, Linköping University, Sweden (1995)
  39. Stone, M.: Cross-validatory Choice and Assessment of Statistical Predictors. Journal of the Royal Statistical Society B 36(2), 111–147 (1974)
  40. Svarer, C., Hansen, L.K., Larsen, J., Rasmussen, C.E.: Designer Networks for Time Series Processing. In: Kamm, C.A., et al. (eds.) Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, vol. 3, pp. 78–87. IEEE, Piscataway (1993)
  41. Watrous, R.L.: Current Status of Peterson-Barney Vowel Formant Data. JASA 89, 2459–2460 (1991)
  42. Weigend, A.S., Huberman, B.A., Rumelhart, D.E.: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems 1(3), 193–209 (1990)
  43. Williams, P.M.: Bayesian Regularization and Pruning using a Laplace Prior. Neural Computation 7(1), 117–143 (1995)
  44. Wolpert, D.H., Macready, W.G.: The Mathematics of Search. Technical Report SFI-TR-95-02-010, Santa Fe Institute (1995)
  45. Wu, L., Moody, J.: A Smoothing Regularizer for Feedforward and Recurrent Neural Networks. Neural Computation 8(3) (1996)
  46. Zhu, H., Rohwer, R.: No Free Lunch for Cross Validation. Neural Computation 8(7), 1421–1426 (1996)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Larsen, J., Svarer, C., Andersen, L.N., Hansen, L.K. (2012). Adaptive Regularization in Neural Network Modeling. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_8

  • DOI: https://doi.org/10.1007/978-3-642-35289-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35288-1

  • Online ISBN: 978-3-642-35289-8

  • eBook Packages: Computer Science (R0)
