How to Train Neural Networks

Neuneier, Ralph; Zimmermann, Hans Georg

doi:10.1007/978-3-642-35289-8_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7700)

Abstract

The purpose of this paper is to give guidance in neural network modeling. Starting with the preprocessing of the data, we discuss different types of network architecture and show how these can be combined effectively. We analyze several cost functions to avoid unstable learning due to outliers and heteroscedasticity. The observer-observation dilemma is solved by forcing the network to construct smooth approximation functions. Furthermore, we propose some pruning algorithms to optimize the network architecture. All these features and techniques are linked up into a complete and consistent training procedure (see figure 17.25 for an overview), such that the synergy of the methods is maximized.
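
To make the remark about outlier-robust cost functions concrete, the following minimal sketch compares the squared error with a ln cosh cost on a handful of residuals. The abstract does not name the cost functions analyzed in the chapter, so the choice of ln cosh here is an assumption made for illustration, and the residual values (including the outlier) are hypothetical.

```python
import math

def squared_error(residual):
    """Classical cost: grows quadratically, so outliers dominate the sum."""
    return 0.5 * residual ** 2

def log_cosh_error(residual):
    """ln(cosh(r)): quadratic near zero, roughly linear for large |r|.

    Uses the identity ln cosh(r) = |r| + ln(1 + exp(-2|r|)) - ln 2
    to avoid overflow for large residuals.
    """
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def log_cosh_gradient(residual):
    """Derivative of ln cosh(r) is tanh(r), which is bounded by 1 in magnitude."""
    return math.tanh(residual)

# Hypothetical residuals: three small errors and one outlier.
residuals = [0.1, -0.2, 0.05, 8.0]

sq = [squared_error(r) for r in residuals]
lc = [log_cosh_error(r) for r in residuals]

# Fraction of the total cost contributed by the outlier under each cost.
outlier_share_squared = sq[-1] / sum(sq)
outlier_share_log_cosh = lc[-1] / sum(lc)

print("squared error on outlier:", sq[-1])
print("ln cosh error on outlier:", lc[-1])
print("gradient of squared error on outlier:", residuals[-1])
print("gradient of ln cosh error on outlier:", log_cosh_gradient(residuals[-1]))
```

Because the gradient of ln cosh saturates, a single large residual cannot push the weights arbitrarily far in one update, which is the kind of stability the abstract refers to. Heteroscedasticity is a separate issue and is not addressed by this sketch.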

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).

Author information

Authors and Affiliations

Siemens AG, Corporate Technology, D-81730, München, Germany
Ralph Neuneier & Hans Georg Zimmermann

Editor information

Editors and Affiliations

Dept. of Computer Science, Technische Universität Berlin, Franklinstr. 28/29, 10587, Berlin, Germany
Grégoire Montavon & Klaus-Robert Müller
Dept. of Computer Science, Willamette University, 900 State Street, 97301, Salem, OR, USA
Geneviève B. Orr

About this chapter

Cite this chapter

Neuneier, R., Zimmermann, H.G. (2012). How to Train Neural Networks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_23

DOI: https://doi.org/10.1007/978-3-642-35289-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8
eBook Packages: Computer Science, Computer Science (R0)