Abstract
Averaging over many predictors reduces the variance portion of the error. We present a method for estimating the mean squared error of an infinite ensemble of predictors from the information available in a small, finite ensemble. We demonstrate it on ensembles of networks that differ in their initial choice of synaptic weights. We find that the optimal stopping point for large ensembles occurs later in training than for single networks. We test our method on the sunspots data set and obtain excellent results.
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
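The abstract's first claim, that averaging reduces the variance portion of the error, can be illustrated with a toy simulation. The sketch below is not the chapter's procedure. It assumes a simplified model in which every predictor's error is a shared bias plus independent zero-mean noise, so that the expected squared error of a Q-member average is bias² + σ²/Q. Under that assumption, fitting a straight line to the error as a function of 1/Q and reading off the intercept gives an estimate for an infinite ensemble. All names and numeric values (`BIAS`, `NOISE_STD`, `ensemble_mse`, `fit_line`) are hypothetical.

```python
import random


def ensemble_mse(q, n_trials, bias, noise_std, rng):
    """Monte Carlo estimate of the squared error of an average of q predictors."""
    total = 0.0
    for _ in range(n_trials):
        # Assumed error model: a bias shared by all members plus
        # independent Gaussian noise for each member.
        mean_error = bias + sum(rng.gauss(0.0, noise_std) for _ in range(q)) / q
        total += mean_error ** 2
    return total / n_trials


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


rng = random.Random(0)          # fixed seed so the run is repeatable
BIAS, NOISE_STD = 0.3, 1.0      # hypothetical values, not taken from the chapter
sizes = [1, 2, 4, 8]            # small ensemble sizes only

mse_by_size = [ensemble_mse(q, 20000, BIAS, NOISE_STD, rng) for q in sizes]

# Extrapolate in 1/Q: the intercept at 1/Q = 0 estimates the
# infinite-ensemble error, and the slope estimates the variance term.
extrapolated_mse, variance_term = fit_line([1.0 / q for q in sizes], mse_by_size)

for q, mse in zip(sizes, mse_by_size):
    print(f"Q={q}: MSE={mse:.4f}")
print(f"extrapolated MSE for an infinite ensemble: {extrapolated_mse:.4f}")
print(f"true bias^2 in this toy model: {BIAS ** 2:.4f}")
```

In this toy model the intercept recovers the bias term, which averaging cannot remove, while the slope recovers the noise variance. Real networks trained on the same data have correlated errors, so the linear form in 1/Q is an assumption to be checked rather than a guarantee.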
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Horn, D., Naftaly, U., Intrator, N. (2012). Large Ensemble Averaging. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_9
DOI: https://doi.org/10.1007/978-3-642-35289-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8
eBook Packages: Computer Science, Computer Science (R0)