# Supervised Neural Networks and Ensemble Methods

## Abstract

**What the reader should know to understand this chapter**

- Fundamentals of machine learning (Chap. 4).
- Statistics (Appendix A).

## Keywords

Hidden Layer · Activation Function · Recognition Rate · Hidden Node · Output Node


## Copyright information

© Springer-Verlag London 2015