Ensemble Learning

Abstract

Over the last couple of decades, multiple classifier systems, also called ensemble systems, have enjoyed growing attention within the computational intelligence and machine learning community. This attention has been well deserved, as ensemble systems have proven to be effective and versatile across a broad spectrum of problem domains and real-world applications. Originally developed to reduce the variance of an automated decision-making system, and thereby improve its accuracy, ensemble systems have since been used successfully to address a variety of machine learning problems, including feature selection, confidence estimation, missing features, incremental learning, error correction, class-imbalanced data, and learning concept drift from nonstationary distributions. This chapter provides an overview of ensemble systems, their properties, and how they can be applied to such a wide spectrum of applications.
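
As a concrete illustration of the variance-reduction idea mentioned above, the short sketch below compares a single high-variance decision tree with a bagged ensemble of such trees on the same synthetic data. It is not taken from the chapter: it is a minimal example that assumes scikit-learn is installed, and the dataset, tree depth, ensemble size, and other parameter choices are illustrative assumptions only.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import BaggingClassifier

    # Synthetic two-class problem (hypothetical data, for illustration only).
    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=5, random_state=0)

    # A single unpruned decision tree: low bias, but high variance.
    single_tree = DecisionTreeClassifier(random_state=0)

    # Fifty trees, each trained on a bootstrap replicate of the data and
    # aggregated by voting / averaged predictions (Breiman-style bagging).
    bagged_trees = BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=50, random_state=0)

    print("single tree, 5-fold CV accuracy :",
          cross_val_score(single_tree, X, y, cv=5).mean())
    print("bagged trees, 5-fold CV accuracy:",
          cross_val_score(bagged_trees, X, y, cv=5).mean())

On a typical run the bagged ensemble matches or exceeds the accuracy of the single tree, reflecting the variance reduction that motivated the earliest ensemble systems.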

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Rowan University, Glassboro, USA
