Ensemble Learning

Abstract

Over the last couple of decades, multiple classifier systems, also called ensemble systems, have enjoyed growing attention within the computational intelligence and machine learning community. This attention has been well deserved, as ensemble systems have proven to be effective and versatile across a broad spectrum of problem domains and real-world applications. Originally developed to reduce the variance of an automated decision-making system, and thereby improve its accuracy, ensemble systems have since been used successfully to address a variety of machine learning problems, including feature selection, confidence estimation, missing features, incremental learning, error correction, class-imbalanced data, and learning concept drift from nonstationary distributions. This chapter provides an overview of ensemble systems, their properties, and how they can be applied to such a wide spectrum of applications.
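
As a concrete illustration of the variance-reduction idea mentioned above, the short sketch below compares a single high-variance decision tree with a bagged ensemble of such trees on the same synthetic data. It is not taken from the chapter: it is a minimal example that assumes scikit-learn is installed, and the dataset, tree depth, ensemble size, and other parameter choices are illustrative assumptions only.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import BaggingClassifier

    # Synthetic two-class problem (hypothetical data, for illustration only).
    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=5, random_state=0)

    # A single unpruned decision tree: low bias, but high variance.
    single_tree = DecisionTreeClassifier(random_state=0)

    # Fifty trees, each trained on a bootstrap replicate of the data and
    # aggregated by voting / averaged predictions (Breiman-style bagging).
    bagged_trees = BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=50, random_state=0)

    print("single tree, 5-fold CV accuracy :",
          cross_val_score(single_tree, X, y, cv=5).mean())
    print("bagged trees, 5-fold CV accuracy:",
          cross_val_score(bagged_trees, X, y, cv=5).mean())

On a typical run the bagged ensemble matches or exceeds the accuracy of the single tree, reflecting the variance reduction that motivated the earliest ensemble systems.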

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Rowan University, Glassboro, USA
