
Combining Multiple Learners: Data Fusion and Ensemble Learning

Chapter in: Neural Networks and Statistical Learning

Abstract

Different learning algorithms achieve different accuracies, and the no-free-lunch theorem asserts that no single learning algorithm performs best in every domain. Combining multiple learners can therefore attain higher accuracy than any individual learner. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. Fusing data to improve prediction accuracy and reliability is an important problem in machine learning.
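As a minimal illustration of this idea, the sketch below trains three standard scikit-learn classifiers on a synthetic binary task and fuses their decisions by unweighted majority vote; the dataset, base learners, and hyperparameters are illustrative choices, not taken from the chapter itself. More elaborate combination schemes such as weighted voting, bagging, boosting, and stacking build on this basic fusion rule.

```python
# Illustrative sketch (assumed setup, not from the chapter): majority-vote
# fusion of three base learners on a synthetic binary classification task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class data, split into training and test sets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Three heterogeneous base learners with different inductive biases.
learners = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=5, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Train each learner independently and collect its test-set predictions.
predictions = []
for clf in learners:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    predictions.append(pred)
    print(type(clf).__name__, "accuracy:", accuracy_score(y_test, pred))

# Fuse the individual 0/1 decisions by unweighted majority vote.
votes = np.stack(predictions)  # shape: (n_learners, n_samples)
majority = (votes.sum(axis=0) > len(learners) / 2).astype(int)
print("Majority-vote ensemble accuracy:", accuracy_score(y_test, majority))
```

The ensemble typically matches or exceeds the best individual learner when the base learners are reasonably accurate and make partly uncorrelated errors.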



Author information

Corresponding author: Ke-Lin Du


Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Combining Multiple Learners: Data Fusion and Ensemble Learning. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_20


  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_20

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering, Engineering (R0)
