Combining Multiple Learners: Data Fusion and Ensemble Learning


Abstract

Different learning algorithms achieve different accuracies on a given problem. The no-free-lunch theorem asserts that no single learning algorithm achieves the best performance in every domain; multiple learners can, however, be combined to attain higher accuracy. Data fusion is the process of fusing multiple records that represent the same real-world object into a single, consistent, and clean representation. Fusing data to improve prediction accuracy and reliability is an important problem in machine learning.
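As a concrete illustration of the idea described above (not part of the chapter itself), the following minimal sketch combines three learners with different inductive biases by hard majority voting, assuming scikit-learn is available; the synthetic dataset, the choice of base learners, and the hyperparameters are arbitrary assumptions made only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class problem (a stand-in for any real dataset).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# Three base learners with different inductive biases.
base_learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("nb", GaussianNB()),
]

# Accuracy of each base learner on its own.
for name, clf in base_learners:
    clf.fit(X_train, y_train)
    print(f"{name:7s} accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")

# Hard majority voting over the three learners; the combined decision
# often matches or exceeds the best individual learner.
ensemble = VotingClassifier(estimators=base_learners, voting="hard")
ensemble.fit(X_train, y_train)
print(f"ensemble accuracy: {accuracy_score(y_test, ensemble.predict(X_test)):.3f}")
```

Majority voting is only the simplest combination rule; weighted summing, stacking, and boosting replace the vote with learned combination weights.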

Keywords

Covariance Shrinkage Summing 


Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Enjoyor Labs, Enjoyor Inc., Hangzhou, China
  2. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
