Abstract
Different learning algorithms achieve different accuracies on the same task, and the no-free-lunch theorem asserts that no single learning algorithm attains the best performance in every domain. Multiple learners can therefore be combined to achieve higher accuracy than any individual one. Data fusion is the process of merging multiple records that represent the same real-world object into a single, consistent, and clean representation. Fusing data and predictions to improve prediction accuracy and reliability is an important problem in machine learning.
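As a concrete illustration of combining learners, the sketch below (a minimal example assuming scikit-learn is available, not code from the chapter) trains three heterogeneous base classifiers and fuses their predictions by majority voting; the synthetic dataset and the particular base learners are illustrative assumptions only.

```python
# Minimal hard-voting ensemble sketch (illustrative; assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class data, used only to exercise the ensemble.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Three heterogeneous base learners, each with its own bias/variance profile.
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Hard voting: the fused prediction is the majority class among the base learners.
ensemble = VotingClassifier(estimators=base_learners, voting="hard")

for name, clf in base_learners:
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))

ensemble.fit(X_train, y_train)
print("voting ensemble", accuracy_score(y_test, ensemble.predict(X_test)))
```

On data like this the fused vote typically matches or exceeds the best individual learner, though this is not guaranteed for any particular split or choice of base learners.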
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Du, K.-L., & Swamy, M. N. S. (2014). Combining multiple learners: Data fusion and ensemble learning. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_20
DOI: https://doi.org/10.1007/978-1-4471-5571-3_20
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5570-6
Online ISBN: 978-1-4471-5571-3
eBook Packages: Engineering, Engineering (R0)