Abstract
One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. Decorate is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and Decorate to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, Decorate is the most robust. For classification noise, bagging and Decorate are both robust, with bagging being slightly better than Decorate, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
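The abstract contrasts ensemble robustness under classification noise (randomly flipped training labels). The sketch below is purely illustrative and is not the paper's experimental setup: it uses an invented 1-D toy dataset and a decision-stump base learner to show the two ingredients the paper studies, noise injection into training labels and a bagged majority vote over bootstrap samples.

```python
import random

def make_data(n, noise=0.0, rng=random):
    # 1-D points in [-1, 1]; the true label is 1 iff x > 0.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1 if x > 0 else 0 for x in xs]
    # Classification noise: flip each training label with probability `noise`.
    for i in range(n):
        if rng.random() < noise:
            ys[i] = 1 - ys[i]
    return xs, ys

def fit_stump(xs, ys):
    # Exhaustively pick the threshold and polarity with lowest training error.
    best = None
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != y for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return lambda x: pol if x > t else 1 - pol

def bagged(xs, ys, n_estimators, rng):
    # Bagging: train each stump on a bootstrap resample, predict by majority vote.
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: int(sum(s(x) for s in stumps) > n_estimators / 2)

rng = random.Random(0)
train_x, train_y = make_data(200, noise=0.2, rng=rng)   # noisy training labels
test_x, test_y = make_data(500, noise=0.0, rng=rng)     # clean evaluation labels

single = fit_stump(train_x, train_y)
ensemble = bagged(train_x, train_y, 15, rng)

def acc(f):
    return sum(f(x) == y for x, y in zip(test_x, test_y)) / len(test_x)

print(f"single stump: {acc(single):.3f}  bagged: {acc(ensemble):.3f}")
```

Because the label noise is symmetric, the empirically best threshold stays near the true decision boundary, and averaging over bootstrap resamples further smooths out noise-driven variance in the chosen threshold.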
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Melville, P., Shah, N., Mihalkova, L., Mooney, R.J. (2004). Experiments on Ensembles with Missing and Noisy Data. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22144-9
Online ISBN: 978-3-540-25966-4