Abstract
One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. Decorate is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and Decorate to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, Decorate is the most robust. For classification noise, bagging and Decorate are both robust, with bagging being slightly better than Decorate, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
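The abstract contrasts ensemble robustness under classification noise (randomly flipped training labels). The sketch below is purely illustrative and is not the paper's experimental setup: it uses an invented 1-D toy dataset and a decision-stump base learner to show the two ingredients the paper studies, noise injection into training labels and a bagged majority vote over bootstrap samples.

```python
import random

def make_data(n, noise=0.0, rng=random):
    # 1-D points in [-1, 1]; the true label is 1 iff x > 0.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1 if x > 0 else 0 for x in xs]
    # Classification noise: flip each training label with probability `noise`.
    for i in range(n):
        if rng.random() < noise:
            ys[i] = 1 - ys[i]
    return xs, ys

def fit_stump(xs, ys):
    # Exhaustively pick the threshold and polarity with lowest training error.
    best = None
    for t in xs:
        for pol in (0, 1):
            err = sum((pol if x > t else 1 - pol) != y for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return lambda x: pol if x > t else 1 - pol

def bagged(xs, ys, n_estimators, rng):
    # Bagging: train each stump on a bootstrap resample, predict by majority vote.
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: int(sum(s(x) for s in stumps) > n_estimators / 2)

rng = random.Random(0)
train_x, train_y = make_data(200, noise=0.2, rng=rng)   # noisy training labels
test_x, test_y = make_data(500, noise=0.0, rng=rng)     # clean evaluation labels

single = fit_stump(train_x, train_y)
ensemble = bagged(train_x, train_y, 15, rng)

def acc(f):
    return sum(f(x) == y for x, y in zip(test_x, test_y)) / len(test_x)

print(f"single stump: {acc(single):.3f}  bagged: {acc(ensemble):.3f}")
```

Because the label noise is symmetric, the empirically best threshold stays near the true decision boundary, and averaging over bootstrap resamples further smooths out noise-driven variance in the chosen threshold.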
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Melville, P., Shah, N., Mihalkova, L., Mooney, R.J. (2004). Experiments on Ensembles with Missing and Noisy Data. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22144-9
Online ISBN: 978-3-540-25966-4