Experiments on Ensembles with Missing and Noisy Data

Conference paper
Multiple Classifier Systems (MCS 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3077)

Abstract

One of the potential advantages of multiple classifier systems is increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. Decorate is a recently introduced ensemble method that constructs diverse committees using artificial data and has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and Decorate to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, Decorate is the most robust. For classification noise, bagging and Decorate are both robust, with bagging slightly better than Decorate, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
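To make the experimental setup concrete, the sketch below injects classification noise into training labels and measures how bagged and boosted decision trees degrade on a clean test set. It is a minimal illustration under stated assumptions, not the authors' setup: the paper used C4.5 within Weka and also evaluated Decorate, which scikit-learn does not provide, and the dataset, ensemble sizes, and noise levels here are arbitrary stand-ins.

```python
# Minimal sketch of a classification-noise robustness experiment
# (assumes scikit-learn; the paper itself used C4.5/Weka and also
# tested Decorate, which is not available here).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def add_class_noise(y, noise_level, rng):
    """Flip each label to a different class with probability noise_level."""
    y_noisy = y.copy()
    classes = np.unique(y)
    for i in np.flatnonzero(rng.random(len(y)) < noise_level):
        y_noisy[i] = rng.choice(classes[classes != y[i]])
    return y_noisy


X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(0)

for noise in (0.0, 0.1, 0.2, 0.3):
    y_tr_noisy = add_class_noise(y_tr, noise, rng)
    for name, ens in (
        ("bagging", BaggingClassifier(DecisionTreeClassifier(), n_estimators=15)),
        ("boosting", AdaBoostClassifier(n_estimators=15)),
    ):
        ens.fit(X_tr, y_tr_noisy)
        # Test labels stay clean, so accuracy reflects robustness to training noise.
        print(f"noise={noise:.0%}  {name}: accuracy={ens.score(X_te, y_te):.3f}")
```

Consistent with the abstract's finding, one would expect boosting's accuracy to fall off faster as noise increases, since reweighting concentrates the ensemble on the mislabeled examples, while bagging's bootstrap averaging dilutes their influence.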

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Melville, P., Shah, N., Mihalkova, L., Mooney, R.J. (2004). Experiments on Ensembles with Missing and Noisy Data. In: Roli, F., Kittler, J., Windeatt, T. (eds) Multiple Classifier Systems. MCS 2004. Lecture Notes in Computer Science, vol 3077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25966-4_29

  • DOI: https://doi.org/10.1007/978-3-540-25966-4_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22144-9

  • Online ISBN: 978-3-540-25966-4

  • eBook Packages: Springer Book Archive
