Using Boosting to Detect Noisy Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2112)

Abstract

Noisy data is inherent in many real-life and industrial modelling situations. If prior knowledge of such data were available, it would be a simple process to remove or account for the noise and improve model robustness. Unfortunately, in the majority of learning situations, the presence of underlying noise is suspected but difficult to detect.

Ensemble classification techniques such as bagging (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in the recent literature. Such techniques have been shown to reduce classification error on unseen cases, and this paper demonstrates that they may also be employed as noise detectors. Recently defined diagnostics such as the edge and the margin (Breiman, 1997; Freund & Schapire, 1997; Schapire et al., 1998) have been used to explain the improvements in generalisation error when ensemble classifiers are built. The distributions of these measures are key to the noise detection process introduced in this study.
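
To make the two diagnostics concrete, the following minimal sketch (not taken from the paper; it assumes scikit-learn 1.2 or later, where the base-learner parameter is named estimator, and binary labels coded as -1/+1) computes both quantities for an AdaBoost ensemble of decision stumps. For an example (x, y), the normalised margin is the weighted proportion of correct votes minus the weighted proportion of incorrect votes, and Breiman's edge is the weighted proportion of incorrect votes, so margin = 1 - 2*edge in the two-class case.

# Illustrative sketch only: edge and margin distributions for a boosted
# ensemble. The dataset and all parameter choices are arbitrary examples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, random_state=0)
y = 2 * y01 - 1  # recode the class labels as -1/+1

ens = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=100,
    random_state=0,
).fit(X, y)

# Votes of each base classifier on each training example: shape (T, n).
T = len(ens.estimators_)
alphas = ens.estimator_weights_[:T]
votes = np.array([h.predict(X) for h in ens.estimators_])

# Normalised margin in [-1, 1]: weighted correct minus incorrect votes,
# since y * h(x) is +1 for a correct vote and -1 for an incorrect one.
margins = (alphas @ (votes * y)) / alphas.sum()
# Breiman's edge: the weighted proportion of incorrect votes.
edges = (1.0 - margins) / 2.0

print("mean margin:", margins.mean())
print("examples with negative margin:", int((margins < 0).sum()))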

This paper presents some empirical results on edge distributions which confirm existing theories on boosting’s tendency to ‘balance’ error rates. The results are then extended to introduce a methodology whereby boosting may be used to identify noise in training data by examining the changes in edge and margin distributions as boosting proceeds.
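
As a purely illustrative sketch of that idea (not the author's exact procedure), one can record every training example's margin after each boosting round and flag points whose margins remain negative late in the run: boosting concentrates weight on persistently misclassified examples, so such points are natural noise candidates. The sketch below continues from the ensemble fitted above; the 20-round window and the zero threshold are arbitrary illustration choices.

# Illustrative sketch: track per-example margins round by round and flag
# examples that stay on the wrong side of the weighted vote.
def staged_margins(ens, X, y):
    """Yield every example's normalised margin after each boosting round."""
    weighted_votes = np.zeros(len(y))
    total_weight = 0.0
    for h, alpha in zip(ens.estimators_, ens.estimator_weights_):
        weighted_votes += alpha * y * h.predict(X)
        total_weight += alpha
        yield weighted_votes / total_weight

history = np.array(list(staged_margins(ens, X, y)))  # shape (T, n)
still_negative = (history[-20:] < 0).all(axis=0)     # wrong in last 20 rounds
print("suspected noisy examples:", np.flatnonzero(still_negative))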


References

  • Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.

  • Breiman, L. (1996b). Bias, Variance and Arcing Classifiers (Technical Report 460). Statistics Department, University of California, Berkeley.

  • Breiman, L. (1997). Arcing the edge (Technical Report 486). Statistics Department, University of California, Berkeley.

  • Breiman, L. (1999). Random Forests-Random Features (Technical Report 567). Statistics Department, University of California, Berkeley.

  • Dietterich, T.G. (1997). Machine learning research: Four current directions. AI Magazine, 18(4), 97–136.

  • Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning (pp. 148–156). Morgan Kaufmann.

  • Freund, Y., & Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.

  • Friedman, J.H. (1997). On bias, variance, 0/1-loss and the curse of dimensionality. Data Mining and Knowledge Discovery, 1(1), 55–77.

  • Friedman, J.H., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: a statistical view of boosting (Technical Report 199). Department of Statistics, Stanford University.

  • Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.

  • Quinlan, J.R. (1996). Bagging, boosting and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730). Menlo Park, CA: AAAI Press.

  • Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5), 1651–1686.

  • Schapire, R.E., & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 80–91).


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wheway, V. (2001). Using Boosting to Detect Noisy Data. In: Kowalczyk, R., Loke, S.W., Reed, N.E., Williams, G.J. (eds) Advances in Artificial Intelligence. PRICAI 2000 Workshop Reader. PRICAI 2000. Lecture Notes in Computer Science (LNAI), vol 2112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45408-X_13

  • DOI: https://doi.org/10.1007/3-540-45408-X_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42597-7

  • Online ISBN: 978-3-540-45408-3
