Using Boosting to Detect Noisy Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2112)

Abstract

Noisy data is inherent in many real-life and industrial modelling situations. If prior knowledge of such data were available, it would be a simple process to remove or account for the noise and improve model robustness. Unfortunately, in the majority of learning situations, the presence of underlying noise is suspected but difficult to detect.

Ensemble classification techniques such as bagging (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in the recent literature. Such techniques have been shown to reduce classification error on unseen cases, and this paper demonstrates that they may also be employed as noise detectors. Recently defined diagnostics such as the edge and the margin (Breiman, 1997; Freund & Schapire, 1997; Schapire et al., 1998) have been used to explain the improvements in generalisation error when ensemble classifiers are built. The distributions of these measures are key to the noise detection process introduced in this study.
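
To make the two diagnostics concrete, the following minimal sketch (not taken from the paper; it assumes scikit-learn 1.2 or later, where the base-learner parameter is named estimator, and binary labels coded as -1/+1) computes both quantities for an AdaBoost ensemble of decision stumps. For an example (x, y), the normalised margin is the weighted proportion of correct votes minus the weighted proportion of incorrect votes, and Breiman's edge is the weighted proportion of incorrect votes, so margin = 1 - 2*edge in the two-class case.

# Illustrative sketch only: edge and margin distributions for a boosted
# ensemble. The dataset and all parameter choices are arbitrary examples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, random_state=0)
y = 2 * y01 - 1  # recode the class labels as -1/+1

ens = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=100,
    random_state=0,
).fit(X, y)

# Votes of each base classifier on each training example: shape (T, n).
T = len(ens.estimators_)
alphas = ens.estimator_weights_[:T]
votes = np.array([h.predict(X) for h in ens.estimators_])

# Normalised margin in [-1, 1]: weighted correct minus incorrect votes,
# since y * h(x) is +1 for a correct vote and -1 for an incorrect one.
margins = (alphas @ (votes * y)) / alphas.sum()
# Breiman's edge: the weighted proportion of incorrect votes.
edges = (1.0 - margins) / 2.0

print("mean margin:", margins.mean())
print("examples with negative margin:", int((margins < 0).sum()))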

This paper presents some empirical results on edge distributions which confirm existing theories on boosting’s tendency to ‘balance’ error rates. The results are then extended to introduce a methodology whereby boosting may be used to identify noise in training data by examining the changes in edge and margin distributions as boosting proceeds.
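
As a purely illustrative sketch of that idea (not the author's exact procedure), one can record every training example's margin after each boosting round and flag points whose margins remain negative late in the run: boosting concentrates weight on persistently misclassified examples, so such points are natural noise candidates. The sketch below continues from the ensemble fitted above; the 20-round window and the zero threshold are arbitrary illustration choices.

# Illustrative sketch: track per-example margins round by round and flag
# examples that stay on the wrong side of the weighted vote.
def staged_margins(ens, X, y):
    """Yield every example's normalised margin after each boosting round."""
    weighted_votes = np.zeros(len(y))
    total_weight = 0.0
    for h, alpha in zip(ens.estimators_, ens.estimator_weights_):
        weighted_votes += alpha * y * h.predict(X)
        total_weight += alpha
        yield weighted_votes / total_weight

history = np.array(list(staged_margins(ens, X, y)))  # shape (T, n)
still_negative = (history[-20:] < 0).all(axis=0)     # wrong in last 20 rounds
print("suspected noisy examples:", np.flatnonzero(still_negative))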


References

  • Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.

  • Breiman, L. (1996b). Bias, Variance and Arcing Classifiers (Technical Report 460). Statistics Department, University of California, Berkeley.

  • Breiman, L. (1997). Arcing the edge (Technical Report 486). Statistics Department, University of California, Berkeley.

  • Breiman, L. (1999). Random Forests-Random Features (Technical Report 567). Statistics Department, University of California, Berkeley.

  • Dietterich, T.G. (1997). Machine learning research: Four current directions. AI Magazine, 18(4), 97–136.

  • Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning (pp. 148–156). Morgan Kaufmann.

  • Freund, Y., & Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.

  • Friedman, J.H. (1997). On bias, variance, 0/1-loss and the curse of dimensionality. Data Mining and Knowledge Discovery, 1(1), 55–77.

  • Friedman, J.H., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: a statistical view of boosting (Technical Report 199). Department of Statistics, Stanford University.

  • Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.

  • Quinlan, J.R. (1996). Bagging, boosting and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730). Menlo Park, CA: AAAI Press.

  • Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5), 1651–1686.

  • Schapire, R.E., & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 80–91).


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wheway, V. (2001). Using Boosting to Detect Noisy Data. In: Kowalczyk, R., Loke, S.W., Reed, N.E., Williams, G.J. (eds) Advances in Artificial Intelligence. PRICAI 2000 Workshop Reader. PRICAI 2000. Lecture Notes in Computer Science (LNAI), vol 2112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45408-X_13

  • DOI: https://doi.org/10.1007/3-540-45408-X_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42597-7

  • Online ISBN: 978-3-540-45408-3
