Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources

Yang, Ying; Wu, Xindong; Zhu, Xingquan

doi:10.1007/978-3-540-30116-5_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3202)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Attribute noise can affect classification learning. Previous work on handling attribute noise has focused on predictable attributes, that is, attributes whose values can be predicted from the class and the other attributes. However, attributes are often predictive but unpredictable. Being predictive, they are essential to classification learning, so it is important to handle their noise. Being unpredictable, they require strategies different from those used for predictable attributes. This paper presents a study on identifying, cleansing and measuring noise for predictive-but-unpredictable attributes, and proposes new strategies accordingly. Both theoretical analysis and empirical evidence suggest that these strategies are more effective and more efficient than previous alternatives.

Author information

Authors and Affiliations

Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA
Ying Yang, Xindong Wu & Xingquan Zhu

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Dipartimento di Informatica, Università degli Studi di Bari
Floriana Esposito
Pisa KDD Laboratory, ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Fosca Giannotti
Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
Dino Pedreschi

Cite this paper

Yang, Y., Wu, X., Zhu, X. (2004). Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_43

DOI: https://doi.org/10.1007/978-3-540-30116-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive
