Combining Noise Correction with Feature Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2737)

Abstract

Polishing is a noise correction mechanism that exploits the inter-relationships between attribute and class values in a data set to identify and selectively correct components that are noisy. We applied polishing to a data set of amino acid sequences and associated information on point mutations of the gene COL1A1 for the classification of the phenotypes of the genetic collagenous disease Osteogenesis Imperfecta (OI). OI is associated with mutations in one or both of the genes COL1A1 and COL1A2. There are at least four known phenotypes of OI, of which type II is the most severe and often lethal. Preliminary results suggest that polishing can lead to higher classification accuracy. We further investigated the use of polishing as a scoring mechanism for feature selection, and the effect of the features so derived on the resulting classifier. Our experiments on the OI data set suggest that combining polishing and feature selection is a viable mechanism for improving data quality.
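The polishing idea described above, which cross-checks each attribute value against the rest of the data set and selectively corrects values that look noisy, can be illustrated with a deliberately simplified stand-in. The actual mechanism in the paper trains cross-validated classifiers to predict each attribute from the remaining attributes; the per-class majority-vote predictor and the 0.6 consensus threshold below are illustrative assumptions, not the paper's procedure:

```python
from collections import Counter

def polish(data, labels, threshold=0.6):
    """Crude polishing sketch: flag and correct attribute values that
    disagree with the dominant value for their class. The paper's method
    instead predicts each attribute from the remaining attributes with
    learned classifiers; a per-class majority vote is a simple stand-in."""
    polished = [list(row) for row in data]
    n_attrs = len(data[0])
    for j in range(n_attrs):
        for cls in set(labels):
            vals = [row[j] for row, y in zip(data, labels) if y == cls]
            value, count = Counter(vals).most_common(1)[0]
            # Correct selectively: only where the class shows strong consensus.
            if count / len(vals) >= threshold:
                for i, y in enumerate(labels):
                    if y == cls and polished[i][j] != value:
                        polished[i][j] = value
    return polished

# Toy run: the third instance's first attribute (9) is inconsistent with
# the other class-0 instances and is corrected to the consensus value 1.
rows = [[1, "G"], [1, "G"], [9, "G"], [2, "C"]]
fixed = polish(rows, [0, 0, 0, 1])
```

Scoring features by how often polishing has to correct them is one natural way to turn such a correction mechanism into the feature-selection signal the abstract refers to.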

This work was supported by NASA NCC2-1239 and ONR N00014-03-1-0516.




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg


Cite this paper

Teng, C.M. (2003). Combining Noise Correction with Feature Selection. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2003. Lecture Notes in Computer Science, vol 2737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45228-7_34

  • DOI: https://doi.org/10.1007/978-3-540-45228-7_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40807-9

  • Online ISBN: 978-3-540-45228-7
