A Rule-Based Scheme for Filtering Examples from Majority Class in an Imbalanced Training Set

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2003)

Abstract

Developing a Computer-Assisted Detection (CAD) system for the automatic diagnosis of pulmonary nodules in thoracic CT is a highly challenging research area in the medical domain. It requires the successful application of sophisticated, state-of-the-art image processing and pattern recognition technologies. The object recognition and feature extraction phase of such a system generates a huge, imbalanced training set, as is the case in many learning problems in the medical domain. The performance of concept learning systems is traditionally assessed by the percentage of test examples classified correctly, termed accuracy. This measure becomes inappropriate for imbalanced training sets such as this one, where non-nodule (negative) examples outnumber nodule (positive) examples. This paper introduces a mechanism developed for filtering negative examples in the training set so as to remove the ‘obvious’ ones, and discusses alternative evaluation criteria.
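
To illustrate why accuracy can mislead on an imbalanced set, the following minimal sketch scores a trivial classifier that labels every candidate as a non-nodule. The class counts (10 nodules, 990 non-nodules), the function names, and the choice of sensitivity, specificity, and their geometric mean as alternative criteria are assumptions made for illustration only; the abstract does not state which criteria the paper adopts.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = nodule)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Return accuracy alongside class-wise rates that are less sensitive to imbalance."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Sensitivity: fraction of positives (nodules) that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of negatives (non-nodules) correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Geometric mean of the two rates; it is zero if either class is ignored.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }

# Hypothetical imbalanced set: 10 nodules and 990 non-nodules.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that never reports a nodule.
always_negative = [0] * len(y_true)

report = summarize(y_true, always_negative)
print(report)
```

Under these assumed counts the degenerate classifier reaches 99% accuracy while detecting no nodules at all, which is the kind of result that motivates looking beyond accuracy.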

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dehmeshki, J., Karaköy, M., Casique, M.V. (2003). A Rule-Based Scheme for Filtering Examples from Majority Class in an Imbalanced Training Set. In: Perner, P., Rosenfeld, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2003. Lecture Notes in Computer Science, vol 2734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45065-3_19

  • DOI: https://doi.org/10.1007/3-540-45065-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40504-7

  • Online ISBN: 978-3-540-45065-8

  • eBook Packages: Springer Book Archive
