An Empirical Study of Applying Ensembles of Heterogeneous Classifiers on Imperfect Data

  • Conference paper
New Frontiers in Applied Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5669)

Abstract

Two factors slow down the deployment of classification, or supervised learning, in real-world situations. One is the reality that data are imperfect in practice; the other is the fact that every technique has its own limits. Although techniques have been developed to address the imperfection of real-world data, no single one outperforms all others, and each focuses on certain types of imperfection. Furthermore, relatively few works have applied ensembles of heterogeneous classifiers to such situations. In this paper, we report a work in progress that studies the impact of heterogeneity on an ensemble, focusing on two aspects: diversity and classification quality for imbalanced data. Our goal is to evaluate how introducing heterogeneity into an ensemble influences its behavior and performance.
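The two quantities the abstract names, diversity among ensemble members and a combined decision, can be made concrete with a small sketch. The following is not the authors' implementation; it is a minimal illustration, with hypothetical prediction vectors, of the pairwise disagreement measure (one of the diversity measures surveyed in the ensemble literature) and plurality voting over heterogeneous base classifiers.

```python
from collections import Counter

def disagreement(preds_a, preds_b):
    """Pairwise disagreement: the fraction of samples on which two
    classifiers assign different labels. Higher values mean the pair
    is more diverse."""
    assert len(preds_a) == len(preds_b)
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def majority_vote(all_preds):
    """Combine per-classifier prediction vectors by plurality vote,
    sample by sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_preds)]

# Hypothetical predictions from three heterogeneous base classifiers
# (say, a decision tree, a naive Bayes model, and a k-NN model) on
# ten test samples of a binary, imbalanced problem.
tree  = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
bayes = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
knn   = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

print(disagreement(tree, bayes))          # pairwise diversity of one pair
print(majority_vote([tree, bayes, knn]))  # the ensemble's combined decision
```

On imbalanced data, plain accuracy of the voted output would be misleading; the study's focus on classification quality for imbalanced data corresponds to evaluating such combined decisions with minority-class-sensitive measures instead.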



Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hsu, K.-W., Srivastava, J. (2010). An Empirical Study of Applying Ensembles of Heterogeneous Classifiers on Imperfect Data. In: Theeramunkong, T., et al. New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol. 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_3

  • DOI: https://doi.org/10.1007/978-3-642-14640-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14639-8

  • Online ISBN: 978-3-642-14640-4

  • eBook Packages: Computer Science (R0)
