Abstract
Two factors slow down the deployment of classification, or supervised learning, in real-world situations. One is the reality that data are not perfect in practice, while the other is the fact that every technique has its own limits. Although techniques have been developed to resolve issues arising from the imperfectness of real-world data, no single one outperforms all others, and each focuses on certain types of imperfectness. Furthermore, relatively few works apply ensembles of heterogeneous classifiers to such situations. In this paper, we report work in progress that studies the impact of heterogeneity on ensembles, focusing in particular on two aspects: diversity, and classification quality on imbalanced data. Our goal is to evaluate how introducing heterogeneity into an ensemble influences its behavior and performance.
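To make those two aspects concrete, here is a minimal sketch, not the authors' implementation: it trains a small heterogeneous ensemble from learners of the kinds the paper draws on (a decision tree standing in for C4.5, naive Bayes, and a nearest-neighbor learner standing in for instance-based learning) on an imbalanced synthetic dataset, and computes the average pairwise disagreement, a standard diversity measure in the Kuncheva-and-Whitaker line of work. The synthetic dataset, the scikit-learn stand-ins, and the choice of balanced accuracy as the quality metric are all assumptions made for this example.

# Illustrative sketch only -- not the authors' implementation. Builds a
# heterogeneous ensemble, evaluates it on imbalanced data, and computes
# the average pairwise disagreement diversity measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def disagreement(pred_a, pred_b):
    # Fraction of test examples on which the two classifiers predict
    # different labels: a simple pairwise diversity measure.
    return float(np.mean(pred_a != pred_b))

# Imbalanced two-class data, roughly 95% majority / 5% minority (assumed).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Heterogeneous base learners: scikit-learn stand-ins for the decision-tree,
# naive Bayes, and instance-based learners the paper refers to.
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]
ensemble = VotingClassifier(estimators=base, voting="hard").fit(X_tr, y_tr)

# Diversity: average disagreement over all pairs of base classifiers.
preds = [clf.fit(X_tr, y_tr).predict(X_te) for _, clf in base]
div = np.mean([disagreement(preds[i], preds[j])
               for i in range(len(preds)) for j in range(i + 1, len(preds))])

# Balanced accuracy is less misleading than plain accuracy on skewed classes.
print(f"balanced accuracy: {balanced_accuracy_score(y_te, ensemble.predict(X_te)):.3f}")
print(f"average pairwise disagreement: {div:.3f}")

Hard majority voting is only one way to combine heterogeneous classifiers; the disagreement value gives a first, rough reading of how differently the base learners behave on the same test set.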
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Hsu, K.-W., Srivastava, J. (2010). An Empirical Study of Applying Ensembles of Heterogeneous Classifiers on Imperfect Data. In: Theeramunkong, T., et al. (eds.) New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol. 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14639-8
Online ISBN: 978-3-642-14640-4