Abstract
Two factors slow down the deployment of classification, or supervised learning, in real-world situations. One is the reality that data are not perfect in practice, while the other is the fact that every technique has its own limits. Although techniques have been developed to resolve issues arising from the imperfectness of real-world data, no single one outperforms all others, and each focuses on certain types of imperfectness. Furthermore, relatively few works apply ensembles of heterogeneous classifiers to such situations. In this paper, we report work in progress that studies the impact of heterogeneity on ensembles, focusing in particular on two aspects: diversity, and classification quality on imbalanced data. Our goal is to evaluate how introducing heterogeneity into an ensemble influences its behavior and performance.
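To make those two aspects concrete, here is a minimal sketch, not the authors' implementation: it trains a small heterogeneous ensemble from learners of the kinds the paper draws on (a decision tree standing in for C4.5, naive Bayes, and a nearest-neighbor learner standing in for instance-based learning) on an imbalanced synthetic dataset, and computes the average pairwise disagreement, a standard diversity measure in the Kuncheva-and-Whitaker line of work. The synthetic dataset, the scikit-learn stand-ins, and the choice of balanced accuracy as the quality metric are all assumptions made for this example.

# Illustrative sketch only -- not the authors' implementation. Builds a
# heterogeneous ensemble, evaluates it on imbalanced data, and computes
# the average pairwise disagreement diversity measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def disagreement(pred_a, pred_b):
    # Fraction of test examples on which the two classifiers predict
    # different labels: a simple pairwise diversity measure.
    return float(np.mean(pred_a != pred_b))

# Imbalanced two-class data, roughly 95% majority / 5% minority (assumed).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Heterogeneous base learners: scikit-learn stand-ins for the decision-tree,
# naive Bayes, and instance-based learners the paper refers to.
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]
ensemble = VotingClassifier(estimators=base, voting="hard").fit(X_tr, y_tr)

# Diversity: average disagreement over all pairs of base classifiers.
preds = [clf.fit(X_tr, y_tr).predict(X_te) for _, clf in base]
div = np.mean([disagreement(preds[i], preds[j])
               for i in range(len(preds)) for j in range(i + 1, len(preds))])

# Balanced accuracy is less misleading than plain accuracy on skewed classes.
print(f"balanced accuracy: {balanced_accuracy_score(y_te, ensemble.predict(X_te)):.3f}")
print(f"average pairwise disagreement: {div:.3f}")

Hard majority voting is only one way to combine heterogeneous classifiers; the disagreement value gives a first, rough reading of how differently the base learners behave on the same test set.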
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Hsu, K.-W., Srivastava, J. (2010). An Empirical Study of Applying Ensembles of Heterogeneous Classifiers on Imperfect Data. In: Theeramunkong, T., et al. (eds.) New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol. 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14639-8
Online ISBN: 978-3-642-14640-4