Abstract
Ensembles are often capable of greater prediction accuracy than any of their individual members. As a consequence of the diversity between individual base-learners, an ensemble is less prone to overfitting. In this regard, the development of a systematic and automatic approach for the evaluation of ensemble solutions is particularly important. Based on the mechanism of homogeneous ensembling (also known as bagging), we can construct a passport of the solution as a unified validation trajectory against all available training data. Assuming that passports closely mimic the corresponding test solutions, we can use them for many tasks, including the optimization of blends and ensembles, the calculation of biases, and any other tests as required. The reported results were obtained online during the International PAKDD data mining competition in 2010, where we were awarded a certificate for the fourth best result. We also report results from the second most popular contest on the Kaggle platform, named “Credit”, where we demonstrate one of the best results.
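The abstract does not spell out how a passport is computed, so the following is only a minimal sketch under stated assumptions: each base-learner is trained on a random subset of the training data, it scores the samples it did not see, and the held-out scores are averaged per sample to give one validation value for every training point. The nearest-centroid base-learner, the synthetic data, and the names `fit_centroids`, `score`, `cv_passport`, `n_rounds` and `holdout` are all hypothetical and are not taken from the chapter.

```python
import random


def fit_centroids(X, y):
    """Toy base-learner (an assumption, not the authors' model): per-class feature means."""
    dims = len(X[0])
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(dims)]
    return means


def score(means, x):
    """Positive when x is closer to the class-1 centroid than to the class-0 centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, means[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, means[1]))
    return d0 - d1


def cv_passport(X, y, n_rounds=50, holdout=0.3, seed=0):
    """Average held-out scores per training sample over many random splits."""
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    counts = [0] * n
    k = max(1, int(holdout * n))
    for _ in range(n_rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:k], idx[k:]
        train_y = [y[i] for i in train_idx]
        if len(set(train_y)) < 2:
            continue  # skip splits where one class is missing from the training part
        model = fit_centroids([X[i] for i in train_idx], train_y)
        for i in test_idx:
            totals[i] += score(model, X[i])
            counts[i] += 1
    # None marks a sample that was never held out in any round
    return [t / c if c else None for t, c in zip(totals, counts)]


def make_data(n_per_class=20, seed=1):
    """Synthetic two-class data, used only to exercise the sketch."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X, y = make_data()
passport = cv_passport(X, y)
```

Because every entry of `passport` comes only from models that did not train on that sample, the vector can stand in for test predictions when comparing or weighting several ensembles, which is the kind of use the abstract describes.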
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nikulin, V., Bakharia, A., Huang, TH. (2013). On the Evaluation of the Homogeneous Ensembles with CV-Passports. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_10
DOI: https://doi.org/10.1007/978-3-642-40319-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science, Computer Science (R0)