
On the Evaluation of the Homogeneous Ensembles with CV-Passports

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Abstract

Ensembles are often capable of greater prediction accuracy than any of their individual members, and, as a consequence of the diversity among the individual base learners, an ensemble is far less likely to suffer from overfitting. A systematic and automatic approach for the evaluation of ensemble solutions is therefore particularly important. Based on the mechanism of homogeneous ensembling (also known as bagging), we construct a passport of the solution as a unified validation trajectory over all available training data. Assuming that passports closely mimic the corresponding test solutions, we can use them for many tasks, including the optimization of blends and ensembles, the calculation of biases, and any other tests as required. The reported results were obtained online during the PAKDD 2010 data mining competition, where we were awarded a certificate for the fourth-best result. We also report results from the second most popular contest on the Kaggle platform, named "Credit", where we demonstrated one of the best results.
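The abstract does not spell out how a passport is constructed, but a natural reading is that it is the out-of-bag validation trajectory of a bagged (homogeneous) ensemble: each training row is scored only by the base learners whose bootstrap sample did not contain it, which yields a test-like prediction for every row of the training data. The sketch below is a minimal illustration of that reading in Python with scikit-learn; the names (passport, oob_sum, n_models) and the decision-tree base learner are our own assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed reading, not the authors' exact procedure):
# build a "passport" as the out-of-bag validation trajectory of a bagged ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
n, n_models = len(y), 100

oob_sum = np.zeros(n)   # accumulated out-of-bag scores, one slot per training row
oob_cnt = np.zeros(n)   # how many base learners left each row out of the bag

for m in range(n_models):
    idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), idx)     # rows this learner never saw
    clf = DecisionTreeClassifier(max_depth=5, random_state=m).fit(X[idx], y[idx])
    oob_sum[oob] += clf.predict_proba(X[oob])[:, 1]
    oob_cnt[oob] += 1

# The "passport": one validation score per training row, obtained without any
# held-out test data. Comparing it against the labels gives a bias estimate,
# and several passports can be blended before ever submitting a test prediction.
passport = oob_sum / np.maximum(oob_cnt, 1)
print("out-of-bag accuracy estimate:", np.mean((passport > 0.5) == y))
```

Such a trajectory can then stand in for the unseen test predictions when tuning the weights of a blend, which is the sense in which a passport "mimics" the corresponding test solution.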

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikulin, V., Bakharia, A., Huang, T.H. (2013). On the Evaluation of the Homogeneous Ensembles with CV-Passports. In: Li, J., et al. (eds.) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol. 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_10

  • DOI: https://doi.org/10.1007/978-3-642-40319-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40318-7

  • Online ISBN: 978-3-642-40319-4

  • eBook Packages: Computer Science, Computer Science (R0)
