
On the Evaluation of the Homogeneous Ensembles with CV-Passports

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Abstract

Ensembles are often capable of greater prediction accuracy than any of their individual members, and, as a consequence of the diversity among the individual base learners, an ensemble is far less likely to suffer from overfitting. A systematic and automatic approach for the evaluation of ensemble solutions is therefore particularly important. Based on the mechanism of homogeneous ensembling (also known as bagging), we construct a passport of the solution as a unified validation trajectory over all available training data. Assuming that passports closely mimic the corresponding test solutions, we can use them for many tasks, including the optimization of blends and ensembles, the calculation of biases, and any other tests as required. The reported results were obtained online during the PAKDD 2010 data mining competition, where we were awarded a certificate for the fourth-best result. We also report results from the second most popular contest on the Kaggle platform, named "Credit", where we demonstrated one of the best results.
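The abstract does not spell out how a passport is constructed, but a natural reading is that it is the out-of-bag validation trajectory of a bagged (homogeneous) ensemble: each training row is scored only by the base learners whose bootstrap sample did not contain it, which yields a test-like prediction for every row of the training data. The sketch below is a minimal illustration of that reading in Python with scikit-learn; the names (passport, oob_sum, n_models) and the decision-tree base learner are our own assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed reading, not the authors' exact procedure):
# build a "passport" as the out-of-bag validation trajectory of a bagged ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
n, n_models = len(y), 100

oob_sum = np.zeros(n)   # accumulated out-of-bag scores, one slot per training row
oob_cnt = np.zeros(n)   # how many base learners left each row out of the bag

for m in range(n_models):
    idx = rng.integers(0, n, size=n)          # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), idx)     # rows this learner never saw
    clf = DecisionTreeClassifier(max_depth=5, random_state=m).fit(X[idx], y[idx])
    oob_sum[oob] += clf.predict_proba(X[oob])[:, 1]
    oob_cnt[oob] += 1

# The "passport": one validation score per training row, obtained without any
# held-out test data. Comparing it against the labels gives a bias estimate,
# and several passports can be blended before ever submitting a test prediction.
passport = oob_sum / np.maximum(oob_cnt, 1)
print("out-of-bag accuracy estimate:", np.mean((passport > 0.5) == y))
```

Such a trajectory can then stand in for the unseen test predictions when tuning the weights of a blend, which is the sense in which a passport "mimics" the corresponding test solution.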

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikulin, V., Bakharia, A., Huang, T.H. (2013). On the Evaluation of the Homogeneous Ensembles with CV-Passports. In: Li, J., et al. (eds.) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol. 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_10

  • DOI: https://doi.org/10.1007/978-3-642-40319-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40318-7

  • Online ISBN: 978-3-642-40319-4

  • eBook Packages: Computer Science, Computer Science (R0)
