The Out-of-Source Error in Multi-Source Cross Validation-Type Procedures

New Advances in Statistics and Data Science

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

A scientific phenomenon under study is often manifested by data arising from several processes, i.e. sources, each of which may describe the phenomenon. In this multi-source setting, we define the “out-of-source” error, that is, the error committed when a new observation of unknown source origin is allocated to one of the sources using a rule trained on the known labeled data. We present an unbiased estimator of this error and discuss its variance. We derive natural and easily verifiable assumptions under which the consistency of our estimator is guaranteed for a broad class of loss functions and data distributions. Finally, we evaluate our theoretical results via a simulation study.
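The abstract does not state the form of the estimator, so the following is only a minimal sketch of one cross validation-type procedure over sources, under stated assumptions: each source is held out in turn, a rule is fitted on the remaining sources, and the loss on the held-out source is averaged. The function name `leave_one_source_out_error`, the equal weighting of sources, the mean predictor, the squared loss, and the toy data are all hypothetical choices made for illustration and should not be read as the chapter's estimator.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence


def leave_one_source_out_error(
    sources: Dict[str, List[float]],
    fit: Callable[[Sequence[float]], float],
    loss: Callable[[float, float], float],
) -> float:
    """Average held-out-source loss (hypothetical helper, not the chapter's estimator)."""
    if len(sources) < 2:
        raise ValueError("need at least two sources")
    per_source_losses = []
    for held_out, test_obs in sources.items():
        # Train on every observation that does not come from the held-out source.
        train_obs = [
            y for name, obs in sources.items() if name != held_out for y in obs
        ]
        prediction = fit(train_obs)
        # Average the loss over the observations of the held-out source.
        per_source_losses.append(fmean([loss(prediction, y) for y in test_obs]))
    # Assumption: every source receives equal weight in the final average.
    return fmean(per_source_losses)


def squared_loss(prediction: float, y: float) -> float:
    return (prediction - y) ** 2


# Toy data, invented for illustration only.
toy_sources = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0], "C": [6.0]}
estimate = leave_one_source_out_error(toy_sources, fmean, squared_loss)
print(round(estimate, 4))
```

Passing `fit` and `loss` as arguments keeps the loop independent of any particular rule or loss function, which mirrors the abstract's reference to a broad class of loss functions without committing to one.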

Acknowledgements

Dr. Markatou would like to thank the Jacobs School of Medicine and Biomedical Science for facilitating this work through institutional financial resources (to M. Markatou) that supported the work of the first author of this paper.

Corresponding author

Correspondence to Georgios Afendras.

Copyright information

© 2017 Springer International Publishing AG

Cite this chapter

Afendras, G., Markatou, M. (2017). The Out-of-Source Error in Multi-Source Cross Validation-Type Procedures. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_2
