The Out-of-Source Error in Multi-Source Cross Validation-Type Procedures

Afendras, Georgios; Markatou, Marianthi

doi:10.1007/978-3-319-69416-0_2

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

A scientific phenomenon under study may often be manifested by data arising from multiple processes, i.e., sources, that may describe this phenomenon. In this multi-source context, we define the “out-of-source” error, that is, the error committed when a new observation of unknown source is allocated to one of the sources using a rule trained on the known labeled data. We present an unbiased estimator of this error and discuss its variance. We derive natural and easily verifiable assumptions under which the consistency of our estimator is guaranteed for a broad class of loss functions and data distributions. Finally, we evaluate our theoretical results via a simulation study.
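
The abstract defines the out-of-source error as the loss incurred when a rule trained on source-labeled data allocates a new observation to one of the sources. The chapter's own estimator and its variance analysis are not reproduced on this page, so the following is only a generic cross-validation-type sketch under stated assumptions, not the authors' method: observations are assumed to be (feature vector, source label) pairs, the allocation rule is a hypothetical nearest-centroid classifier, the loss is assumed to be 0-1, and the estimate averages held-out loss over random train/test splits. All function names (`fit_nearest_centroid`, `allocate`, `zero_one_loss`, `cv_out_of_source_error`) are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Assumed data layout: each observation is a (feature vector, source label) pair.
Observation = Tuple[Tuple[float, ...], int]
Centroids = Dict[int, Tuple[float, ...]]


def fit_nearest_centroid(train: Sequence[Observation]) -> Centroids:
    """Hypothetical allocation rule: one mean vector (centroid) per source seen in training."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, source in train:
        if source not in sums:
            sums[source] = [0.0] * len(x)
            counts[source] = 0
        for j, value in enumerate(x):
            sums[source][j] += value
        counts[source] += 1
    return {s: tuple(v / counts[s] for v in sums[s]) for s in sums}


def allocate(centroids: Centroids, x: Tuple[float, ...]) -> int:
    """Allocate x to the source with the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda s: sum((a - b) ** 2 for a, b in zip(x, centroids[s])))


def zero_one_loss(predicted: int, true: int) -> float:
    """Assumed loss: 1 for a misallocation, 0 otherwise."""
    return 0.0 if predicted == true else 1.0


def cv_out_of_source_error(
    data: Sequence[Observation], n_train: int, n_splits: int = 50, seed: int = 0
) -> float:
    """Average held-out loss over random train/test splits (a generic CV-type estimate)."""
    if not 0 < n_train < len(data):
        raise ValueError("n_train must leave at least one training and one test observation")
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    rng = random.Random(seed)
    split_errors = []
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        centroids = fit_nearest_centroid(train)
        losses = [zero_one_loss(allocate(centroids, x), s) for x, s in test]
        split_errors.append(sum(losses) / len(losses))
    return sum(split_errors) / len(split_errors)


if __name__ == "__main__":
    # Simulated example: two well-separated one-dimensional Gaussian sources.
    gen = random.Random(1)
    sample = [((gen.gauss(0.0, 1.0),), 0) for _ in range(100)]
    sample += [((gen.gauss(10.0, 1.0),), 1) for _ in range(100)]
    print(cv_out_of_source_error(sample, n_train=100, n_splits=20, seed=2))
```

With well-separated sources the estimate should be close to 0, and with identically distributed sources it should be close to 0.5. Using a different loss or allocation rule only changes `zero_one_loss` or `fit_nearest_centroid`/`allocate`; the chapter's consistency results concern a broader class of loss functions than the 0-1 loss assumed here.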

Acknowledgements

Dr. Markatou would like to thank the Jacobs School of Medicine and Biomedical Sciences for facilitating this work through institutional financial resources (to M. Markatou) that supported the work of the first author of this paper.

Author information

Authors and Affiliations

Department of Biostatistics, SPHHP and Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
Georgios Afendras & Marianthi Markatou

Corresponding author

Correspondence to Georgios Afendras.

Editor information

Editors and Affiliations

University of North Carolina, Chapel Hill, North Carolina, USA
Ding-Geng Chen
Columbia University, New York, New York, USA
Zhezhen Jin
University of California, Los Angeles, California, USA
Gang Li
University of Michigan-Ann Arbor, Ann Arbor, Michigan, USA
Yi Li
National Institutes of Health, Bethesda, Maryland, USA
Aiyi Liu
Georgia State University, Atlanta, Georgia, USA
Yichuan Zhao

About this chapter

Cite this chapter

Afendras, G., Markatou, M. (2017). The Out-of-Source Error in Multi-Source Cross Validation-Type Procedures. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_2

DOI: https://doi.org/10.1007/978-3-319-69416-0_2
Published: 18 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics (R0)
