Synonyms
Rotation estimation
Definition
Cross-validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model. In typical cross-validation, the training and validation sets must cross over in successive rounds such that each data point has a chance of being validated against. The basic form of cross-validation is k-fold cross-validation. Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
In k-fold cross-validation, the data is first partitioned into k equally (or nearly equally) sized segments, or folds. Subsequently, k iterations of training and validation are performed such that, within each iteration, a different fold of the data is held out for validation while the remaining k − 1 folds are used for learning. Figure 1 demonstrates an example with k = 3. The darker section of...
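The k-fold procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the entry's own implementation: the function names `k_fold_indices` and `cross_validate`, the `seed` parameter, the shuffling step, and the `fit` and `error` callables are all hypothetical and introduced here only for the example.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for each of the k folds.

    Assumption: the data are shuffled once before partitioning; the
    fold sizes differ by at most one when k does not divide n.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n, k)  # the first `extra` folds get one more point
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


def cross_validate(
    xs: Sequence,
    ys: Sequence,
    k: int,
    fit: Callable,
    error: Callable,
) -> float:
    """Average the per-fold validation error over k rounds.

    `fit(train_xs, train_ys)` returns a model; `error(model, x, y)`
    returns the error on a single held-out point. Both are
    user-supplied placeholders.
    """
    fold_errors = []
    for training, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in training], [ys[i] for i in training])
        total = sum(error(model, xs[i], ys[i]) for i in validation)
        fold_errors.append(total / len(validation))
    return sum(fold_errors) / len(fold_errors)
```

With k = 3 and ten data points, the generator produces three rounds in which every point is held out for validation exactly once and used for training in the other two rounds. Setting k equal to the number of data points gives the leave-one-out special case.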
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Refaeilzadeh, P., Tang, L., Liu, H. (2018). Cross-Validation. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_565
DOI: https://doi.org/10.1007/978-1-4614-8265-9_565
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering