Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu


  • Payam Refaeilzadeh
  • Lei Tang
  • Huan Liu
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_565


Rotation estimation


Cross-validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model. In typical cross-validation, the training and validation sets must cross over in successive rounds so that each data point has a chance to be used for validation. The basic form of cross-validation is k-fold cross-validation. Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.

In k-fold cross-validation, the data is first partitioned into k equally (or nearly equally) sized segments, or folds. Subsequently, k iterations of training and validation are performed such that within each iteration a different fold of the data is held out for validation, while the remaining k − 1 folds are used for learning. Figure 1 demonstrates an example with k = 3. The darker section of...
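The k-fold procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method: the helper names `k_fold_indices` and `cross_validate`, the random shuffling of indices before partitioning, and the toy mean-predictor learner (`fit_mean`) and squared-error metric (`mse`) are all hypothetical choices made for the example.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split the indices 0..n-1 into k folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    indices = list(range(n))
    # Assumption: shuffle before partitioning so folds are not ordered by position.
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds absorb the remainder, one extra element each.
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(xs: Sequence, ys: Sequence, k: int,
                   fit: Callable, error: Callable, seed: int = 0) -> List[float]:
    """Run k rounds; in round i, fold i is held out and the other k - 1 folds train the model."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, held_out in enumerate(folds):
        # Training indices: every fold except the one held out in this round.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        scores.append(error(model, [xs[j] for j in held_out], [ys[j] for j in held_out]))
    return scores


# Hypothetical toy learner: predicts the mean of the training targets.
def fit_mean(xs, ys):
    return mean(ys)


# Hypothetical metric: mean squared error of the constant prediction.
def mse(model, xs, ys):
    return mean((y - model) ** 2 for y in ys)


xs = list(range(9))
ys = [2.0 * x for x in xs]
scores = cross_validate(xs, ys, k=3, fit=fit_mean, error=mse)
print(len(scores), mean(scores))  # one score per fold, then their average
```

Because 9 is divisible by 3, each fold in this usage example holds exactly three points. When n is not divisible by k, the first n mod k folds receive one extra point, which corresponds to the "nearly equally sized" folds mentioned above. Every index appears in exactly one held-out fold, so each data point is used for validation exactly once.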


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Google Inc., Los Angeles, USA
  2. Chief Data Scientist, Clari Inc., Sunnyvale, USA
  3. Data Mining and Machine Learning Lab, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, USA

Section editors and affiliations

  • Kyuseok Shim (1)
  1. School of Elec. Eng. and Computer Science, Seoul National Univ., Seoul, Republic of Korea