Overfitting and optimism in prediction models

Steyerberg, E.W.

doi:10.1007/978-0-387-77244-8_5

Chapter
First Online: 01 January 2008

Part of the book series: Statistics for Biology and Health (SBH)

Background

If we develop a statistical model with the main aim of outcome prediction, we are primarily interested in the validity of the predictions for new subjects, outside the sample under study. A key threat to validity is overfitting: the data under study are well described, but the predictions are not valid for new subjects. Overfitting causes optimism about a model's performance in new subjects. After introducing overfitting and optimism, we illustrate overfitting with a simple example that compares mortality figures by hospital. After appreciating the natural variability of outcomes within a single centre, we turn to comparisons across centres. We find that we would exaggerate any true pattern of differences between centres if we used the observed average outcome per centre as the prediction of mortality.
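The exaggeration of between-centre differences can be illustrated with a small simulation. This is only a sketch under assumed numbers, not an example taken from the chapter: the 20 centres, the 50 patients per centre, the common true mortality of 10%, and the function name `simulate_observed_rates` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_observed_rates(true_rates, n_per_centre, seed=0):
    """Draw one observed mortality rate per centre from its true rate."""
    rng = random.Random(seed)
    observed = []
    for p in true_rates:
        # Each patient dies independently with probability p (assumption).
        deaths = sum(1 for _ in range(n_per_centre) if rng.random() < p)
        observed.append(deaths / n_per_centre)
    return observed


# Hypothetical setting: 20 centres that all share the same true mortality.
true_rates = [0.10] * 20
observed_rates = simulate_observed_rates(true_rates, n_per_centre=50, seed=1)

true_sd = statistics.pstdev(true_rates)          # no true differences at all
observed_sd = statistics.pstdev(observed_rates)  # differences due to chance

print(f"SD of true rates:     {true_sd:.4f}")
print(f"SD of observed rates: {observed_sd:.4f}")
print(f"Lowest and highest observed rate: "
      f"{min(observed_rates):.2f}, {max(observed_rates):.2f}")
```

Although the centres are identical by construction, the observed rates differ. Using each centre's observed rate as its prediction would therefore report differences that are pure chance.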

We present a solution that is generally known as “shrinkage”: estimates per centre are drawn towards the overall average to improve the quality of predictions. We then turn to overfitting in regression models, and discuss the concepts of selection bias and estimation bias. Again, shrinkage is a solution, which now draws the estimated regression coefficients towards less extreme values. Bootstrap resampling is presented as a central technique to correct for overfitting and to quantify optimism in model performance.
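Drawing centre estimates towards the average can be sketched as follows. The specific weighting used here, a simple method-of-moments (empirical-Bayes style) rule, is an assumption chosen for illustration and is not necessarily the formula used in the chapter. The function name `shrink_centre_rates` and the death counts are hypothetical.

```python
import statistics


def shrink_centre_rates(deaths, patients):
    """Shrink observed centre mortality rates towards the overall rate.

    Assumed rule: weight = between-centre variance /
    (between-centre variance + sampling variance of the centre).
    """
    rates = [d / n for d, n in zip(deaths, patients)]
    overall = sum(deaths) / sum(patients)

    # Binomial sampling variance of each centre's rate, at the overall rate.
    sampling_var = [overall * (1 - overall) / n for n in patients]

    # Variance left after removing the average chance variation, floored at 0.
    between_var = max(
        0.0, statistics.pvariance(rates) - statistics.mean(sampling_var)
    )

    shrunk = []
    for rate, s2 in zip(rates, sampling_var):
        total = between_var + s2
        weight = between_var / total if total > 0 else 0.0
        shrunk.append(overall + weight * (rate - overall))
    return rates, overall, shrunk


# Hypothetical data: five centres with 50 patients each.
deaths = [2, 5, 9, 4, 12]
patients = [50, 50, 50, 50, 50]
rates, overall, shrunk = shrink_centre_rates(deaths, patients)

for rate, s in zip(rates, shrunk):
    print(f"observed {rate:.3f} -> shrunk {s:.3f} (overall {overall:.3f})")
```

Under this rule, small centres and centres with extreme observed rates are pulled more strongly towards the overall rate. If the observed spread is no larger than chance alone would produce, the weight is zero and every centre receives the overall rate.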

Notes

1. The “simple bootstrap” compares the performance of the model from the original sample in bootstrap samples. This was less efficient than the procedure described here, where models from the bootstrap samples are tested in the original sample (see Efron).
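The procedure in which models from the bootstrap samples are tested in the original sample can be sketched as below. Several parts are assumptions made for illustration only: a one-predictor least-squares model, R² as the performance measure, 200 bootstrap samples, simulated data, and the function names `fit_line`, `r_squared` and `optimism_corrected_r2`.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def r_squared(model, x, y):
    """R^2 of a fitted (intercept, slope) model evaluated on data (x, y)."""
    intercept, slope = model
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot if ss_tot > 0 else 0.0


def optimism_corrected_r2(x, y, n_boot=200, seed=0):
    """Return apparent R^2, estimated optimism, and corrected R^2."""
    rng = random.Random(seed)
    n = len(x)
    apparent = r_squared(fit_line(x, y), x, y)
    differences = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        model = fit_line(bx, by)                    # develop in bootstrap sample
        boot_perf = r_squared(model, bx, by)        # performance where developed
        test_perf = r_squared(model, x, y)          # tested in original sample
        differences.append(boot_perf - test_perf)
    optimism = sum(differences) / n_boot
    return apparent, optimism, apparent - optimism


# Hypothetical data: 30 subjects, a weak true effect plus noise.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(30)]
y = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]

apparent, optimism, corrected = optimism_corrected_r2(x, y)
print(f"apparent R2 {apparent:.3f}, optimism {optimism:.3f}, "
      f"corrected R2 {corrected:.3f}")
```

The estimated optimism is the average amount by which a model looks better in the sample it was developed on than in the original sample. Subtracting it from the apparent performance gives the optimism-corrected estimate.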

Author information

Authors and Affiliations

Department of Public Health, Erasmus MC, 3000 CA Rotterdam, The Netherlands
E.W. Steyerberg

About this chapter

Cite this chapter

Steyerberg, E. (2009). Overfitting and optimism in prediction models. In: Clinical Prediction Models. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/978-0-387-77244-8_5

DOI: https://doi.org/10.1007/978-0-387-77244-8_5
Published: 17 September 2008
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-77243-1
Online ISBN: 978-0-387-77244-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
