Restrictions on candidate predictors
A major problem in predictive modelling is that many candidate predictors are often available while the data set is relatively small. As discussed in Chap. 5, a small sample size leads to problems such as limited power to test the main effects of potential predictors, and predictions that are too extreme when they are based on standard regression coefficients (overfitting). We discuss some procedures to increase the robustness and validity of a predictive model, including restricting the number of candidate predictors, considering the distributions of predictors, combining similar variables, and averaging the effects of similar variables. We describe in detail a case study in which similar effects were modelled for aspects of family history to obtain robust predictions of the presence of a genetic mutation.