Model-Selection Uncertainty with Examples
Understanding model-selection uncertainty requires considering the process that generates the sample data we observe. In a field, laboratory, or computer-simulation study, data are observed on some process or system. Suppose a second, independent data set could be observed on the same process or system under nearly identical conditions. That new data set would differ somewhat from the first. Both data sets would contain information about the process, but, by chance, that information would likely be slightly different. An obvious goal of data analysis is to make an inference about the process from the observed data. A less obvious goal is to make inferences about the process that are not overly specific to the single data set observed. That is, we would like our inferences to be robust to the particular data set observed, so that we tend to avoid the problems of over-fitting (overinterpreting) the limited data we have. Ideally, we would make inferences about the process as if a large number of other data sets were also available. The interpretation of a confidence interval rests on the same idea: over repeated samples from the process, about 95% of the data sets would generate a 95% confidence interval that includes the true parameter value. This idea extends to constructing a confidence subset of the models considered. Such a subset should, with high relative frequency over samples, contain the actual Kullback-Leibler (K-L) best model among those considered. It should also be as small as possible, analogous to a short confidence interval.
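The repeated-sampling idea above can be made concrete with a small simulation. The following is a minimal sketch only, not the chapter's own procedure, and it rests on two illustrative assumptions. First, the data are normal with a known standard deviation, so the 95% interval for the mean is x̄ ± 1.96σ/√n. Second, the model-selection part uses two hypothetical candidate regression models (intercept-only and linear), compared with the least-squares form of AIC. All names and parameter values are made up for illustration. The first function estimates how often replicate data sets yield an interval covering the true mean. The second shows that the model selected by AIC changes from one replicate data set to the next, which is the essence of model-selection uncertainty.

```python
import math
import random
import statistics

# Hypothetical simulation settings, chosen only for illustration.
TRUE_MEAN = 10.0
SIGMA = 2.0
Z_975 = 1.959964  # standard normal 97.5% quantile


def ci_coverage(n=25, reps=10_000, seed=1):
    """Fraction of replicate data sets whose 95% CI for the mean covers TRUE_MEAN.

    Assumes normal data with known SIGMA, so the CI is xbar +/- z * SIGMA / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = Z_975 * SIGMA / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n)])
        if abs(xbar - TRUE_MEAN) <= half_width:
            hits += 1
    return hits / reps


def rss_intercept_only(y):
    """Residual sum of squares for the hypothetical model y = b0 + error."""
    m = statistics.fmean(y)
    return sum((v - m) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares for the hypothetical model y = b0 + b1*x + error."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


def aic(rss, n, k):
    """Least-squares AIC: n*log(RSS/n) + 2k, with k counting sigma^2 as a parameter."""
    return n * math.log(rss / n) + 2 * k


def selection_frequencies(n=30, slope=1.0, reps=1_000, seed=2):
    """Relative frequency with which AIC picks each candidate model over replicate data sets."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # fixed design on [0, 1]
    counts = {"intercept_only": 0, "linear": 0}
    for _ in range(reps):
        # Each replicate is a new data set from the same (assumed) process.
        y = [1.0 + slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        aic_1 = aic(rss_intercept_only(y), n, k=2)  # b0, sigma^2
        aic_2 = aic(rss_linear(x, y), n, k=3)       # b0, b1, sigma^2
        counts["linear" if aic_2 < aic_1 else "intercept_only"] += 1
    return {name: c / reps for name, c in counts.items()}


if __name__ == "__main__":
    print(f"Estimated 95% CI coverage: {ci_coverage():.3f}")
    print("AIC selection frequencies:", selection_frequencies())
```

With a modest true slope, neither model is selected in every replicate. The single data set actually in hand therefore gives an incomplete picture of which model is best, and this is what motivates confidence sets of models rather than a single selected model.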
Keywords: Sampling Variance; Bootstrap Sample; Capture Probability; Akaike Weight; Relative Likelihood