Minimum Description Length Model Selection in Gaussian Regression under Data Constraints
The normalized maximum likelihood (NML) formulation of the stochastic complexity (Rissanen) contains two components: the maximized log-likelihood and a component that may be interpreted as the parametric complexity of the model. The stochastic complexity of the data, relative to a suggested model, serves as a criterion for model selection, and computing it can be viewed as an implementation of the minimum description length (MDL) principle (cf. Rissanen). To obtain an NML-based model selection criterion for Gaussian linear regression, Rissanen constrains the data space appropriately. In this paper we demonstrate the effect of the data constraints on the selection criterion: by reformulating the shape of the constraints, we obtain various forms of the criterion. Special emphasis is placed on the performance of the criterion when collinearity is present in the data.
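A minimal Python sketch of how such a criterion can be computed and used for subset selection. It assumes an ellipsoidal data constraint of the kind Rissanen uses, τ̂ ≥ τ0 and β̂ᵀXᵀXβ̂/n ≤ R0, with τ̂ = RSS/n the ML noise variance. Integrating the maximized likelihood over that region and then plugging in τ0 = τ̂ and R0 = R̂ = β̂ᵀXᵀXβ̂/n gives the code length (in nats)

L(y; k) = ((n − k)/2) ln τ̂ + (k/2) ln R̂ − ln Γ((n − k)/2) − ln Γ(k/2) + ln(4/k²) + (n/2) ln(nπ).

The sketch omits the cost of encoding the hyperparameters τ0 and R0, and the function names (nml_code_length, select_subset) and the toy data are hypothetical. A differently shaped constraint, which is what this paper varies, would change the complexity terms.

```python
import itertools
import math


def _solve(a, b, tol=1e-12):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Returns None when `a` is numerically singular (e.g. exactly collinear columns).
    """
    m = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    scale = max(abs(aug[i][i]) for i in range(m)) or 1.0
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) <= tol * scale:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def nml_code_length(columns, y):
    """Code length (nats) of y for a regression on the k given columns.

    Assumes the constraint tau_hat >= tau0, beta' X'X beta / n <= R0 with
    tau0, R0 replaced by their estimates; hyperparameter cost is omitted.
    """
    n, k = len(y), len(columns)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    # Normal equations X'X b = X'y.
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    beta = _solve(xtx, xty)
    if beta is None:
        return math.inf  # singular X'X: exclude this subset
    fitted = [sum(b * col[i] for b, col in zip(beta, columns)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tau_hat = rss / n                        # ML estimate of the noise variance
    r_hat = sum(f * f for f in fitted) / n   # equals beta' X'X beta / n
    if tau_hat <= 0 or r_hat <= 0:
        return math.inf  # degenerate fit: the logarithms are undefined
    return ((n - k) / 2 * math.log(tau_hat) + k / 2 * math.log(r_hat)
            - math.lgamma((n - k) / 2) - math.lgamma(k / 2)
            + math.log(4 / k**2) + n / 2 * math.log(n * math.pi))


def select_subset(candidates, y):
    """Exhaustively return the column-index tuple with the shortest code length."""
    best, best_len = None, math.inf
    for k in range(1, min(len(candidates), len(y) - 1) + 1):
        for subset in itertools.combinations(range(len(candidates)), k):
            length = nml_code_length([candidates[j] for j in subset], y)
            if length < best_len:
                best, best_len = subset, length
    return best


# Hypothetical toy data: y depends on x1 only; x2 is an irrelevant regressor.
x1 = [float(i + 1) for i in range(20)]
x2 = [(-1.0) ** i for i in range(20)]
noise = [0.1 * ((7 * i) % 5 - 2) for i in range(20)]
y = [3 * a + e for a, e in zip(x1, noise)]
print(select_subset([x1, x2], y))  # prints (0,)
```

Exhaustive enumeration needs 2^p − 1 criterion evaluations for p candidate columns, so it only suits small p. Returning math.inf for a singular XᵀX merely excludes exactly collinear subsets; it says nothing about how the criterion behaves under near-collinearity.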
Keywords: Minimum Description Length; Data Constraint; Potential Explanatory Variable; Minimum Description Length Principle; Stochastic Complexity
- 1. Akaike, H.: Information theory as an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest (1973)
- 3. Burnham, K.P., Anderson, D.R.: Model Selection and Multi-model Inference. Springer, New York (2002)
- 6. Grünwald, P.D.: The Minimum Description Length Principle. MIT, London (2007)
- 8. Liski, E.P.: Normalized ML and the MDL Principle for Variable Selection in Linear Regression. In: Festschrift for Tarmo Pukkila on his 60th Birthday, pp. 159–172. Tampere, Finland (2006)