Minimum Description Length Model Selection in Gaussian Regression under Data Constraints

  • Erkki P. Liski
  • Antti Liski


The normalized maximum likelihood (NML) formulation of the stochastic complexity (Rissanen [10]) contains two components: the maximized log likelihood and a component that may be interpreted as the parametric complexity of the model. The stochastic complexity of the data, relative to a suggested model, serves as a criterion for model selection. The calculation of the stochastic complexity can be considered an implementation of the minimum description length (MDL) principle (cf. Rissanen [12]). To obtain an NML-based model selection criterion for Gaussian linear regression, Rissanen [11] constrains the data space appropriately. In this paper we demonstrate the effect of the data constraints on the selection criterion. In fact, we obtain various forms of the criterion by reformulating the shape of the data constraints. Special emphasis is placed on the performance of the criterion when collinearity is present in the data.
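The two-part structure described in the abstract — a maximized log likelihood term plus a parametric complexity term — can be sketched in code. The criterion below follows the general form of NML-type criteria for Gaussian regression discussed by Rissanen [11] and Hansen and Yu [7], with additive constants omitted; the exact complexity term depends on how the data space is constrained, so this is an illustrative sketch under those assumptions, not the paper's specific criterion.

```python
import numpy as np

def nml_codelength(y, X):
    """Two-part NML-type codelength for the subset model y ~ X b + noise.

    First term: the maximized-likelihood part of the codelength.
    Second term: a parametric-complexity penalty of the general form that
    arises from Rissanen-style data constraints (constants omitted).
    Illustrative sketch only; the paper derives variants of this criterion.
    """
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    fitted = X @ b
    rss = np.sum((y - fitted) ** 2)  # residual sum of squares
    fss = np.sum(fitted ** 2)        # fitted sum of squares
    return 0.5 * (n - k) * np.log(rss / (n - k)) + 0.5 * k * np.log(fss / k)

# Choose the subset of explanatory variables with the shortest codelength.
rng = np.random.default_rng(0)
n = 100
X_full = rng.standard_normal((n, 3))
y = 2.0 * X_full[:, 0] + 0.5 * rng.standard_normal(n)  # only column 0 matters

subsets = [(0,), (1,), (0, 1), (0, 1, 2)]
scores = {s: nml_codelength(y, X_full[:, s]) for s in subsets}
best = min(scores, key=scores.get)
print(best)
```

The penalty term grows with the number of regressors, so the criterion trades fit against complexity; with the strong signal simulated above, the one-variable model minimizes the codelength.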


Keywords: Minimum Description Length, Data Constraint, Potential Explanatory Variable, Minimum Description Length Principle, Stochastic Complexity




  1. Akaike, H.: Information theory as an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Second International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest (1973)
  2. Belsley, D.A.: Conditioning Diagnostics. Wiley, New York (1991)
  3. Burnham, K.P., Anderson, D.R.: Model Selection and Multimodel Inference. Springer, New York (2002)
  4. Cramér, H.: Mathematical Methods of Statistics. Princeton University Press, Princeton (1946)
  5. Draper, N.R., Smith, H.: Applied Regression Analysis, 2nd edn. Wiley, New York (1981)
  6. Grünwald, P.D.: The Minimum Description Length Principle. MIT Press, Cambridge, MA (2007)
  7. Hansen, M.H., Yu, B.: Model Selection and the Principle of Minimum Description Length. J. Am. Stat. Assoc. 96, 746–774 (2001)
  8. Liski, E.P.: Normalized ML and the MDL Principle for Variable Selection in Linear Regression. In: Festschrift for Tarmo Pukkila on his 60th Birthday, pp. 159–172. Tampere, Finland (2006)
  9. Miller, A.: Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, New York (2002)
  10. Rissanen, J.: Fisher Information and Stochastic Complexity. IEEE Trans. Inf. Theory 42, 40–47 (1996)
  11. Rissanen, J.: MDL Denoising. IEEE Trans. Inf. Theory 46, 2537–2543 (2000)
  12. Rissanen, J.: Information and Complexity in Statistical Modeling. Springer, New York (2007)
  13. Schwarz, G.: Estimating the Dimension of a Model. Ann. Stat. 6, 461–464 (1978)

Copyright information

© Physica-Verlag Heidelberg 2009

Authors and Affiliations

  1. University of Tampere, Tampere, Finland
  2. Tampere University of Technology, Tampere, Finland
