Abstract
As models grow in size and complexity, choosing the best model becomes a challenging task. Once a model has been selected, its goodness of fit needs to be examined first. A highly complex model may provide a good fit, but ignoring model complexity can lead to incorrect parameter estimates and predictions. To improve the model selection process, model complexity needs to be defined clearly. This article studies different aspects of model complexity and discusses the extent to which they can be measured. The attribute most commonly ignored by complexity measures is the parameter prior, which is an inherent part of the model and can affect its complexity significantly. The concept of the parameter prior and its connection to model complexity are therefore discussed here, and some relationships to the elements of the entropy measure are also addressed.
Cite this article
Balakrishnan, N., Koukouvinos, C. & Parpoula, C. On the Computation of Entropy Prior Complexity and Marginal Prior Distribution for the Bernoulli Model. J Stat Theory Pract 9, 59–72 (2015). https://doi.org/10.1080/15598608.2014.897139
DOI: https://doi.org/10.1080/15598608.2014.897139