Misspecification Resistant Model Selection Using Information Complexity with Applications

  • Conference paper

Abstract

In this paper, we address two issues that have long plagued researchers in statistical modeling and data mining. The first is well known as the “curse of dimensionality”. Very large datasets are increasingly common, as nearly everything that can be measured is now measured as frequently as possible. Statistical analysis techniques developed as recently as 50 years ago can founder in all this data. The second issue is model misspecification, specifically an incorrectly assumed functional form. We address both issues in the context of multivariate regression modeling. To drive dimension reduction and model selection, we use the newly developed form of Bozdogan’s ICOMP, introduced in Bozdogan and Howe (Misspecification resistant multivariate regression models using the genetic algorithm and information complexity as the fitness function, Technical report 1, (2012)), that penalizes models with a complexity measure of the “sandwich” model covariance matrix. The genetic algorithm uses this information criterion as its objective function in a two-step hybrid dimension reduction process. First, we use probabilistic principal component analysis to reduce the number of response variables and the number of predictor variables independently. Then, we use the genetic algorithm with the multivariate Gaussian regression model to identify the best subset regression model. We apply these methods to identify a substantially reduced multivariate regression relationship for a dataset regarding Italian high school students: the 29 response variables are reduced to 4, and the 46 regressors to 1.
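
As a rough illustration of the penalty described in the abstract, the sketch below computes a complexity measure of a "sandwich" covariance matrix and combines it with a lack-of-fit term. It assumes the commonly stated form of Bozdogan's first-order maximal entropic complexity, C1(Σ) = (s/2) log(tr(Σ)/s) − (1/2) log|Σ| with s = rank(Σ), and a sandwich covariance of the form F⁻¹ R F⁻¹. The function names and the exact scoring formula are hypothetical choices made for this example; the precise criterion used by the authors is defined in the cited Bozdogan and Howe reference, not on this page.

```python
import numpy as np


def c1_complexity(cov):
    """C1 complexity of a positive definite covariance matrix (assumed form)."""
    cov = np.asarray(cov, dtype=float)
    s = np.linalg.matrix_rank(cov)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # (s/2) * log(average eigenvalue) minus (1/2) * log(product of eigenvalues)
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet


def sandwich_covariance(inner, outer):
    """Sandwich covariance F^{-1} R F^{-1}.

    inner: Fisher information in inner-product (Hessian) form, F.
    outer: Fisher information in outer-product (score) form, R.
    """
    f_inv = np.linalg.inv(np.asarray(inner, dtype=float))
    return f_inv @ np.asarray(outer, dtype=float) @ f_inv


def icomp_score(log_likelihood, cov):
    """Hypothetical ICOMP-style score: lack of fit plus twice the complexity."""
    return -2.0 * log_likelihood + 2.0 * c1_complexity(cov)


if __name__ == "__main__":
    F = np.array([[2.0, 0.3], [0.3, 4.0]])
    R = np.array([[2.5, 0.1], [0.1, 3.0]])
    cov = sandwich_covariance(F, R)
    print(icomp_score(-10.0, cov))
```

Under this form, the complexity is zero when all eigenvalues of the covariance matrix are equal and grows as they become more unequal. When the model is correctly specified and R equals F, the sandwich covariance reduces to F⁻¹. A genetic algorithm would then minimize such a score over candidate predictor subsets.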

References

  • Box, G., & Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26, 211–246.

  • Bozdogan, H. (1988). ICOMP: A new model-selection criterion. In H. Bock (Ed.), Classification and related methods of data analysis (pp. 599–608). Amsterdam: Elsevier.

  • Bozdogan, H. (2004). Intelligent statistical data mining with information complexity and genetic algorithms. In H. Bozdogan (Ed.), Statistical data mining and knowledge discovery (pp. 15–56). Boca Raton: Chapman and Hall/CRC.

  • Bozdogan, H., & Howe, J. (2009). The curse of dimensionality in large-scale experiments using a novel hybridized dimension reduction approach. The University of Tennessee. (Tech. Rep. 1).

  • Bozdogan, H., & Howe, J. (2012). Misspecification resistant multivariate regression models using the genetic algorithm and information complexity as the fitness function. European Journal of Pure and Applied Mathematics, 5(2), 211–249.

  • Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Boston: Addison-Wesley.

  • Haupt, R., & Haupt, S. (2004). Practical genetic algorithms. Hoboken: Wiley.

  • Holland, J. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. Ann Arbor: The University of Michigan Press.

  • Kullback, S., & Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.

  • Magnus, J. (2007). The asymptotic variance of the pseudo maximum likelihood estimator. Econometric Theory, 23, 1022–1032.

  • Magnus, J., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.

  • Mardia, K. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, B36, 115–128.

  • Tipping, M., & Bishop, C. (1997). Probabilistic principal component analysis (Tech. Rep. NCRG/97/010). Neural Computing Research Group, Aston University.

  • Van Emden, M. (1971). An analysis of complexity. In Mathematical centre tracts (Vol. 35). Amsterdam: Mathematisch Centrum.

  • Vose, M. (1999). The simple genetic algorithm: Foundations and theory. Cambridge: MIT.

Author information

Corresponding author

Correspondence to Hamparsum Bozdogan.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bozdogan, H., Howe, J.A., Katragadda, S., Liberati, C. (2013). Misspecification Resistant Model Selection Using Information Complexity with Applications. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_20