Abstract
In this paper, we address two issues that have long plagued researchers in statistical modeling and data mining. The first is the well-known “curse of dimensionality”. Very large datasets are becoming more and more common, as ever more quantities are measured ever more frequently. Statistical analysis techniques developed even 50 years ago can founder in all this data. The second issue is model misspecification, specifically an incorrectly assumed functional form. We address both issues in the context of multivariate regression modeling. To drive dimension reduction and model selection, we use the newly developed form of Bozdogan’s ICOMP, introduced in Bozdogan and Howe (2012), which penalizes models with a complexity measure of the “sandwich” model covariance matrix. The genetic algorithm uses this information criterion as its objective function in a two-step hybrid dimension reduction process. First, we use probabilistic principal component analysis to reduce the number of response variables and the number of predictor variables independently. Then, we use the genetic algorithm with the multivariate Gaussian regression model to identify the best subset regression model. We apply these methods to identify a substantially reduced multivariate regression relationship for a dataset on Italian high school students. The 29 response variables are reduced to 4, and the 46 regressors to 1.
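As a rough illustration of the second step described above, the following Python sketch runs a small genetic algorithm over predictor subsets of a multivariate Gaussian regression. It is not the authors' implementation. The fitness is a simplified ICOMP-style score, assumed here to be minus twice the maximized log-likelihood plus twice the C1 complexity of the model-based coefficient covariance; the misspecification-resistant criterion in the abstract instead penalizes the “sandwich” covariance, which is not reproduced. The probabilistic PCA step is omitted, and all function names, parameters, and GA settings are hypothetical.

```python
import numpy as np

def c1_complexity(cov):
    """C1 complexity of a covariance matrix: (s/2) log(tr/s) - (1/2) log det."""
    s = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet

def fitness(X, Y, mask):
    """Simplified ICOMP-style score (lower is better) for the subset `mask`.

    Assumption: the penalty uses the model-based coefficient covariance,
    a stand-in for the sandwich covariance named in the abstract.
    """
    n, p = Y.shape
    Xs = np.column_stack([np.ones(n), X[:, mask]])   # intercept + chosen columns
    B, *_ = np.linalg.lstsq(Xs, Y, rcond=None)       # least-squares coefficients
    resid = Y - Xs @ B
    sigma = resid.T @ resid / n                      # ML error covariance
    _, logdet = np.linalg.slogdet(sigma)
    neg2loglik = n * p * np.log(2 * np.pi) + n * logdet + n * p
    coef_cov = np.kron(sigma, np.linalg.inv(Xs.T @ Xs))
    return neg2loglik + 2.0 * c1_complexity(coef_cov)

def ga_subset(X, Y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search binary inclusion masks with a basic elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    q = X.shape[1]
    pop = rng.random((pop_size, q)) < 0.5            # random initial masks
    best_mask, best_score = None, np.inf
    for _ in range(generations):
        scores = np.array([fitness(X, Y, m) for m in pop])
        order = np.argsort(scores)
        if scores[order[0]] < best_score:
            best_score = scores[order[0]]
            best_mask = pop[order[0]].copy()
        parents = pop[order[: pop_size // 2]]        # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(q) < 0.5, a, b)   # uniform crossover
            child ^= rng.random(q) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.vstack([best_mask] + children)      # elitism: keep best so far
    return best_mask, best_score

# Synthetic demo: only predictors 0 and 3 drive the two responses.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
Y_demo = 2.0 * X_demo[:, [0, 3]] + 0.5 * rng.normal(size=(200, 2))
mask_demo, score_demo = ga_subset(X_demo, Y_demo)
print("selected predictors:", np.flatnonzero(mask_demo))
```

Truncation selection, uniform crossover, and elitism were chosen only to keep the sketch short; the abstract does not specify the genetic operators used.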
References
Box, G., & Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological), 26, 211–246.
Bozdogan, H. (1988). ICOMP: A new model-selection criterion. In H. Bock (Ed.), Classification and related methods of data analysis (pp. 599–608). Amsterdam: Elsevier.
Bozdogan, H. (2004). Intelligent statistical data mining with information complexity and genetic algorithms. In H. Bozdogan (Ed.), Statistical data mining and knowledge discovery (pp. 15–56). Boca Raton: Chapman and Hall/CRC.
Bozdogan, H., & Howe, J. (2009). The curse of dimensionality in large-scale experiments using a novel hybridized dimension reduction approach. The University of Tennessee. (Tech. Rep. 1).
Bozdogan, H., & Howe, J. (2012). Misspecification resistant multivariate regression models using the genetic algorithm and information complexity as the fitness function. European Journal of Pure and Applied Mathematics, 5(2), 211–249.
Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Boston: Addison-Wesley.
Haupt, R., & Haupt, S. (2004). Practical genetic algorithms. Hoboken: Wiley.
Holland, J. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. Ann Arbor: The University of Michigan Press.
Kullback, S., & Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.
Magnus, J. (2007). The asymptotic variance of the pseudo maximum likelihood estimator. Econometric Theory, 23, 1022–1032.
Magnus, J., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
Mardia, K. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya, B36, 115–128.
Tipping, M., & Bishop, C. (1997). Probabilistic principal component analysis (Tech. Rep. NCRG/97/010). Neural Computing Research Group, Aston University.
Van Emden, M. (1971). An analysis of complexity. In Mathematical centre tracts (Vol. 35). Amsterdam: Mathematisch Centrum.
Vose, M. (1999). The simple genetic algorithm: Foundations and theory. Cambridge: MIT.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bozdogan, H., Howe, J.A., Katragadda, S., Liberati, C. (2013). Misspecification Resistant Model Selection Using Information Complexity with Applications. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_20
DOI: https://doi.org/10.1007/978-3-642-28894-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28893-7
Online ISBN: 978-3-642-28894-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)