Multivariate Regressions, Genetic Algorithms, and Information Complexity: A Three Way Hybrid

  • Peter Bearse
  • Hamparsum Bozdogan
Conference paper


We develop a computationally feasible, intelligent data mining and knowledge discovery technique for selecting the best subset of predictors in multivariate regression (MR) models. Our approach integrates novel statistical modeling procedures based on the information-theoretic measure of complexity (ICOMP) criterion with the genetic algorithm (GA). With ICOMP as the fitness function, the GA, itself a powerful non-local optimization algorithm, becomes an intelligent statistical model selection device capable of pruning a combinatorially large space of sub-models to obtain an optimal or near-optimal subset MR model of multivariate data. We demonstrate the approach by determining the best predictors of taste and odor in a Japanese rice wine (sake) data set.
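The paper itself contains no code, but the procedure described above is concrete enough to sketch. Below is a minimal, hypothetical Python implementation, assuming a simplified ICOMP of the form -2 log L + 2 C1(cov(vec(B))), where C1(S) = (s/2) log(tr(S)/s) - (1/2) log|S| is Bozdogan's maximal information complexity and cov(vec(B)) = Sigma ⊗ (X'X)^(-1) for the fitted MR subset model; the GA settings (population size, binary tournament selection, single-point crossover, bit-flip mutation, two-member elitism) are generic defaults, not values taken from the paper.

```python
# Hypothetical sketch of GA + ICOMP subset selection for multivariate
# regression (MR). All names and tuning values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def c1_complexity(cov):
    """Bozdogan's C1 maximal information complexity of a covariance matrix."""
    s = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * s * np.log(np.trace(cov) / s) - 0.5 * logdet

def icomp(X, Y, genes):
    """Simplified ICOMP for an MR subset model: -2 log-likelihood plus
    2 * C1 of cov(vec(B_hat)) = Sigma_hat kron (X'X)^{-1}  (an assumption;
    the full ICOMP(IFIM) also penalizes the covariance parameters)."""
    idx = np.flatnonzero(genes)
    if idx.size == 0:
        return np.inf                                    # empty model: unusable
    Xs = np.column_stack([np.ones(len(Y)), X[:, idx]])   # intercept + subset
    n, p = Y.shape
    B, *_ = np.linalg.lstsq(Xs, Y, rcond=None)           # multivariate OLS
    E = Y - Xs @ B                                       # residual matrix
    Sigma = (E.T @ E) / n                                # MLE of error covariance
    _, logdet = np.linalg.slogdet(Sigma)
    neg2loglik = n * p * np.log(2 * np.pi) + n * logdet + n * p
    cov_B = np.kron(Sigma, np.linalg.inv(Xs.T @ Xs))     # cov of vec(B_hat)
    return neg2loglik + 2.0 * c1_complexity(cov_B)

def ga_subset(X, Y, pop=40, gens=60, pc=0.8, pm=0.02):
    """Evolve 0/1 inclusion masks over the columns of X; lower ICOMP = fitter."""
    q = X.shape[1]
    popn = rng.integers(0, 2, size=(pop, q))
    for _ in range(gens):
        fit = np.array([icomp(X, Y, g) for g in popn])
        elite = popn[np.argsort(fit)[:2]].copy()         # keep the two best
        # binary tournament selection fills the mating pool
        a, b = rng.integers(0, pop, (2, pop - 2))
        parents = np.where((fit[a] < fit[b])[:, None], popn[a], popn[b])
        # single-point crossover on consecutive pairs
        children = parents.copy()
        for i in range(0, pop - 3, 2):
            if rng.random() < pc:
                cut = rng.integers(1, q)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # bit-flip mutation
        flip = rng.random(children.shape) < pm
        popn = np.vstack([elite, np.where(flip, 1 - children, children)])
    fit = np.array([icomp(X, Y, g) for g in popn])
    return popn[np.argmin(fit)], fit.min()
```

On data like the sake example, X would hold the candidate predictors and Y the taste and odor responses; `ga_subset(X, Y)` then returns the best inclusion mask found together with its ICOMP value, lower being better.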


Keywords: Genetic algorithm · Fitness function · Mating pool · Subset model · Feasible generalized least squares




Copyright information

© Springer Japan 2002

Authors and Affiliations

  • Peter Bearse (1)
  • Hamparsum Bozdogan (2)

  1. Department of Economics, University of North Carolina at Greensboro, Greensboro, USA
  2. Department of Statistics, 336 SMC, The University of Tennessee, Knoxville, USA
