Multivariate Regressions, Genetic Algorithms, and Information Complexity: A Three-Way Hybrid
We develop a computationally feasible, intelligent data mining and knowledge discovery technique for selecting the best subset of predictors in multivariate regression (MR) models. Our approach integrates novel statistical modeling procedures based on the information-theoretic measure of complexity (ICOMP) criterion with the genetic algorithm (GA). With ICOMP as its fitness function, the GA, a powerful non-local optimization algorithm in its own right, becomes an intelligent statistical model selection device capable of pruning a combinatorially large space of sub-models to obtain an optimal or near-optimal subset MR model of multivariate data. We demonstrate the approach by determining the best predictors of taste and odor in a Japanese rice wine (sake) data set.
Keywords: Genetic Algorithm, Fitness Function, Mating Pool, Subset Model, Feasible Generalized Least Squares
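To make the hybrid concrete, the following is a minimal sketch of GA-driven subset selection for a regression model. It is illustrative only: AIC stands in for the paper's ICOMP fitness (ICOMP additionally penalizes the complexity of the inverse Fisher information matrix), a single response is used rather than the full multivariate case, and all function names and GA parameter settings are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def aic(X, y, mask):
    """AIC of an OLS fit on the predictor subset given by the boolean mask.
    Stand-in fitness; the paper uses the ICOMP criterion instead."""
    if mask.sum() == 0:
        return np.inf                      # empty model: worst possible fitness
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    n = len(y)
    sigma2 = resid @ resid / n
    return n * np.log(sigma2) + 2 * (mask.sum() + 1)

def ga_subset(X, y, pop=30, gens=40, pmut=0.1):
    """Genetic-algorithm subset selection: each chromosome is a boolean
    mask over the candidate predictors; lower fitness is better."""
    n_pred = X.shape[1]
    population = rng.integers(0, 2, (pop, n_pred)).astype(bool)
    best_mask, best_fit = np.zeros(n_pred, dtype=bool), np.inf
    for _ in range(gens):
        fit = np.array([aic(X, y, c) for c in population])
        if fit.min() < best_fit:           # elitist bookkeeping of best-so-far
            best_fit = fit.min()
            best_mask = population[fit.argmin()].copy()
        # binary tournament selection builds the mating pool
        a, b = rng.integers(0, pop, (2, pop))
        pool = population[np.where(fit[a] < fit[b], a, b)]
        # single-point crossover on consecutive pairs in the pool
        children = pool.copy()
        for i in range(0, pop - 1, 2):
            cut = rng.integers(1, n_pred)
            children[i, cut:] = pool[i + 1, cut:]
            children[i + 1, cut:] = pool[i, cut:]
        # bit-flip mutation
        children ^= rng.random(children.shape) < pmut
        population = children
    return best_mask, best_fit
```

Because each chromosome is just a bit mask, the GA searches the 2^p subset space without enumerating it, which is what makes the approach feasible when exhaustive all-subsets search is not.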