In Place of Regression
Assuming an adaptation of Suppes’s analysis of causality, we show that multiple regression methods are fundamentally incorrect procedures for identifying causes. The reason is that, when regressors are correlated, the existence of an unmeasured common cause of a regressor X_i and the outcome variable Y may bias estimates of the influence of other regressors X_k; variables having no influence on Y whatsoever may thereby be given significant regression coefficients. The bias may be quite large. Simulation studies show that standard regression model specification procedures make the same error. The strategy of regressing on a larger set of variables and checking stability may compound rather than remedy the problem. A similar difficulty in estimating the influence of other regressors arises if some X_i is an effect rather than a cause of Y. The problem appears endemic in uses of multiple regression on uncontrolled variables and, unless somehow corrected, appears to invalidate many scientific uses of regression methods. We describe an implementation in the TETRAD II program of a model specification algorithm that avoids these and certain other errors in large samples. We illustrate the TETRAD II algorithm by applying it to a number of real and simulated data sets.
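The bias described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper and does not use TETRAD II; it assumes a hypothetical linear Gaussian structure X2 -> X1 <- U -> Y, in which U is unmeasured and neither X1 nor X2 has any influence on Y. The function names, coefficients, sample size, and seed are all illustrative assumptions. Under these assumptions the population regression coefficients of Y on (X1, X2) are 0.5 and -0.5, so in large samples both regressors receive clearly nonzero coefficients despite having no effect on Y.

```python
import random


def simulate(n=20000, seed=0):
    """Sample from an assumed (hypothetical) linear model with a latent cause.

    Structure: X2 -> X1 <- U -> Y. U is unmeasured and is a common cause
    of X1 and Y. Neither X1 nor X2 has any influence on Y.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)            # unmeasured common cause
        x2 = rng.gauss(0, 1)           # regressor correlated with X1
        x1 = x2 + u + rng.gauss(0, 1)  # X1 depends on X2 and on U
        y = u + rng.gauss(0, 1)        # Y depends on U only
        rows.append((x1, x2, y))
    return rows


def ols_two_regressors(rows):
    """Least-squares coefficients of Y on (X1, X2), with an intercept.

    Solves the 2x2 normal equations on centered data.
    """
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


b1, b2 = ols_two_regressors(simulate())
print(f"coefficient on X1: {b1:.3f} (true direct influence on Y: 0)")
print(f"coefficient on X2: {b2:.3f} (true direct influence on Y: 0)")
```

In this sketch, conditioning on X1 induces a dependence between X2 and the latent U, which is why X2 acquires a nonzero coefficient even though it is independent of Y marginally.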
Keywords: Causal Structure; Conditional Independence; Markov Condition; Causal Graph; Multiple Regression Method
- Cooper, G. and Herskovits, E.: 1992, ‘A Bayesian Method for the Induction of Probabilistic Networks from Data’, Machine Learning (to appear).
- Fox, J.: 1984, Linear Statistical Models and Related Methods, Wiley, New York.
- Linthurst, R. A.: 1979, ‘Aeration, Nitrogen, pH and Salinity as Factors Affecting Spartina Alterniflora Growth and Dieback’, Ph.D. thesis, North Carolina State University.
- Mosteller, F. and Tukey, J.: 1977, Data Analysis and Regression, A Second Course in Regression, Addison-Wesley, Massachusetts.
- Rawlings, J.: 1988, Applied Regression Analysis, Wadsworth, Belmont, CA.
- Spirtes, P.: 1992, ‘Building Causal Graphs from Statistical Data in the Presence of Latent Variables’, forthcoming in: B. Skyrms (Ed.), Proceedings of the IX International Congress on Logic, Methodology, and the Philosophy of Science, Uppsala, Sweden, 1991.
- Spirtes, P., Glymour, C., Scheines, R., and Sorensen, S.: 1990, ‘TETRAD Studies of Data for Naval Air Traffic Controller Trainees’, Report to the Navy Personnel Research Development Center, San Diego, CA.
- Spirtes, P., Glymour, C., and Scheines, R.: 1990, ‘Causality from Probability’, in: J. Tiles et al. (Eds.), Evolving Knowledge in Natural Science and Artificial Intelligence, Pitman, London, pp. 181–199.
- Suppes, P.: 1970, A Probabilistic Theory of Causality, North-Holland, Amsterdam.
- Verma, T. and Pearl, J.: 1990b, ‘Equivalence and Synthesis of Causal Models’, Proc. Sixth Conference on Uncertainty in AI, Association for Uncertainty in AI, Inc., Mountain View, CA, pp. 220–227.