A Bayesian Approach to Multicollinearity and the Simultaneous Selection and Clustering of Predictors in Linear Regression
High correlation among predictors has long been an annoyance in regression analysis. The crux of the problem is that the linear regression model assumes each predictor has an independent effect on the response that can be encapsulated in the predictor’s regression coefficient. When predictors are highly correlated, the data contain little information on the independent effect of each predictor. The high correlation can result in large standard errors for the regression coefficients and in coefficients whose signs are opposite to what a priori subject-matter theory predicts. We propose a Bayesian model that accounts for correlation among the predictors by simultaneously performing selection and clustering of the predictors. Our model combines a Dirichlet process prior and a variable selection prior for the regression coefficients. In our model, highly correlated predictors can be grouped together by setting their corresponding coefficients exactly equal to one another. Similarly, redundant predictors can be removed from the model through the variable selection component of our prior. We demonstrate the competitiveness of our method through simulation studies and analysis of real data.
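The claim that correlated predictors inflate coefficient standard errors can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes two centered predictors scaled so that each column has sum of squares n, so that X'X = nR with R the 2×2 correlation matrix, and it uses the standard ordinary least squares result Var(β̂) = σ²(X'X)⁻¹. The function names `coef_std_errors` and `variance_inflation`, and the values of `n` and `sigma`, are hypothetical choices made for the example.

```python
import numpy as np


def coef_std_errors(rho, n=100, sigma=1.0):
    """OLS slope standard errors for two predictors with correlation rho.

    Assumes centered predictors scaled so each column has sum of squares n,
    which gives X'X = n * R, where R is the correlation matrix.
    """
    R = np.array([[1.0, rho], [rho, 1.0]])
    # Var(beta_hat) = sigma^2 * (X'X)^{-1}
    cov_beta = sigma**2 * np.linalg.inv(n * R)
    return np.sqrt(np.diag(cov_beta))


def variance_inflation(rho):
    """Variance inflation factor for two predictors: 1 / (1 - rho^2)."""
    return 1.0 / (1.0 - rho**2)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        se = coef_std_errors(rho)
        print(f"rho={rho:4.2f}  se={se[0]:.4f}  VIF={variance_inflation(rho):.2f}")
```

Under these assumptions the standard error grows by a factor of the square root of 1/(1 − ρ²) relative to the uncorrelated case, which is why the data carry little information about each predictor's separate effect as ρ approaches 1.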
AMS Subject Classification: 62F15, 62J07, 62P25
Key-words: Dirichlet process, Variable selection, Stochastic search