Abstract
Building a multiple linear regression (MLR) model from data is one of the most challenging regression problems. The "final full model" has response variable Y = t(Z), a constant x1, and predictor variables x2 = t2(w2, …, wr), …, xp = tp(w2, …, wr), where the initial data consist of Z, w2, …, wr. Choosing the transformations t, t2, …, tp so that the final full model is a useful MLR approximation to the data can be difficult.
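The setup above can be sketched numerically. The following is a minimal illustration (not taken from the chapter): the initial data are Z, w2, w3, the response transformation is assumed to be t = log, and the predictor transformations t2, t3 are assumed to be the identity and a product, giving the final full model Y = log(Z) with x1 = 1, x2 = w2, x3 = w2·w3. All specific choices here are hypothetical examples of such transformations.

```python
# Sketch: building a "final full model" Y = t(Z) with transformed predictors
# x_j = t_j(w_2, ..., w_r). Here t = log, t2(w2, w3) = w2, t3(w2, w3) = w2*w3.
import numpy as np

rng = np.random.default_rng(0)
n = 200
w2 = rng.uniform(1, 5, n)
w3 = rng.uniform(1, 5, n)

# Simulated initial data: Z is linear in the transformed predictors
# on the log scale, so t = log makes MLR appropriate.
Z = np.exp(1.0 + 0.5 * w2 + 0.25 * w2 * w3 + rng.normal(0, 0.1, n))

# Final full model: Y = t(Z) = log(Z), x1 = 1, x2 = w2, x3 = w2*w3
Y = np.log(Z)
X = np.column_stack([np.ones(n), w2, w2 * w3])

# Ordinary least squares fit of the final full model
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta
r2 = 1 - resid.var() / Y.var()
print(beta.round(2), round(r2, 3))
```

With the correct transformations the fitted coefficients recover the generating values and the fit is nearly perfect; with the wrong t (e.g., regressing Z itself on these predictors) the approximation degrades, which is the difficulty the abstract describes.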
© 2017 Springer International Publishing AG
Cite this chapter
Olive, D.J. (2017). Building an MLR Model. In: Linear Regression. Springer, Cham. https://doi.org/10.1007/978-3-319-55252-1_3
Print ISBN: 978-3-319-55250-7
Online ISBN: 978-3-319-55252-1
eBook Packages: Mathematics and Statistics (R0)