Building an MLR Model

In: Linear Regression (Springer, Cham, 2017)

Abstract

Building a multiple linear regression (MLR) model from data is one of the most challenging regression problems. The "final full model" will have response variable $Y = t(Z)$, a constant $x_1$, and predictor variables $x_2 = t_2(w_2, \ldots, w_r), \ldots, x_p = t_p(w_2, \ldots, w_r)$, where the initial data consists of $Z, w_2, \ldots, w_r$. Choosing $t, t_2, \ldots, t_p$ so that the final full model is a useful MLR approximation to the data can be difficult.
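
As a concrete illustration of this setup, here is a minimal Python sketch assuming synthetic data and invented transformation choices (the log response transformation, the particular $t_j$, and the use of statsmodels are illustrative assumptions, not the chapter's procedure). It starts from initial data $(Z, w_2, w_3)$ and fits one candidate final full model by ordinary least squares.

    # A minimal sketch, assuming synthetic data and illustrative transformations.
    # Initial data: response Z and predictors w2, w3. The "final full model"
    # uses Y = t(Z) = log(Z), the constant x1, and transformed predictors
    # x2 = w2, x3 = log(w3), x4 = w2*w3 (all invented choices, for illustration).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    w2 = rng.uniform(1.0, 10.0, n)
    w3 = rng.uniform(1.0, 10.0, n)
    # Generate Z so that log(Z) is roughly linear in the chosen predictors.
    Z = np.exp(1.0 + 0.5 * w2 + 0.3 * np.log(w3) + rng.normal(0.0, 0.1, n))

    Y = np.log(Z)                                    # response transformation t
    X = np.column_stack([w2, np.log(w3), w2 * w3])   # t_2, t_3, t_4
    X = sm.add_constant(X)                           # constant term x1

    fit = sm.OLS(Y, X).fit()
    print(fit.params)                                # coefficients for x1, ..., x4

Whether such a choice of $t, t_2, \ldots, t_p$ actually yields a useful MLR approximation must then be checked, e.g. with residual and response plots; making these choices well is the difficulty the chapter addresses.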


Copyright information

© 2017 Springer International Publishing AG

Cite this chapter

Olive, D.J. (2017). Building an MLR Model. In: Linear Regression. Springer, Cham. https://doi.org/10.1007/978-3-319-55252-1_3
