Building an MLR Model

In: Linear Regression (Springer, Cham, 2017)

Abstract

Building a multiple linear regression (MLR) model from data is one of the most challenging regression problems. The "final full model" will have response variable $Y = t(Z)$, a constant $x_1$, and predictor variables $x_2 = t_2(w_2, \ldots, w_r), \ldots, x_p = t_p(w_2, \ldots, w_r)$, where the initial data consists of $Z, w_2, \ldots, w_r$. Choosing $t, t_2, \ldots, t_p$ so that the final full model is a useful MLR approximation to the data can be difficult.
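
As a concrete illustration of this setup, here is a minimal Python sketch assuming synthetic data and invented transformation choices (the log response transformation, the particular $t_j$, and the use of statsmodels are illustrative assumptions, not the chapter's procedure). It starts from initial data $(Z, w_2, w_3)$ and fits one candidate final full model by ordinary least squares.

    # A minimal sketch, assuming synthetic data and illustrative transformations.
    # Initial data: response Z and predictors w2, w3. The "final full model"
    # uses Y = t(Z) = log(Z), the constant x1, and transformed predictors
    # x2 = w2, x3 = log(w3), x4 = w2*w3 (all invented choices, for illustration).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    w2 = rng.uniform(1.0, 10.0, n)
    w3 = rng.uniform(1.0, 10.0, n)
    # Generate Z so that log(Z) is roughly linear in the chosen predictors.
    Z = np.exp(1.0 + 0.5 * w2 + 0.3 * np.log(w3) + rng.normal(0.0, 0.1, n))

    Y = np.log(Z)                                    # response transformation t
    X = np.column_stack([w2, np.log(w3), w2 * w3])   # t_2, t_3, t_4
    X = sm.add_constant(X)                           # constant term x1

    fit = sm.OLS(Y, X).fit()
    print(fit.params)                                # coefficients for x1, ..., x4

Whether such a choice of $t, t_2, \ldots, t_p$ actually yields a useful MLR approximation must then be checked, e.g. with residual and response plots; making these choices well is the difficulty the chapter addresses.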


Copyright information

© 2017 Springer International Publishing AG

Cite this chapter

Olive, D.J. (2017). Building an MLR Model. In: Linear Regression. Springer, Cham. https://doi.org/10.1007/978-3-319-55252-1_3
