General Aspects of Fitting Regression Models

Regression Modeling Strategies

Part of the book series: Springer Series in Statistics (SSS)

Abstract

The ordinary multiple linear regression model is frequently used and has parameters that are easily interpreted. In this chapter we study a general class of regression models, those stated in terms of a weighted sum of a set of independent or predictor variables. It is shown that after linearizing the model with respect to the predictor variables, the parameters in such regression models are also readily interpreted. Also, all the designs used in ordinary linear regression can be used in this general setting. These designs include analysis of variance (ANOVA) setups, interaction effects, and nonlinear effects. Besides describing and interpreting general regression models, this chapter also describes, in general terms, how the three types of assumptions of regression models can be examined.

Notes

  1.

    Note that it is not necessary to “hold constant” all other variables to be able to interpret the effect of one predictor. It is sufficient to hold constant the weighted sum of all the variables other than \(X_{j}\). And in many cases it is not physically possible to hold other variables constant while varying one, e.g., when a model contains \(X\) and \(X^{2}\) (David Hoaglin, personal communication).

  2.

    This weight is not to be confused with the regression coefficient; rather the weights are \(w_{1},w_{2},\ldots,w_{n}\) and the fitting criterion is \(\sum_{i=1}^{n}w_{i}(Y_{i} -\hat{Y}_{i})^{2}\).

  3.

    In other words, under what assumptions does the test have maximum power?

  4.

    Note: To pre-specify knots for restricted cubic spline functions, use something like rcs(predictor, c(t1,t2,t3,t4)), where the knot locations are t1, t2, t3, t4.

  5.

    Note that anova in rms computes all needed test statistics from a single model fit object.
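
Notes 2 and 4 above can be made concrete with a small sketch. The Python code below is illustrative only and is not the rms implementation: the function names weighted_sse and rcs_basis are hypothetical, and the spline terms use the unnormalized truncated-power form of a restricted cubic spline, whereas fitting software may rescale them. The first function evaluates the weighted fitting criterion of Note 2; the second builds the basis columns for one predictor value at pre-specified knots, as in Note 4.

```python
def weighted_sse(y, y_hat, w):
    """Weighted fitting criterion: sum over i of w_i * (y_i - y_hat_i)^2.

    The w_i are observation weights, not regression coefficients.
    """
    if not (len(y) == len(y_hat) == len(w)):
        raise ValueError("y, y_hat and w must have the same length")
    return sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, y_hat, w))


def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x (hypothetical helper).

    For k pre-specified knots this returns k - 1 terms: x itself plus
    k - 2 nonlinear terms built so that the fitted function is linear
    below the first knot and above the last knot.
    Assumption: unnormalized truncated-power form, no rescaling.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]  # distance between the last two knots
    terms = [x]
    for j in range(k - 2):
        terms.append(
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span
        )
    return terms


if __name__ == "__main__":
    knots = [1, 2, 3, 4]  # plays the role of t1, t2, t3, t4 in Note 4
    for x in (0.5, 2.5, 5, 6, 7):
        print(x, rcs_basis(x, knots))
    # Criterion for three observations with unequal weights.
    print(weighted_sse([1, 2, 3], [1, 1, 1], [1, 2, 3]))
```

With knots at 1, 2, 3, 4, the first nonlinear term equals 42, 60, and 78 at x = 5, 6, 7. The equal spacing shows the linear tail restriction beyond the last knot, and below the first knot the nonlinear terms are all zero.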

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Harrell, F.E. (2015). General Aspects of Fitting Regression Models. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_2
