General Aspects of Fitting Regression Models

doi:10.1007/978-3-319-19425-7_2

Frank E. Harrell Jr.

Part of the book series: Springer Series in Statistics (SSS)

Abstract

The ordinary multiple linear regression model is frequently used and has parameters that are easily interpreted. In this chapter we study a general class of regression models, those stated in terms of a weighted sum of a set of independent or predictor variables. It is shown that after linearizing the model with respect to the predictor variables, the parameters in such regression models are also readily interpreted. Also, all the designs used in ordinary linear regression can be used in this general setting. These designs include analysis of variance (ANOVA) setups, interaction effects, and nonlinear effects. Besides describing and interpreting general regression models, this chapter also describes, in general terms, how the three types of assumptions of regression models can be examined.

Notes

1.
Note that it is not necessary to “hold constant” all other variables to be able to interpret the effect of one predictor. It is sufficient to hold constant the weighted sum of all the variables other than \(X_{j}\). And in many cases it is not physically possible to hold other variables constant while varying one, e.g., when a model contains \(X\) and \(X^{2}\) (David Hoaglin, personal communication).
2.
This weight is not to be confused with the regression coefficient; rather the weights are \(w_{1},w_{2},\ldots,w_{n}\) and the fitting criterion is \(\sum_{i=1}^{n}w_{i}(Y_{i} -\hat{Y}_{i})^{2}\).
3.
In other words, under what assumptions does the test have maximum power?
4.
Note: To pre-specify knots for restricted cubic spline functions, use something like rcs(predictor, c(t1,t2,t3,t4)), where the knot locations are t1, t2, t3, t4.
5.
Note that anova in rms computes all needed test statistics from a single model fit object.
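
The weighted fitting criterion in Note 2 can be made concrete with a small sketch. The Python code below is an illustration only, not code from the chapter: the function names weighted_sse and fit_weighted_line are hypothetical, and the model is restricted to a single predictor with an intercept so that the minimizer has a simple closed form.

```python
def weighted_sse(y, y_hat, w):
    """Fitting criterion from Note 2: sum over i of w_i * (Y_i - Yhat_i)^2."""
    return sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))


def fit_weighted_line(x, y, w):
    """Intercept and slope minimizing the weighted criterion for one predictor.

    Hypothetical helper for illustration; it assumes positive weights and
    at least two distinct x values.
    """
    sw = sum(w)
    # Weighted means of predictor and response.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about the weighted means.
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


def predict(x, intercept, slope):
    """Fitted values for the one-predictor linear model."""
    return [intercept + slope * xi for xi in x]
```

With all weights equal, the fit coincides with ordinary least squares. Raising one observation's weight pulls the fitted line toward that observation, which is the sense in which the \(w_{i}\) are observation weights rather than regression coefficients.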
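
To show what pre-specified knots mean in Note 4, the following Python sketch builds one row of a restricted cubic spline basis from user-supplied knot locations, using the truncated power form that is linear beyond the outer knots. It is a sketch under stated assumptions, not the rms implementation: the name rcs_basis is hypothetical, and software implementations may rescale the nonlinear terms, which is not done here.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x for pre-specified knots.

    Returns k - 1 values for k knots: x itself plus k - 2 nonlinear terms.
    Hypothetical helper for illustration; no rescaling is applied.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]  # distance between the last two knots
    row = [x]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span
        )
        row.append(term)
    return row
```

Four knots give three columns, so the predictor uses three regression coefficients. Below the first knot every nonlinear term is zero, and above the last knot the cubic and quadratic parts cancel, leaving a straight line in both tails.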

Author information

Authors and Affiliations

Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN, USA
Frank E. Harrell Jr.

About this chapter

Cite this chapter

Harrell, F.E. (2015). General Aspects of Fitting Regression Models. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_2

DOI: https://doi.org/10.1007/978-3-319-19425-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19424-0
Online ISBN: 978-3-319-19425-7
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
