Abstract
The ordinary multiple linear regression model is frequently used and has parameters that are easily interpreted. In this chapter we study a general class of regression models, those stated in terms of a weighted sum of a set of independent or predictor variables. It is shown that after linearizing the model with respect to the predictor variables, the parameters in such regression models are also readily interpreted. Also, all the designs used in ordinary linear regression can be used in this general setting. These designs include analysis of variance ( ANOVA) setups, interaction effects, and nonlinear effects. Besides describing and interpreting general regression models, this chapter also describes, in general terms, how the three types of assumptions of regression models can be examined.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Note that it is not necessary to “hold constant” all other variables to be able to interpret the effect of one predictor. It is sufficient to hold constant the weighted sum of all the variables other than X j . And in many cases it is not physically possible to hold other variables constant while varying one, e.g., when a model contains X and X 2 (David Hoaglin, personal communication).
- 2.
This weight is not to be confused with the regression coefficient; rather the weights are \(w_{1},w_{2},\ldots,w_{n}\) and the fitting criterion is \(\sum _{i}^{n}w_{i}(Y _{i} -\hat{ Y _{i}})^{2}\).
- 3.
In other words, under what assumptions does the test have maximum power?
- 4.
Note: To pre-specify knots for restricted cubic spline functions, use something like rcs(predictor, c(t1,t2,t3,t4)), where the knot locations are t1, t2, t3, t4.
- 5.
Note that anova in rms computes all needed test statistics from a single model fit object.
References
H. Ahn and W. Loh. Tree-structured proportional hazards regression modeling. Biometrics, 50:471–485, 1994.
D. G. Altman. Categorising continuous covariates (letter to the editor). Brit J Cancer, 64:975, 1991.
D. G. Altman. Suboptimal analysis using ‘optimal’ cutpoints. Brit J Cancer, 78:556–557, 1998.
D. G. Altman, B. Lausen, W. Sauerbrei, and M. Schumacher. Dangers of using ‘optimal’ cutpoints in the evaluation of prognostic factors. J Nat Cancer Inst, 86:829–835, 1994.
P. C. Austin. A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Stat Med, 26:2937–2957, 2007.
H. Belcher. The concept of residual confounding in regression models and some applications. Stat Med, 11:1747–1758, 1992.
K. Berhane, M. Hauptmann, and B. Langholz. Using tensor product splines in modeling exposure–time–response relationships: Application to the Colorado Plateau Uranium Miners cohort. Stat Med, 27:5484–5496, 2008.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1984.
P. Buettner, C. Garbe, and I. Guggenmoos-Holzmann. Problems in defining cutoff points of continuous prognostic factors: Example of tumor thickness in primary cutaneous melanoma. J Clin Epi, 50:1201–1210, 1997.
J. M. Chambers and T. J. Hastie, editors. Statistical Models in S. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1992.
A. Ciampi, A. Negassa, and Z. Lou. Tree-structured prediction for censored survival data and the Cox model. J Clin Epi, 48:675–689, 1995.
A. Ciampi, J. Thiffault, J. P. Nakache, and B. Asselain. Stratification by stepwise regression, correspondence analysis and recursive partition. Comp Stat Data Analysis, 1986:185–204, 1986.
L. A. Clark and D. Pregibon. Tree-Based Models. In J. M. Chambers and T. J. Hastie, editors, Statistical Models in S, chapter 9, pages 377–419. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1992.
W. S. Cleveland. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc, 74:829–836, 1979.
E. F. Cook and L. Goldman. Asymmetric stratification: An outline for an efficient method for controlling confounding in cohort studies. Am J Epi, 127:626–639, 1988.
D. R. Cox. The regression analysis of binary sequences (with discussion). J Roy Stat Soc B, 20:215–242, 1958.
D. R. Cox. Regression models and life-tables (with discussion). J Roy Stat Soc B, 34:187–220, 1972.
N. J. Crichton, J. P. Hinde, and J. Marchini. Models for diagnosing chest pain: Is CART useful? Stat Med, 16:717–727, 1997.
R. B. Davis and J. R. Anderson. Exponential survival trees. Stat Med, 8:947–961, 1989.
C. de Boor. A Practical Guide to Splines. Springer-Verlag, New York, revised edition, 2001.
T. F. Devlin and B. J. Weeks. Spline functions for logistic regression modeling. In Proceedings of the Eleventh Annual SAS Users Group International Conference, pages 646–651, Cary, NC, 1986. SAS Institute, Inc.
S. Durrleman and R. Simon. Flexible regression models with cubic splines. Stat Med, 8:551–561, 1989.
D. Faraggi and R. Simon. A simulation study of cross-validation for selecting an optimal cutpoint in univariate survival analysis. Stat Med, 15:2203–2213, 1996.
V. Fedorov, F. Mannino, and R. Zhang. Consequences of dichotomization. Pharm Stat, 8:50–61, 2009.
J. H. Friedman. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University, 1984.
A. Giannoni, R. Baruah, T. Leong, M. B. Rehman, L. E. Pastormerlo, F. E. Harrell, A. J. Coats, and D. P. Francis. Do optimal prognostic thresholds in continuous physiological variables really exist? Analysis of origin of apparent thresholds, with systematic review for peak oxygen consumption, ejection fraction and BNP. PLoS ONE, 9(1), 2014.
U. S. Govindarajulu, D. Spiegelman, S. W. Thurston, B. Ganguli, and E. A. Eisen. Comparing smoothing techniques in Cox models for exposure-response relationships. Stat Med, 26:3735–3752, 2007.
P. M. Grambsch and P. C. O’Brien. The effects of transformations and preliminary tests for non-linearity in regression. Stat Med, 10:697–709, 1991.
R. J. Gray. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87:942–951, 1992.
R. J. Gray. Spline-based tests in survival analysis. Biometrics, 50:640–652, 1994.
P. Gustafson. Bayesian regression modeling with interactions and smooth effects. J Am Stat Assoc, 95:795–806, 2000.
F. E. Harrell, K. L. Lee, D. B. Matchar, and T. A. Reichert. Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Ca Trt Rep, 69:1071–1077, 1985.
F. E. Harrell, K. L. Lee, and B. G. Pollock. Regression models in clinical studies: Determining relationships between predictors and response. J Nat Cancer Inst, 80:1198–1202, 1988.
T. Hastie. Discussion of “The use of polynomial splines and their tensor products in multivariate function estimation” by C. J. Stone. Appl Stat, 22:177–179, 1994.
T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
S. G. Hilsenbeck and G. M. Clark. Practical p-value adjustment for optimally selected cutpoints. Stat Med, 15:103–112, 1996.
N. Holländer, W. Sauerbrei, and M. Schumacher. Confidence intervals for the effect of a prognostic factor after selection of an ‘optimal’ cutpoint. Stat Med, 23:1701–1713, 2004.
S. Keleş and M. R. Segal. Residual-based tree-structured survival analysis. Stat Med, 21:313–326, 2002.
B. Lausen and M. Schumacher. Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Comp Stat Data Analysis, 21(3):307–326, 1996.
M. LeBlanc and J. Crowley. Survival trees by goodness of fit. J Am Stat Assoc, 88:457–467, 1993.
L. Magee. Nonlocal behavior in polynomial regressions. Am Statistician, 52:20–22, 1998.
R. J. Marshall. The use of classification and regression trees in clinical epidemiology. J Clin Epi, 54:603–609, 2001.
S. E. Maxwell and H. D. Delaney. Bivariate median splits and spurious statistical significance. Psych Bull, 113:181–190, 1993.
D. R. McNeil, J. Trussell, and J. C. Turner. Spline interpolation of demographic data. Demography, 14:245–252, 1977.
B. K. Moser and L. P. Coombs. Odds ratios for a continuous outcome variable without dichotomizing. Stat Med, 23:1843–1860, 2004.
D. R. Ragland. Dichotomizing continuous outcome variables: Dependence of the magnitude of association and statistical power on the cutpoint. Epi, 3:434–440, 1992. See letters to editor May 1993 P. 274-, Vol 4 No. 3.
P. Royston and D. G. Altman. Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. ApplStat, 43:429–453, 1994. Discussion pp. 453–467.
P. Royston, D. G. Altman, and W. Sauerbrei. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med, 25:127–141, 2006.
M. Schemper. Non-parametric analysis of treatment-covariate interaction in the presence of censoring. Stat Med, 7:1257–1266, 1988.
C. Schmoor, K. Ulm, and M. Schumacher. Comparison of the Cox model and the regression tree procedure in analysing a randomized clinical trial. Stat Med, 12:2351–2366, 1993.
G. Schulgen, B. Lausen, J. Olsen, and M. Schumacher. Outcome-oriented cutpoints in quantitative exposure. Am J Epi, 120:172–184, 1994.
M. R. Segal. Regression trees for censored data. Biometrics, 44:35–47, 1988.
L. A. Sleeper and D. P. Harrington. Regression splines in the Cox model with application to covariate effects in liver disease. J Am Stat Assoc, 85:941–949, 1990.
P. L. Smith. Splines as a useful and convenient statistical tool. Am Statistician, 33:57–62, 1979.
C. J. Stone. Comment: Generalized additive models. Statistical Sci, 1:312–314, 1986.
C. J. Stone and C. Y. Koo. Additive splines in statistics. In Proceedings of the Statistical Computing Section ASA, pages 45–48, Washington, DC, 1985.
S. Suissa and L. Blais. Binary regression with continuous outcomes. Stat Med, 14:247–255, 1995.
T. van der Ploeg, P. C. Austin, and E. W. Steyerberg. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology, 14(1):137+, Dec. 2014.
H. Wainer. Finding what is not there through the unfortunate binning of results: The Mendel effect. Chance, 19(1):49–56, 2006.
S. H. Walker and D. B. Duncan. Estimation of the probability of an event as a function of several independent variables. Biometrika, 54:167–178, 1967.
A. R. Walter, A. R. Feinstein, and C. K. Wells. Coding ordinal independent variables in multiple regression analyses. Am J Epi, 125:319–323, 1987.
Y. Wang, G. Wahba, C. Gu, R. Klein, and B. Klein. Using smoothing spline ANOVA to examine the relation of risk factors to the incidence and progression of diabetic retinopathy. Stat Med, 16:1357–1376, 1997.
H. Zhang. Classification trees for multiple binary responses. J Am Stat Assoc, 93:180–193, 1998.
H. Zhang, T. Holford, and M. B. Bracken. A tree-based method of analysis for prospective studies. Stat Med, 15:37–49, 1996.
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Harrell, F.E. (2015). General Aspects of Fitting Regression Models. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-19425-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19424-0
Online ISBN: 978-3-319-19425-7
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)