Summary
Regression modeling is one of the most important statistical techniques in analytical epidemiology. Regression models make it possible to investigate the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer. Multiple regression models yield adjusted effect estimates that take the effect of potential confounders into account. Because regression methods can be applied in all epidemiologic study designs, they are a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed, depending on the measurement scale of the response variable and on the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models, with illustrative examples from cancer research.
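To make the idea of an adjusted effect estimate concrete, the following minimal sketch fits a linear regression twice: once with the exposure alone (crude estimate) and once with a confounder added (adjusted estimate). The data are hypothetical and noise-free, constructed so that the true exposure effect is 2 and the confounder effect is 3; the helper `ols` and the variable names `x`, `z`, and `y` are assumptions made for illustration and do not come from the chapter.

```python
def ols(rows, y):
    """Least-squares coefficients; an intercept column is added here."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)]
         + [sum(xi[j] * yi for xi, yi in zip(X, y))]
         for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


# Hypothetical data: exposure x is higher when the confounder z is present,
# and the outcome follows y = 1 + 2*x + 3*z exactly.
x = [0, 1, 2, 2, 3, 4]
z = [0, 0, 0, 1, 1, 1]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

crude = ols([(xi,) for xi in x], y)   # y regressed on x only
adjusted = ols(list(zip(x, z)), y)    # y regressed on x and z

print(f"crude slope for x:    {crude[1]:.2f}")
print(f"adjusted slope for x: {adjusted[1]:.2f}")
```

In this toy data set the crude slope (about 2.9) overstates the exposure effect because part of the confounder's effect is attributed to the exposure, while the adjusted slope recovers the value 2 used to build the data. Real analyses would normally rely on an established statistics package rather than a hand-written solver.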
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Bender, R. (2009). Introduction to the Use of Regression Models in Epidemiology. In: Verma, M. (eds) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_9
DOI: https://doi.org/10.1007/978-1-59745-416-2_9
Publisher Name: Humana Press
Print ISBN: 978-1-58829-987-1
Online ISBN: 978-1-59745-416-2
eBook Packages: Springer Protocols