Introduction to the Use of Regression Models in Epidemiology


Part of the book series: Methods in Molecular Biology (MIMB, volume 471)

Summary

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. Regression models make it possible to investigate the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer. Multiple regression models yield adjusted effect estimates that take the effect of potential confounders into account. Because regression methods can be applied in all epidemiologic study designs, they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed, depending on the measurement scale of the response variable and on the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models, with illustrative examples from cancer research.
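
As a compact reference for the four model families named in the summary, their standard textbook forms can be sketched as follows. The notation is generic and is not taken from the chapter: the symbols x_1, ..., x_p (explanatory variables), β_j (regression coefficients), h_0(t) (baseline hazard), and T (person-time at risk, entering as an offset) are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{Linear (continuous outcome):}   \quad & E[Y \mid x] = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{Logistic (binary outcome):}     \quad & \log\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \\
\text{Cox (time-to-event data):}      \quad & h(t \mid x) = h_0(t)\,\exp\Bigl(\sum_{j=1}^{p} \beta_j x_j\Bigr) \\
\text{Poisson (frequencies and rates):} \quad & \log E[Y \mid x] = \log T + \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\end{align*}
```

Under these forms, exp(β_j) is usually read as an odds ratio in the logistic model, a hazard ratio in the Cox model, and a rate ratio in the Poisson model, in each case adjusted for the other variables included in the model.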


Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

Cite this protocol

Bender, R. (2009). Introduction to the Use of Regression Models in Epidemiology. In: Verma, M. (eds) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_9

  • DOI: https://doi.org/10.1007/978-1-59745-416-2_9

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-987-1

  • Online ISBN: 978-1-59745-416-2

  • eBook Packages: Springer Protocols
