Regression Methods for Epidemiological Analysis

  • Reference work entry

Abstract

Basic tabular and graphical methods are an essential component of epidemiological analysis and are often sufficient, especially when one needs to consider only a few variables at a time. They are, however, limited in the number of variables they can examine simultaneously and in the detail with which they can treat continuous variables. Even sparse-strata methods (such as Mantel-Haenszel) require that some strata contain two or more subjects; yet, as more variables or categories are added to a stratification, the number of subjects in each stratum may eventually drop to 0 or 1.
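The sparsity problem can be made concrete with a short Python sketch. It is not taken from the entry: the function names (`n_strata`, `sparse_cell_fraction`, `mantel_haenszel_or`) and the 2×2 cell labels a, b, c, d are illustrative assumptions. It shows that a full cross-classification has as many cells as the product of the category counts, and that in the Mantel-Haenszel odds ratio each stratum i contributes a_i d_i / n_i to the numerator and b_i c_i / n_i to the denominator, so a stratum holding a single subject contributes exactly zero to both sums.

```python
from collections import Counter
from itertools import product
from math import prod


def n_strata(levels_per_variable):
    """Number of cells in a full cross-classification of the stratifiers."""
    return prod(levels_per_variable)


def sparse_cell_fraction(records, levels_per_variable):
    """Fraction of cross-classified cells that hold fewer than 2 subjects.

    Assumes each record is a tuple of stratifier values coded 0..k-1,
    where k is the matching entry of levels_per_variable.
    """
    counts = Counter(records)
    cells = product(*(range(k) for k in levels_per_variable))
    small = sum(1 for cell in cells if counts[cell] < 2)
    return small / n_strata(levels_per_variable)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over 2x2 tables (a, b, c, d).

    a = exposed cases,   b = exposed noncases,
    c = unexposed cases, d = unexposed noncases.
    Each stratum adds a*d/n to the numerator and b*c/n to the denominator.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            # With n == 1 exactly one of a..d is 1, so a*d == b*c == 0 and the
            # stratum adds nothing; skipping also avoids dividing by n == 0.
            continue
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("denominator is zero: MH odds ratio is undefined or infinite")
    return num / den


if __name__ == "__main__":
    # Ten binary stratifiers already give 1024 cells.
    print(n_strata([2] * 10))  # 1024
    strata = [(10, 20, 5, 30), (8, 12, 4, 16)]
    singletons = [(1, 0, 0, 0), (0, 0, 0, 1)]
    # Adding single-subject strata leaves the estimate unchanged.
    print(round(mantel_haenszel_or(strata), 3))               # 2.854
    print(round(mantel_haenszel_or(strata + singletons), 3))  # 2.854
```

For example, with 1,000 subjects spread over the 1,024 cells of ten binary stratifiers, at most 500 cells can hold two or more subjects, so at least 524 cells hold 0 or 1 and drop out of the Mantel-Haenszel sums entirely.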

References

  • Agresti A (2002) Categorical data analysis. Wiley, New York

  • Ananth CV, Kleinbaum DG (1997) Regression models for ordinal responses: a review of methods and applications. Int J Epidemiol 26:1323–1333

  • Bancroft TA, Han C-P (1977) Inference based on conditional specification: a note and a bibliography. Int Stat Rev 45:117–127

  • Berk R (2004) Regression analysis: a constructive critique. Sage Publications, Thousand Oaks

  • Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT Press, Cambridge

  • Breiman L (2001) Statistical modeling: the two cultures (with discussion). Stat Sci 16:199–231

  • Breslow NE, Day NE (1980) Statistical methods in cancer research. Vol I: the analysis of case-control data. IARC, Lyon

  • Breslow NE, Day NE (1987) Statistical methods in cancer research. Vol II: the design and analysis of cohort studies. IARC, Lyon

  • Brown PJ, Vannucci M, Fearn T (2002) Bayes model averaging with selection of regressors. J R Stat Soc Ser B 64:519–536

  • Carlin B, Louis TA (2000) Bayes and empirical-Bayes methods of data analysis, 2nd edn. Chapman and Hall, New York

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C (2006) Measurement error in nonlinear models, 2nd edn. Chapman and Hall, New York

  • Cole SR, Ananth CV (2001) Regression models for unconstrained, partially or fully constrained continuation odds ratios. Int J Epidemiol 30:1379–1382

  • Copas JB (1983) Regression, prediction, and shrinkage (with discussion). J R Stat Soc Ser B 45:311–354

  • Cox DR (1972) Regression models and life tables (with discussion). J R Stat Soc Ser B 34:187–220

  • Cox DR, Oakes D (1984) Analysis of survival data. Chapman and Hall, New York

  • Cox DR, Wermuth N (1992) A comment on the coefficient of determination for binary responses. Am Stat 46:1–4

  • Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002) The analysis of longitudinal data, 2nd edn. Oxford University Press, New York

  • Draper D (1995) Assessment and propagation of model uncertainty. J R Stat Soc Ser B 57:45–97

  • Draper NR, Guttman I, Lapczak L (1979) Actual rejection levels in a certain stepwise test. Commun Stat A 8:99–105

  • Easton DF, Peto J, Babiker AG (1991) Floating absolute risk: an alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Stat Med 10:1025–1035

  • Efron B (2004) The estimation of prediction error: covariance penalties and cross-validation. J Am Stat Assoc 99:619–642

  • Efron B, Morris CN (1975) Data analysis using Stein’s estimator and its generalizations. J Am Stat Assoc 70:311–319

  • Faraway JJ (1992) On the cost of data analysis. J Comput Graph Stat 1:213–219

  • Flack VF, Chang PC (1987) Frequency of selecting noise variables in subset regression analysis: a simulation study. Am Stat 41:84–86

  • Freedman DA (1983) A note on screening regression equations. Am Stat 37:152–155

  • Freedman DA, Navidi W, Peters SC (1988) On the impact of variable selection in fitting regression equations. In: Dijkstra TK (ed) On model uncertainty and its statistical implications. Springer, Berlin, pp 1–16

  • Greenland S (1993) Methods for epidemiologic analyses of multiple exposures: a review and comparative study of maximum-likelihood, preliminary testing, and empirical-Bayes regression. Stat Med 12:717–736

  • Greenland S (1994) Alternative models for ordinal logistic regression. Stat Med 13:1665–1677

  • Greenland S (1995a) Dose-response and trend analysis: alternatives to categorical analysis. Epidemiology 6:356–365

  • Greenland S (1995b) Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology 6:450–454

  • Greenland S (1995c) Problems in the average-risk interpretation of categorical dose-response analyses. Epidemiology 6:563–565

  • Greenland S (1996) A lower bound for the correlation of exponentiated bivariate normal pairs. Am Stat 50:163–164

  • Greenland S (1999) Multilevel modeling and model averaging. Scand J Work Environ Health 25(suppl 4):43–48

  • Greenland S (2000a) Principles of multilevel modeling. Int J Epidemiol 29:158–167

  • Greenland S (2000b) When should epidemiologic regressions use random coefficients? Biometrics 56:915–921

  • Greenland S (2001) Putting background information about relative risks into conjugate priors. Biometrics 57:663–670

  • Greenland S (2003) The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. J Am Stat Assoc 98:47–54

  • Greenland S (2004) Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. Am J Epidemiol 160:301–305

  • Greenland S (2005a) Epidemiologic measures and policy formulation: Lessons from potential outcomes (with discussion). Emerg Themes Epidemiol 2:1–4

  • Greenland S (2005b) Multiple-bias modeling for observational studies. J R Stat Soc Ser A 168:267–308

  • Greenland S (2006) Bayesian perspectives for epidemiologic research. I. Foundations and basic methods (with comment and reply). Int J Epidemiol 35:765–778

  • Greenland S (2007) Bayesian perspectives for epidemiologic research. II. Regression analysis. Int J Epidemiol 36:195–202

  • Greenland S (2008a) Introduction to regression modeling. Chap. 21. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia

  • Greenland S (2008b) Variable selection and shrinkage in the control of multiple confounders. Am J Epidemiol 167:523–529, Erratum: p 1142

  • Greenland S (2009a) Bayesian perspectives for epidemiologic research. III. Bias analysis via missing-data methods. Int J Epidemiol 38:1662–1673

  • Greenland S (2009b) Relaxation penalties and priors for plausible modeling of nonidentified bias sources. Stat Sci 24:195–210

  • Greenland S, Lash TL (2008) Bias analysis. Chap. 19. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia

  • Greenland S, Maldonado G (1994) The interpretation of multiplicative model parameters as standardized parameters. Stat Med 13:989–999

  • Greenland S, Poole C (1995) Interpretation and analysis of differential exposure variability and zero-dose categories for continuous exposures. Epidemiology 6:326–328

  • Greenland S, Rothman KJ (2008) Fundamentals of epidemiologic data analysis. Chap. 13. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia

  • Greenland S, Schlesselman JJ, Criqui MH (1986) The fallacy of employing standardized regression coefficients and correlations as measures of effect. Am J Epidemiol 123:203–208

  • Greenland S, Maclure M, Schlesselman JJ, Poole C, Morgenstern H (1991) Standardized regression coefficients: a further critique and review of some alternatives. Epidemiology 2:387–392

  • Greenland S, Michels KB, Robins JM, Poole C, Willett WC (1999) Presenting statistical uncertainty in trends and dose-response relations. Am J Epidemiol 149:1077–1086

  • Greenland S, Schwartzbaum JA, Finkle WD (2000) Problems from small samples and sparse data in conditional logistic regression. Am J Epidemiol 151:531–539

  • Greenland S, Rothman KJ, Lash TL (2008) Concepts of interaction. Chap. 5. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia

  • Gustafson P (2003) Measurement error and misclassification in statistics and epidemiology. Chapman and Hall, Boca Raton

  • Gustafson P (2005) On model expansion, model contraction, identifiability, and prior information (with discussion). Stat Sci 20:111–140

  • Harrell F (2001) Regression modeling strategies. Springer, New York

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Hernán MA (2005) Hypothetical interventions to define causal effects—afterthought or prerequisite? Am J Epidemiol 162:618–620

  • Hirji K (2006) Exact analysis of discrete data. CRC Press/Chapman and Hall, Boca Raton

  • Hosmer DW, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley, New York

  • Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S (1997) A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 16:965–980

  • Hurvich CM, Tsai CL (1990) The impact of model selection on inference in linear regression. Am Stat 44:214–217

  • Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York

  • Jewell NP (2004) Statistics for epidemiology. Chapman and Hall, New York

  • Lagakos SW (1988) Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat Med 7:257–274

  • Lash TL, Fox MP, Fink AK (2009) Applying quantitative bias analysis to epidemiologic data. Springer, New York

  • Le Cessie S, van Houwelingen HC (1992) Ridge estimators in logistic regression. Appl Stat 41:191–201

  • Leamer EE (1978) Specification searches: ad hoc inference with nonexperimental data. Wiley, New York

  • Maclure M (1993) Demonstration of deductive meta-analysis: ethanol intake and risk of myocardial infarction. Epidemiol Rev 15:328–351

  • Maclure M, Greenland S (1992) Tests for trend and dose-response: misinterpretations and alternatives. Am J Epidemiol 135:96–104

  • Maldonado G, Greenland S (1993a) Interpreting model coefficients when the true model form is unknown. Epidemiology 4:310–318

  • Maldonado G, Greenland S (1993b) Simulation study of confounder-selection strategies. Am J Epidemiol 138:923–936

  • Maldonado G, Greenland S (1994) A comparison of the performance of model-based confidence intervals when the correct model form is unknown: coverage of asymptotic means. Epidemiology 5:171–182

  • Mantel N, Haenszel WH (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719–748

  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, New York

  • Michels KB, Greenland S, Rosner BA (1998) Does body mass index adequately capture the relation of body composition and body size to health outcomes? Am J Epidemiol 147:167–172

  • Moolgavkar SH, Venzon DJ (1987) General relative risk regression models for epidemiologic studies. Am J Epidemiol 126:949–961

  • Pearl J (2009) Causality, 2nd edn. Cambridge University Press, New York

  • Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996) A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 49:1373–1379

  • Pike MC, Hill AP, Smith PG (1980) Bias and efficiency in logistic analyses of stratified case-control studies. Int J Epidemiol 9:89–95

  • Pregibon D (1981) Logistic regression diagnostics. Ann Stat 9:705–724

  • Raftery AE (1995) Bayesian model selection in social research (with discussion). Sociol Methodol 25:111–196

  • Robins JM, Greenland S (1986) The role of model selection in causal inference from nonexperimental data. Am J Epidemiol 123:392–402

  • Robins JM, Greenland S (1994) Adjusting for differential rates of prophylaxis therapy for PCP in high- versus low-dose AZT treatment arms in an AIDS randomized trial. J Am Stat Assoc 89:737–749

  • Robins JM, Blevins D, Ritter G, Wulfsohn M (1992) G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology 3:319–336. Errata: Epidemiology 1993; 4:189

  • Robins JM, Greenland S, Hu FC (1999) Estimation of the causal effect of time-varying exposure on the marginal means of a repeated binary outcome. J Am Stat Assoc 94:687–712

  • Robins JM, Hernán MA, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11:561–570

  • Rosenthal R, Rubin DB (1979) A note on percent variance explained as a measure of importance of effects. J Appl Soc Psychol 9:395–396

  • Rothman KJ, Greenland S, Lash TL (2008) Modern epidemiology, 3rd edn. Lippincott Wolters Kluwer, Philadelphia

  • Royston P, Sauerbrei W (2008) Multivariable model building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley, New York

  • Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, New York

  • Saltelli A, Chan K, Scott EM (eds) (2000) Sensitivity analysis. Wiley, New York

  • Sclove SL, Morris C, Radhakrishnan R (1972) Non-optimality of preliminary-test estimators for the mean of a multivariate normal distribution. Ann Math Stat 43:1481–1490

  • Sheehe P (1962) Dynamic risk analysis in retrospective matched-pair studies of disease. Biometrics 18:323–341

  • Shen X, Huang H, Ye J (2004) Inference after model selection. J Am Stat Assoc 99:751–762

  • Steyerberg EW (2009) Clinical prediction models. Springer, New York

  • Strömberg U (1996) Collapsing ordered outcome categories: a note of concern. Am J Epidemiol 144:421–424

  • Titterington DM (1985) Common structure of smoothing techniques in statistics. Int Stat Rev 53:141–170

  • Viallefont V, Raftery AE, Richardson S (2001) Variable selection and Bayesian model averaging in epidemiological case-control studies. Stat Med 20:3215–3230

  • Weiss RE (1995) The influence of variable selection: a Bayesian diagnostic perspective. J Am Stat Assoc 90:619–625

  • White H (1994) Estimation, inference, and specification analysis. Cambridge University Press, New York

  • Ye J (1998) On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc 93:120–131

Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Greenland, S. (2014). Regression Methods for Epidemiological Analysis. In: Ahrens, W., Pigeot, I. (eds) Handbook of Epidemiology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09834-0_17

  • DOI: https://doi.org/10.1007/978-0-387-09834-0_17

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-09833-3

  • Online ISBN: 978-0-387-09834-0

  • eBook Packages: Medicine, Reference Module Medicine
