Regression Methods for Epidemiological Analysis

Greenland, Sander

doi:10.1007/978-0-387-09834-0_17

Reference work entry

Abstract

Basic tabular and graphical methods are an essential component of epidemiological analysis and are often sufficient, especially when one need consider only a few variables at a time. They are, however, limited in the number of variables that they can examine simultaneously and in the detail with which they can consider continuous variables. Even sparse-strata methods (such as Mantel-Haenszel) require that some strata have two or more subjects; yet, as more and more variables or categories are added to a stratification, the number of subjects in each stratum may eventually drop to 0 or 1.
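The thinning of strata described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the entry: it assumes 500 simulated subjects and independent, evenly split binary covariates, and the function names `stratum_sizes` and `sparse_fraction` are hypothetical. It cross-classifies the subjects on a growing number of covariates and reports how many strata end up with fewer than two subjects.

```python
import random
from collections import Counter


def stratum_sizes(n_subjects, n_binary_covariates, seed=0):
    """Cross-classify simulated subjects on independent binary covariates
    (an assumption made for illustration) and return the size of every
    possible stratum, including empty ones."""
    rng = random.Random(seed)
    counts = Counter(
        tuple(rng.randint(0, 1) for _ in range(n_binary_covariates))
        for _ in range(n_subjects)
    )
    n_strata = 2 ** n_binary_covariates
    # Strata that never occur are absent from the Counter; add them as zeros.
    return list(counts.values()) + [0] * (n_strata - len(counts))


def sparse_fraction(sizes):
    """Fraction of strata that hold fewer than two subjects."""
    return sum(1 for s in sizes if s < 2) / len(sizes)


if __name__ == "__main__":
    n = 500
    for k in (2, 4, 6, 8, 10):
        sizes = stratum_sizes(n, k)
        print(
            f"{k:2d} covariates: {len(sizes):5d} strata, "
            f"mean size {n / len(sizes):7.2f}, "
            f"fraction with 0 or 1 subjects {sparse_fraction(sizes):.2f}"
        )
```

With ten binary covariates there are 1,024 possible strata for 500 subjects, so more than half of the strata must be empty regardless of how the subjects fall. Real covariates are usually correlated and unevenly distributed, which tends to make the sparsity worse than this evenly split sketch suggests.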

References

Agresti A (2002) Categorical data analysis. Wiley, New York
Ananth CV, Kleinbaum DG (1997) Regression models for ordinal responses: a review of methods and applications. Int J Epidemiol 26:1323–1333
Bancroft TA, Han C-P (1977) Inference based on conditional specification: a note and a bibliography. Int Stat Rev 45:117–127
Berk R (2004) Regression analysis: a constructive critique. Sage publications, Thousand Oaks
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT Press, Cambridge
Breiman L (2001) Statistical modeling: the two cultures (with discussion). Stat Sci 16:199–231
Breslow NE, Day NE (1980) Statistical methods in cancer research. Vol I: the analysis of case-control data. IARC, Lyon
Breslow NE, Day NE (1987) Statistical methods in cancer research. Vol II: the design and analysis of cohort studies. IARC, Lyon
Brown PJ, Vannucci M, Fearn T (2002) Bayes model averaging with selection of regressors. J R Stat Soc Ser B 64:519–536
Carlin B, Louis TA (2000) Bayes and empirical-Bayes methods of data analysis, 2nd edn. Chapman and Hall, New York
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C (2006) Measurement error in nonlinear models, 2nd edn. Chapman and Hall, New York
Cole SR, Ananth CV (2001) Regression models for unconstrained, partially or fully constrained continuation odds ratios. Int J Epidemiol 30:1379–1382
Copas JB (1983) Regression, prediction, and shrinkage (with discussion). J R Stat Soc B 45:311–354
Cox DR (1972) Regression models and life tables (with discussion). J R Stat Soc B 34:187–220
Cox DR, Oakes D (1984) Analysis of survival data. Chapman and Hall, New York
Cox DR, Wermuth N (1992) A comment on the coefficient of determination for binary responses. Am Stat 46:1–4
Diggle PJ, Heagerty P, Liang KY, Zeger SL (2002) The analysis of longitudinal data, 2nd edn. Oxford University Press, New York
Draper D (1995) Assessment and propagation of model uncertainty. J R Stat Soc Ser B 57:45–97
Draper NR, Guttman I, Lapczak L (1979) Actual rejection levels in a certain stepwise test. Commun Stat A 8:99–105
Easton DF, Peto J, Babiker AG (1991) Floating absolute risk: an alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Stat Med 10:1025–1035
Efron B (2004) The estimation of prediction error: covariance penalties and cross-validation. J Am Stat Assoc 99:619–642
Efron B, Morris CN (1975) Data analysis using Stein’s estimator and its generalizations. J Am Stat Assoc 70:311–319
Faraway JJ (1992) On the cost of data analysis. J Comput Graph Stat 1:213–219
Flack VF, Chang PC (1987) Frequency of selecting noise variables in subset regression analysis: a simulation study. Am Stat 41:84–86
Freedman DA (1983) A note on screening regression equations. Am Stat 37:152–155
Freedman DA, Navidi W, Peters SC (1988) On the impact of variable selection in fitting regression equations. In: Dijkstra TK (ed) On model uncertainty and its statistical implications. Springer, Berlin, pp 1–16
Greenland S (1993) Methods for epidemiologic analyses of multiple exposures: a review and comparative study of maximum-likelihood, preliminary testing, and empirical-Bayes regression. Stat Med 12:717–736
Greenland S (1994) Alternative models for ordinal logistic regression. Stat Med 13:1665–1677
Greenland S (1995a) Dose-response and trend analysis: alternatives to categorical analysis. Epidemiology 6:356–365
Greenland S (1995b) Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology 6:450–454
Greenland S (1995c) Problems in the average-risk interpretation of categorical dose-response analyses. Epidemiology 6:563–565
Greenland S (1996) A lower bound for the correlation of exponentiated bivariate normal pairs. Am Stat 50:163–164
Greenland S (1999) Multilevel modeling and model averaging. Scand J Work Environ Health 25(suppl 4):43–48
Greenland S (2000a) Principles of multilevel modeling. Int J Epidemiol 29:158–167
Greenland S (2000b) When should epidemiologic regressions use random coefficients? Biometrics 56:915–921
Greenland S (2001) Putting background information about relative risks into conjugate priors. Biometrics 57:663–670
Greenland S (2003) The impact of prior distributions for uncontrolled confounding and response bias: a case study of the relation of wire codes and magnetic fields to childhood leukemia. J Am Stat Assoc 98:47–54
Greenland S (2004) Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. Am J Epidemiol 160:301–305
Greenland S (2005a) Epidemiologic measures and policy formulation: lessons from potential outcomes (with discussion). Emerg Themes Epidemiol 2:1–4
Greenland S (2005b) Multiple-bias modeling for observational studies. J R Stat Soc Ser A 168:267–308
Greenland S (2006) Bayesian perspectives for epidemiologic research. I. Foundations and basic methods (with comment and reply). Int J Epidemiol 35:765–778
Greenland S (2007) Bayesian perspectives for epidemiologic research. II. Regression analysis. Int J Epidemiol 36:195–202
Greenland S (2008a) Introduction to regression modeling. Chap. 21. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia
Greenland S (2008b) Variable selection and shrinkage in the control of multiple confounders. Am J Epidemiol 167:523–529, Erratum: p 1142
Greenland S (2009a) Bayesian perspectives for epidemiologic research. III. Bias analysis via missing-data methods. Int J Epidemiol 38:1662–1673
Greenland S (2009b) Relaxation penalties and priors for plausible modeling of nonidentified bias sources. Stat Sci 24:195–210
Greenland S, Lash TL (2008) Bias analysis. Chap. 19. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia
Greenland S, Maldonado G (1994) The interpretation of multiplicative model parameters as standardized parameters. Stat Med 13:989–999
Greenland S, Poole C (1995) Interpretation and analysis of differential exposure variability and zero-dose categories for continuous exposures. Epidemiology 6:326–328
Greenland S, Rothman KJ (2008) Fundamentals of epidemiologic data analysis. Chap. 13. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia
Greenland S, Schlesselman JJ, Criqui MH (1986) The fallacy of employing standardized regression coefficients and correlations as measures of effect. Am J Epidemiol 123:203–208
Greenland S, Maclure M, Schlesselman JJ, Poole C, Morgenstern H (1991) Standardized regression coefficients: a further critique and review of some alternatives. Epidemiology 2:387–392
Greenland S, Michels KB, Robins JM, Poole C, Willett WC (1999) Presenting statistical uncertainty in trends and dose-response relations. Am J Epidemiol 149:1077–1086
Greenland S, Schwartzbaum JA, Finkle WD (2000) Problems from small samples and sparse data in conditional logistic regression. Am J Epidemiol 151:531–539
Greenland S, Rothman KJ, Lash TL (2008) Concepts of interaction. Chap. 5. In: Rothman KJ, Greenland S, Lash TL (eds) Modern epidemiology, 3rd edn. Lippincott Williams & Wilkins, Philadelphia
Gustafson P (2003) Measurement error and misclassification in statistics and epidemiology. Chapman and Hall, Boca Raton
Gustafson P (2005) On model expansion, model contraction, identifiability, and prior information (with discussion). Stat Sci 20:111–140
Harrell F (2001) Regression modeling strategies. Springer, New York
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Hernán MA (2005) Hypothetical interventions to define causal effects—afterthought or prerequisite? Am J Epidemiol 162:618–620
Hirji K (2006) Exact analysis of discrete data. CRC Press/Chapman and Hall, Boca Raton
Hosmer DW, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley, New York
Hosmer DW, Hosmer T, LeCessie S, Lemeshow S (1997) A comparison of goodness-of-fit tests for the logistic regression model. Stat Med 16:965–980
Hurvich CM, Tsai CL (1990) The impact of model selection on inference in linear regression. Am Stat 44:214–217
Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York
Jewell NP (2004) Statistics for epidemiology. Chapman and Hall, New York
Lagakos SW (1988) Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat Med 7:257–274
Lash TL, Fox MP, Fink AK (2009) Applying quantitative bias analysis to epidemiologic data. Springer, New York
Le Cessie S, van Houwelingen HC (1992) Ridge estimators in logistic regression. Appl Stat 41:191–201
Leamer EE (1978) Specification searches: ad hoc inference with nonexperimental data. Wiley, New York
Maclure M (1993) Demonstration of deductive meta-analysis: ethanol intake and risk of myocardial infarction. Epidemiol Rev 15:328–351
Maclure M, Greenland S (1992) Tests for trend and dose-response: misinterpretations and alternatives. Am J Epidemiol 135:96–104
Maldonado G, Greenland S (1993a) Interpreting model coefficients when the true model form is unknown. Epidemiology 4:310–318
Maldonado G, Greenland S (1993b) Simulation study of confounder-selection strategies. Am J Epidemiol 138:923–936
Maldonado G, Greenland S (1994) A comparison of the performance of model-based confidence intervals when the correct model form is unknown: coverage of asymptotic means. Epidemiology 5:171–182
Mantel N, Haenszel WH (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719–748
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, New York
Michels KB, Greenland S, Rosner BA (1998) Does body mass index adequately capture the relation of body composition and body size to health outcomes? Am J Epidemiol 147:167–172
Moolgavkar SH, Venzon DJ (1987) General relative risk regression models for epidemiologic studies. Am J Epidemiol 126:949–961
Pearl J (2009) Causality, 2nd edn. Cambridge, New York
Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996) A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 49:1373–1379
Pike MC, Hill AP, Smith PG (1980) Bias and efficiency in logistic analyses of stratified case-control studies. Int J Epidemiol 9:89–95
Pregibon D (1981) Logistic regression diagnostics. Ann Stat 9:705–724
Raftery AE (1995) Bayesian model selection in social research (with discussion). Sociol Methodol 25:111–196
Robins JM, Greenland S (1986) The role of model selection in causal inference from nonexperimental data. Am J Epidemiol 123:392–402
Robins JM, Greenland S (1994) Adjusting for differential rates of prophylaxis therapy for PCP in high- versus low-dose AZT treatment arms in an AIDS randomized trial. J Am Stat Assoc 89:737–749
Robins JM, Blevins D, Ritter G, Wulfsohn M (1992) G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology 3:319–336. Errata: Epidemiology 1993; 4:189
Robins JM, Greenland S, Hu FC (1999) Estimation of the causal effect of time-varying exposure on the marginal means of a repeated binary outcome. J Am Stat Assoc 94:687–712
Robins JM, Hernán MA, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11:561–570
Rosenthal R, Rubin DB (1979) A note on percent variance explained as a measure of importance of effects. J Appl Psychol 9:395–396
Rothman KJ, Greenland S, Lash TL (2008) Modern epidemiology, 3rd edn. Lippincott Wolters Kluwer, Philadelphia
Royston P, Sauerbrei W (2008) Multivariable model building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley, New York
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge, New York
Saltelli A, Chan K, Scott EM (eds) (2000) Sensitivity analysis. Wiley, New York
Sclove SL, Morris C, Radhakrishna R (1972) Non-optimality of preliminary-test estimators for the mean of a multivariate normal distribution. Ann Math Stat 43:1481–1490
Sheehe P (1962) Dynamic risk analysis in retrospective matched-pair studies of disease. Biometrics 18:323–341
Shen X, Huang H, Ye J (2004) Inference after model selection. J Am Stat Assoc 99:751–762
Steyerberg EW (2009) Clinical prediction models. Springer, New York
Strömberg U (1996) Collapsing ordered outcome categories: a note of concern. Am J Epidemiol 144:421–424
Titterington DM (1985) Common structure of smoothing techniques in statistics. Int Stat Rev 53:141–170
Viallefont V, Raftery AE, Richardson S (2001) Variable selection and Bayesian model averaging in epidemiological case-control studies. Stat Med 20:3215–3230
Weiss RE (1995) The influence of variable selection: a Bayesian diagnostic perspective. J Am Stat Assoc 90:619–625
White H (1994) Estimation, inference, and specification analysis. Cambridge University Press, New York
Ye J (1998) On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc 93:120–131

Author information

Authors and Affiliations

Department of Epidemiology, School of Public Health, University of California, 90095-1772, Los Angeles, CA, USA
Sander Greenland

Editor information

Editors and Affiliations

Department of Epidemiological Methods and Etiologic Research, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
Wolfgang Ahrens
Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
Iris Pigeot

About this entry

Cite this entry

Greenland, S. (2014). Regression Methods for Epidemiological Analysis. In: Ahrens, W., Pigeot, I. (eds) Handbook of Epidemiology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09834-0_17

DOI: https://doi.org/10.1007/978-0-387-09834-0_17
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-09833-3
Online ISBN: 978-0-387-09834-0
eBook Packages: Medicine, Reference Module Medicine
