# Statistical Issues in the Estimation of the Causal Effects of Smoking Due to the Conduct of the Tobacco Industry

• Donald B. Rubin
Chapter
Part of the Statistics for Social Science and Public Policy book series (SSBS)

## Summary

A major legal issue for the past several years has been the tobacco industry’s liability for health-care expenditures incurred because of its alleged misconduct beginning in the mid-1950s. Quantifying answers to such causal questions is a statistical enterprise, which has been especially active in the last quarter century.

This chapter summarizes my formulation of a statistically valid approach for estimating the potential damages in the tobacco litigation. Six distinct statistical tasks are outlined, although no specific estimates are produced. These six tasks are: (1) formulation of a mathematical statistical framework; (2) assembly of data to estimate health-care-expenditure relative risks of smoking in the actual world; (3) design of the statistical analyses to estimate these expenditure relative risks, a problem closely related to causal inference in observational studies; (4) assembly and analysis of appropriate data to estimate the prevalence of different types of smoking behaviors and other health-expenditure-related factors in the relevant population, a problem of survey inference; (5) assembly and analysis of appropriate data to estimate the dollar pots of health-care expenditures of various types in the relevant population, another problem of survey inference; and (6) assembly and analysis of information concerning the prevalence of smoking and other health-expenditure-related factors in a counterfactual world without the alleged misconduct of the tobacco industry, a problem involving explicit assumptions justified by actual-world experimental and observational data. This sixth task is the critical step where the alleged misconduct, and thus causal inferences, enter the equation; the second through fifth tasks involve the careful assembly and analysis of actual-world data. The outputs from the last five tasks (2-6) are input into an equation, which is derived in the first task, to give an estimate of the causal effect of the alleged misconduct on health-care expenditures.

The plausibility and validity of the results depend critically on the use of detailed information on health-care expenditures, smoking behavior, and covariates (i.e., background characteristics and nonsmoking health-care-expenditure-related factors). The reason is that this detail is used to justify the key assumption in the mathematical statistical formulation in the first task. The need for this level of detailed information places extra demands on the data-based efforts in the last five tasks.

The formulation presented here distinguishes issues of fact about the actual world, involving the health-care-expenditure relative risks of smoking and the prevalence of smoking behaviors and other health-care-expenditure-related factors, from issues using actual-world facts to conjecture about the counterfactual world. The results show that, under the key assumption, counterfactual-world estimation enters the equation only through the differences between actual- and counterfactual-world prevalences of smoking and other health-expenditure-related behaviors in subpopulations defined by background characteristics.
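
To make the structure of this result concrete, here is a minimal sketch of an attributable-fraction calculation in the standard epidemiological form. It is an illustration under stated assumptions, not the equation derived in the chapter, which this summary does not reproduce. The assumptions are that, within each subpopulation, expenditure relative risks and baseline expenditures are the same in the actual and counterfactual worlds, and that prevalences sum to one in both worlds. The names `Stratum`, `attributable_fraction`, and `attributable_expenditure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Stratum:
    """One subpopulation defined by background characteristics (hypothetical structure)."""
    expenditure_pot: float                 # actual-world health-care dollars in this stratum
    relative_risk: Dict[str, float]        # expenditure relative risk per smoking category
    actual_prev: Dict[str, float]          # actual-world prevalence per category (sums to 1)
    counterfactual_prev: Dict[str, float]  # assumed counterfactual prevalence (sums to 1)


def attributable_fraction(s: Stratum) -> float:
    """Share of the stratum's expenditures attributed to the prevalence shift.

    Assumes relative risks are identical in both worlds, so the counterfactual
    world enters only through (actual - counterfactual) prevalence differences.
    """
    numerator = sum(
        (s.actual_prev[k] - s.counterfactual_prev[k]) * (s.relative_risk[k] - 1.0)
        for k in s.relative_risk
    )
    denominator = sum(s.actual_prev[k] * s.relative_risk[k] for k in s.relative_risk)
    return numerator / denominator


def attributable_expenditure(strata: Iterable[Stratum]) -> float:
    """Sum each stratum's dollar pot times its attributable fraction."""
    return sum(s.expenditure_pot * attributable_fraction(s) for s in strata)


# Made-up numbers, for illustration only.
example = Stratum(
    expenditure_pot=1000.0,
    relative_risk={"never": 1.0, "current": 2.0},
    actual_prev={"never": 0.5, "current": 0.5},
    counterfactual_prev={"never": 0.75, "current": 0.25},
)
print(attributable_expenditure([example]))
```

In this sketch, a stratum whose counterfactual prevalences equal its actual prevalences contributes nothing, which mirrors the statement that the counterfactual world matters only through prevalence differences within subpopulations.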

## Keywords

Propensity Score, Causal Effect, Smoking Behavior, Actual World, Statistical Issue

