Modeling and Analysis of Cost Data

  • Shizhe Chen
  • XH Andrew Zhou
Reference work entry
Part of the Health Services Research book series (HEALTHSR)


Cost has become an important outcome in health services research. It serves not only as a measure of health care spending but also as one component of health care value. Given ever-increasing health care expenditure, the value of health care should reflect not only traditional measures, such as mortality and morbidity, but also the cost of care. Because resources are limited, a new treatment with slightly better efficacy but a much higher cost than an existing treatment may not be the treatment of choice for a patient. Hence, it is important to be able to analyze cost data appropriately. However, appropriate analysis of health care costs may be hindered by the special distributional features of cost data, including skewness, zero values, clustering, heteroscedasticity, and multimodality.

Over the decades, various methods have been proposed to address these features. This chapter introduces methods that provide relatively trustworthy results with acceptable efficiency, covering mean inference, regression, and prediction.
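
As a rough illustration of two of the features named above (zero values and skewness) and of what mean inference for such data can look like, the following Python sketch summarizes a cost sample and computes a simple two-part estimate of the mean. It is not taken from the chapter: the function names and the sample data are hypothetical, and the estimate assumes that positive costs are lognormally distributed, which may not hold in practice.

```python
import math
import statistics


def summarize_costs(costs):
    """Report the share of zero costs and the sample skewness."""
    n = len(costs)
    positive = [c for c in costs if c > 0]
    mean = statistics.fmean(costs)
    sd = statistics.pstdev(costs)  # population SD; zero if all costs are equal
    # Moment-based skewness: positive values indicate a long right tail.
    skewness = sum((c - mean) ** 3 for c in costs) / (n * sd ** 3)
    return {
        "zero_fraction": 1 - len(positive) / n,
        "mean": mean,
        "median": statistics.median(costs),
        "skewness": skewness,
    }


def two_part_lognormal_mean(costs):
    """Estimate E[cost] as P(cost > 0) * E[cost | cost > 0].

    Assumption (not from the source): positive costs are lognormal, so
    E[cost | cost > 0] = exp(mu + sigma^2 / 2), where mu and sigma^2 are
    the mean and variance of the log costs. Needs at least two positive costs.
    """
    positive = [c for c in costs if c > 0]
    p_positive = len(positive) / len(costs)
    logs = [math.log(c) for c in positive]
    mu = statistics.fmean(logs)
    sigma2 = statistics.variance(logs)  # sample variance of the log costs
    return p_positive * math.exp(mu + sigma2 / 2)


# Hypothetical charges: two patients with no cost and one very expensive case.
example_costs = [0, 0, 100, 200, 400, 800, 5000]
summary = summarize_costs(example_costs)
estimate = two_part_lognormal_mean(example_costs)
```

In a sample like this the mean sits well above the median, which is one reason methods designed for symmetric data can mislead when applied directly to costs.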



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biostatistics, University of Washington, Seattle, USA
  2. Beijing International Center for Mathematical Research, Peking University, Beijing, China
  3. VA Puget Sound Healthcare System, University of Washington, Seattle, USA
