Health Care Management Science, Volume 17, Issue 3, pp 284–301

A predictive modeling approach to increasing the economic effectiveness of disease management programs

  • Andreas Bayerstadler
  • Franz Benstetter
  • Christian Heumann
  • Fabian Winter


Predictive Modeling (PM) techniques are gaining importance in the health insurance business worldwide. Modern PM methods are used for customer relationship management, risk evaluation, and medical management. As an example of successful data-driven business management, this article presents a PM approach that makes it possible to fully exploit the economic potential of (cost-)effective disease management programs (DMPs) through optimized candidate selection. The approach is based on a Generalized Linear Model (GLM) that health insurance companies can apply easily. Using a small portfolio from an emerging country, we show that, despite the difficult data environment, our GLM approach is stable compared with more sophisticated regression techniques. Additionally, we demonstrate for this setting that our model can compete with the expensive solutions offered by professional PM vendors and outperforms the non-predictive standard approaches to DMP selection commonly used in the market.
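The abstract describes selecting DMP candidates with a GLM and comparing that selection against a non-predictive rule. The sketch below illustrates the general idea only, under explicit assumptions: this excerpt does not state the paper's response variable, GLM family, covariates, or the market's standard rule. We therefore assume a logistic (binomial, logit-link) GLM for a hypothetical "high-cost next year" flag, simulated claims features (age, number of chronic conditions, prior-year cost), a fixed programme capacity of k members, and a hypothetical baseline that enrols the members with the most chronic conditions. The model is fitted by iteratively reweighted least squares (IRLS) in plain Python. All function names (`fit_logistic_glm`, `simulate_members`, `select_top_k`, and so on) are illustrative and are not the authors' code.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable inverse logit link."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic_glm(X, y, iters=25):
    """Binomial GLM with logit link (an assumed family), fitted by IRLS."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = sigmoid(eta)
            w = max(mu * (1.0 - mu), 1e-10)  # IRLS weight for the logit link
            z = eta + (yi - mu) / w          # working response
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        beta = solve(xtwx, xtwz)  # weighted least-squares update
    return beta


def predict(beta, X):
    """Predicted probability of the hypothetical high-cost outcome."""
    return [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in X]


def select_top_k(scores, k):
    """Indices of the k members with the highest scores (assumed fixed DMP capacity)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def simulate_members(n, rng):
    """Synthetic claims features; the true model below is an assumption, not real data."""
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(20, 80)
        chronic = rng.randint(0, 4)         # number of chronic conditions
        prior_cost = rng.expovariate(1.0)   # prior-year cost, arbitrary units
        x = [1.0, (age - 50) / 15, float(chronic), prior_cost]
        eta = -3.0 + 0.5 * x[1] + 0.6 * x[2] + 0.8 * x[3]
        y.append(1 if rng.random() < sigmoid(eta) else 0)
        X.append(x)
    return X, y


rng = random.Random(42)
X_train, y_train = simulate_members(2000, rng)
X_test, y_test = simulate_members(1000, rng)

beta = fit_logistic_glm(X_train, y_train)
glm_pick = select_top_k(predict(beta, X_test), k=100)
# Hypothetical non-predictive rule: enrol members with the most chronic conditions.
rule_pick = select_top_k([x[2] for x in X_test], k=100)

glm_hits = sum(y_test[i] for i in glm_pick)
rule_hits = sum(y_test[i] for i in rule_pick)
print("GLM selection hits:", glm_hits, "| rule-based hits:", rule_hits)
```

Ranking members by predicted probability and enrolling the top k mirrors a programme with limited capacity. Comparing `glm_hits` with `rule_hits` on held-out members is one simple way to contrast a predictive selection with a rule-based one. In practice, an established GLM implementation such as R's `glm()` would normally replace the hand-written IRLS loop.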


Keywords: Health insurance; Selection for disease management programs; Predictive modeling; Generalized linear model; Comparison of methods



We would like to thank the health insurance company concerned for providing us with claims data and the three PM vendors for participating in the test.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Andreas Bayerstadler (1)
  • Franz Benstetter (1)
  • Christian Heumann (2)
  • Fabian Winter (1)

  1. Munich Health, Munich Re, Munich, Germany
  2. Institute of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
