Drug Safety

pp 1–15

An Implementation and Visualization of the Tree-Based Scan Statistic for Safety Event Monitoring in Longitudinal Electronic Health Data

  • Stephen E. Schachterle
  • Sharon Hurley
  • Qing Liu
  • Kenneth R. Petronis
  • Andrew Bate
Short Communication



Longitudinal electronic healthcare data hold great potential for drug safety surveillance. The tree-based scan statistic (TBSS), as implemented by the TreeScan® software, allows for hypothesis-free signal detection in longitudinal data by grouping safety events according to branching, hierarchical data coding systems, and then identifying signals of disproportionate recording (SDRs) among the singular events or event groups.


The objective of this analysis was to identify and visualize SDRs with the TBSS in historical data from patients using one of two antifungal drugs, itraconazole or terbinafine. By examining patients who used either itraconazole or terbinafine, we provide a conceptual replication of previous TBSS analyses by varying methodological choices and using a data source that had not previously been used with the TBSS, i.e., the Optum Clinformatics™ claims database. With this analysis, we aimed to test a parsimonious design that could be the basis of a broadly applicable method for multiple drug and safety event pairs.


The TBSS analysis was used to examine incident events and any itraconazole or terbinafine use among US-based patients from 2002 through 2007. Event frequencies before and after the first day of drug exposure were compared over 14- and 56-day periods of observation in a Bernoulli model with a self-controlled design. Safety events were classified into a hierarchical tree structure using the Clinical Classifications Software (CCS), which mapped International Classification of Diseases, 9th Revision (ICD-9) codes to 879 diagnostic groups. Using the TBSS, the log likelihood ratios of observed versus expected events in all groups along the CCS hierarchy were compared, and groups of events that occurred at disproportionately high frequencies were identified as potential SDRs; p-values for the potential SDRs were estimated with Monte Carlo permutation-based methods. Output from TreeScan® was visualized and plotted as a network that followed the CCS tree structure.
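The scan described above can be illustrated with a minimal Python sketch. It is not the TreeScan® implementation or the authors' code. It assumes the unconditional Bernoulli form of the log likelihood ratio, a null probability of p = 0.5 (pre- and post-exposure windows of equal length), and Monte Carlo simulation of post-exposure counts under the null in place of the permutation-based procedure. The node names, event counts, function names, and the choice of 999 replicates are all hypothetical.

```python
import math
import random


def bernoulli_llr(cases, total, p=0.5):
    """Log likelihood ratio for one node; 0 unless the observed proportion exceeds p."""
    if total == 0 or cases / total <= p:
        return 0.0
    controls = total - cases
    llr = cases * math.log(cases / total) - cases * math.log(p) - controls * math.log(1 - p)
    if controls > 0:  # avoid log(0) when every event falls in the post-exposure window
        llr += controls * math.log(controls / total)
    return llr


def roll_up(leaf_counts, parent):
    """Add each leaf's (post-exposure events, all events) to the leaf and to every ancestor."""
    node_counts = {}
    for leaf, (cases, total) in leaf_counts.items():
        node = leaf
        while node is not None:
            c, n = node_counts.get(node, (0, 0))
            node_counts[node] = (c + cases, n + total)
            node = parent.get(node)
    return node_counts


def max_llr(leaf_counts, parent, p=0.5):
    """Test statistic: the largest log likelihood ratio over all nodes of the tree."""
    return max(bernoulli_llr(c, n, p) for c, n in roll_up(leaf_counts, parent).values())


def monte_carlo_p_values(leaf_counts, parent, p=0.5, replicates=999, seed=0):
    """Return {node: (llr, p_value)}, ranking each observed LLR against simulated maxima."""
    rng = random.Random(seed)
    observed = {node: bernoulli_llr(c, n, p)
                for node, (c, n) in roll_up(leaf_counts, parent).items()}
    null_maxima = []
    for _ in range(replicates):
        # Under the null, each event is post-exposure with probability p.
        simulated = {}
        for leaf, (_cases, total) in leaf_counts.items():
            sim_cases = sum(1 for _i in range(total) if rng.random() < p)
            simulated[leaf] = (sim_cases, total)
        null_maxima.append(max_llr(simulated, parent, p))
    return {node: (llr, (1 + sum(m >= llr for m in null_maxima)) / (replicates + 1))
            for node, llr in observed.items()}


# Hypothetical three-level tree: child -> parent (None marks a top-level group).
parent = {
    "circulatory": None, "heart": "circulatory",
    "atherosclerosis": "heart", "dysrhythmia": "heart",
    "digestive": None, "gastritis": "digestive",
}
# Hypothetical leaf data: (events after first exposure, events in both windows).
leaf_counts = {"atherosclerosis": (30, 40), "dysrhythmia": (12, 22), "gastritis": (10, 20)}

results = monte_carlo_p_values(leaf_counts, parent)
for node, (llr, p_value) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{node:16s} LLR={llr:6.3f}  p={p_value:.3f}")
```

Comparing every node against the distribution of the maximum over the whole tree is what adjusts for the many overlapping groups being tested. With the assumed 999 replicates, the smallest attainable p-value is 1/1000 = 0.001.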


Terbinafine use (n = 223,968) was associated with SDRs for diseases of the circulatory system (14- and 56-day p = 0.001) and heart (14-day p = 0.026 and 56-day p = 0.001) as well as coronary atherosclerosis and other heart disease (14-day p = 0.003 and 56-day p = 0.004). For itraconazole use (n = 36,025), the TBSS identified SDRs for coronary atherosclerosis and other heart disease (p = 0.002) and complications of an implanted or grafted device (14-day p = 0.001 and 56-day p < 0.05). Use of both drugs was associated with SDRs for diseases of the digestive system at 14 days (p < 0.05), and this SDR had been observed among terbinafine users in a previous TBSS analysis with a different data source. The TreeScan® visualization facilitated the identification of the atherosclerosis and other heart disease SDRs and highlighted the consistency of the SDR for diseases of the digestive system across drugs and data sources.


With the TBSS, we identified potential SDRs related to the circulatory system that may reflect the cardiac risk described in the itraconazole product label. SDRs for diseases of the digestive system among terbinafine users were also reported in a previous signal detection analysis, although other SDRs from the previous publications were not replicated. The TBSS visualizations aided in the understanding and interpretation of the TBSS output, including the comparisons to the previous publications. In this conceptual replication, differences between the results of our analysis and those of the previous analyses could be attributable to variation in modeling and design choices as well as to factors that were intrinsic to the underlying data sources. The broad consistency, but far from perfect concordance, of our results with the known safety profile of these antifungals, including the risks from the itraconazole product label, supports the rationale for continued investigations of signal detection methods across differing data sources and populations.



We would like to thank Richard Gong for his support in the creation of the TreeScan® visualizations and careful inspection of the SAS® code that generated the input data for TreeScan®.

Compliance with Ethical Standards

Conflict of interest

Stephen Edward Schachterle, Qing Liu, Kenneth R. Petronis, and Andrew Bate are full-time employees of Pfizer and hold Pfizer stocks and stock options. Sharon Hurley was a full-time contract employee of Pfizer at the time of her contribution.


No sources of external funding were used to assist in the preparation of this study.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Worldwide Safety and Regulatory, Pfizer Inc., New York, USA
  2. City University of New York Graduate School of Public Health and Health Policy, New York, USA
