Drug Safety

pp 1–8

A Comparison Study of Algorithms to Detect Drug–Adverse Event Associations: Frequentist, Bayesian, and Machine-Learning Approaches

  • Minh Pham
  • Feng Cheng (corresponding author)
  • Kandethody Ramachandran (corresponding author)
Original Research Article



It is important to monitor the safety profile of drugs, and mining for strong associations between drugs and adverse events is an effective and inexpensive method of post-marketing safety surveillance.


The objective of our work was to compare the accuracy of both common and innovative methods of data mining for pharmacovigilance purposes.


We used the reference standard provided by the Observational Medical Outcomes Partnership, which contains 398 drug–adverse event pairs (165 positive controls, 233 negative controls). We applied ten methods and algorithms to the US FDA Adverse Event Reporting System (FAERS) data to investigate the 398 pairs. The ten methods comprise methods popular in the pharmacovigilance literature, pharmacovigilance methods newly developed as of 2018, and methods popular in the genome-wide association study literature. We compared their performance using the receiver operating characteristic (ROC) plot, the area under the curve (AUC), and Youden’s index.
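As an illustration of the three comparison criteria, the following minimal sketch computes ROC points, the AUC (by the trapezoidal rule), and Youden’s index (the maximum of sensitivity + specificity − 1 over all thresholds) for a single method. The scores and labels are invented toy values, not FAERS results, and the function names are hypothetical; this is not the authors’ implementation.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    labels: 1 = positive control, 0 = negative control.
    A pair is flagged when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Handle tied scores together so each threshold yields one point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def youden_index(points: List[Tuple[float, float]]) -> float:
    """Maximum of sensitivity + specificity - 1, i.e. max(TPR - FPR)."""
    return max(tpr - fpr for fpr, tpr in points)

# Toy data (assumed for illustration): six drug-event pairs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_points = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_points)
toy_youden = youden_index(toy_points)
```

In the study setting, the labels would correspond to the 165 positive and 233 negative controls, and the scores to whatever signal statistic a given method produces for each pair.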


The Bayesian confidence propagation neural network had the highest AUC overall. Monte Carlo expectation maximization, a method developed in 2018, had the second highest AUC and the highest Youden’s index, and performed very well when high specificity was required. The regression-adjusted gamma Poisson shrinkage model performed best under high-sensitivity requirements.


Our results can help in choosing a method for a desired level of specificity. Methods popular in the genome-wide association study literature did not perform well because of the sparsity of the data, and will need modification before their properties can be exploited in the drug–adverse event association problem.
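One way to read “choosing a method for a desired level of specificity” is to compare the best sensitivity each method attains while keeping specificity at or above a target. The sketch below does this for two hypothetical methods on invented toy scores; the function name and data are assumptions for illustration only, not results from the study.

```python
from typing import List

def sensitivity_at_specificity(scores: List[float], labels: List[int],
                               min_specificity: float) -> float:
    """Best sensitivity over all thresholds whose specificity meets the target.

    labels: 1 = positive control, 0 = negative control.
    A pair is flagged when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        specificity = (neg - fp) / neg
        if specificity >= min_specificity:
            best = max(best, tp / pos)
    return best

# Toy data (assumed): the same six pairs scored by two hypothetical methods.
pair_labels = [1, 1, 0, 1, 0, 0]
method_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
method_b = [0.9, 0.3, 0.2, 0.6, 0.7, 0.1]

sens_a = sensitivity_at_specificity(method_a, pair_labels, 1.0)
sens_b = sensitivity_at_specificity(method_b, pair_labels, 1.0)
```

On this toy data, the first method retains more sensitivity than the second when no false positives are tolerated, which is the kind of comparison a fixed-specificity requirement calls for.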


Compliance with Ethical Standards


Funding

This project was funded by the Florida Department of Health Ed and Ethel Moore Alzheimer’s Disease Research Program (Grant number 7AZ23), and the University of South Florida Proposal Enhancement Grant to Feng Cheng. These grants enabled all the FAERS data submissions to be combined and stored in a local database.

Conflicts of interest

Minh Pham, Feng Cheng, and Kandethody Ramachandran have no conflicts of interest that are directly relevant to the content of this study.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of South Florida, Tampa, USA
  2. Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, USA
