A Comparison Study of Algorithms to Detect Drug–Adverse Event Associations: Frequentist, Bayesian, and Machine-Learning Approaches
It is important to monitor the safety profile of drugs, and mining for strong associations between drugs and adverse events is an effective and inexpensive method of post-marketing safety surveillance.
The objective of our work was to compare the accuracy of both common and innovative methods of data mining for pharmacovigilance purposes.
We used the reference standard provided by the Observational Medical Outcomes Partnership, which contains 398 drug–adverse event pairs (165 positive controls, 233 negative controls). Ten methods and algorithms were applied to the US FDA Adverse Event Reporting System (FAERS) data to investigate the 398 pairs. The ten methods include popular methods in the pharmacovigilance literature, pharmacovigilance methods newly developed as of 2018, and popular methods in the genome-wide association study literature. We compared their performance using the receiver operating characteristic (ROC) plot, area under the curve (AUC), and Youden’s index.
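The three evaluation measures named above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy scores and labels are hypothetical, and the sketch assumes each method yields one numeric signal score per drug–adverse event pair, with higher scores indicating a stronger suspected association. Youden’s index is taken here as the maximum of sensitivity + specificity − 1 over all thresholds.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (false positive rate, true positive rate, threshold)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sweep the decision threshold from high to low and record one ROC point per distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative control")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points: List[Point] = [(0.0, 0.0, float("inf"))]  # nothing flagged yet
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pairs with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos, threshold))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def youden_index(points: List[Point]) -> Tuple[float, float]:
    """Return (J, threshold), where J = max(TPR - FPR) = max(sensitivity + specificity - 1)."""
    fpr, tpr, threshold = max(points, key=lambda p: p[1] - p[0])
    return tpr - fpr, threshold


# Hypothetical toy data: 1 = positive control, 0 = negative control.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 1]
toy_curve = roc_points(toy_scores, toy_labels)
print("AUC:", auc(toy_curve))
print("Youden's index and threshold:", youden_index(toy_curve))
```

In a comparison like the one described, the same routine would be run once per method on that method's scores for the 398 reference pairs, so that all methods are judged against identical labels.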
The Bayesian confidence propagation neural network had the highest AUC overall. Monte Carlo expectation maximization, a method developed in 2018, had the second highest AUC and the highest Youden’s index, and performed very well when high specificity was required. The regression-adjusted gamma Poisson shrinkage model performed best under high-sensitivity requirements.
Our results can help in choosing a method for a given desired level of specificity. Methods popular in the genome-wide association study literature did not perform well because of the sparsity of the data, and they will need modification before they can be applied to the drug–adverse event association problem.
Compliance with Ethical Standards
This project was funded by the Florida Department of Health Ed and Ethel Moore Alzheimer’s Disease Research Program (Grant number 7AZ23), and the University of South Florida Proposal Enhancement Grant to Feng Cheng. These grants enabled all the FAERS data submissions to be combined and stored in a local database.
Conflicts of interest
Minh Pham, Feng Cheng, and Kandethody Ramachandran have no conflicts of interest that are directly relevant to the content of this study.