Abstract
Data mining is receiving considerable attention as a tool for pharmacovigilance, and views on its appropriate uses vary widely. This paper examines four claims that have appeared in various professional venues and that are potential sources of misunderstanding or require extended discussion: (i) data mining algorithms are unvalidated; (ii) data mining algorithms allow data miners to screen spontaneous report data objectively; (iii) mathematically more complex Bayesian algorithms are superior to frequentist algorithms; and (iv) data mining algorithms are not just for hypothesis generation. The key points for a balanced perspective are as follows: (i) validation exercises have been done, but they lack a gold standard for comparison and are complicated by numerous nuances and pitfalls in how data mining algorithms are deployed, so performance is likely to be highly situation dependent; (ii) the subjective nature of data mining is often underappreciated; (iii) simpler data mining models can be supplemented with ‘clinical shrinkage’, preserving sensitivity; and (iv) applications of data mining beyond hypothesis generation are risky, given the limitations of the data. Such extended applications tend to ‘creep’, rather than pounce, into the public domain, which can lead to overconfidence in their results. Most importantly, amid the enthusiasm generated by the promise of data mining tools, users must keep in mind the limitations of the data and the importance of clinical judgment and context, whatever the statistical arithmetic suggests. In conclusion, we agree that contemporary data mining algorithms are promising additions to the pharmacovigilance toolkit, but the level of verification required should be commensurate with the nature and extent of the claimed applications.
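Points (ii) and (iii) above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' method. It computes the proportional reporting ratio (PRR), a frequentist disproportionality measure, for a drug-event 2x2 table of report counts. It then compares the PRR with a shrunk observed/expected ratio that adds an assumed pseudo-count k = 0.5 to both terms. That ratio is a deliberately simple stand-in for empirical Bayes methods such as MGPS and is not MGPS itself. The screening cut-offs (PRR of at least 2, chi-squared of at least 4, at least 3 cases) are illustrative defaults chosen by the analyst. The table counts, the names `Table2x2`, `shrunk_ratio` and `flagged_by_prr`, and the value of k are all hypothetical. The sketch models statistical shrinkage only; the ‘clinical shrinkage’ discussed in the paper is clinical review of algorithm output, which code does not capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Report counts for one drug-event pair (hypothetical data structure)."""
    a: int  # reports with the drug AND the event
    b: int  # reports with the drug, other events
    c: int  # reports with other drugs, the event
    d: int  # reports with other drugs, other events

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: the event's share of the drug's reports
    divided by its share of all other reports (assumes nonzero margins)."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def chi_squared(t: Table2x2) -> float:
    """Pearson chi-squared statistic for the 2x2 table, without Yates correction."""
    numerator = t.n * (t.a * t.d - t.b * t.c) ** 2
    denominator = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return numerator / denominator


def expected_count(t: Table2x2) -> float:
    """Drug-event count expected if drug and event were reported independently."""
    return (t.a + t.b) * (t.a + t.c) / t.n


def shrunk_ratio(t: Table2x2, k: float = 0.5) -> float:
    """Observed/expected ratio with an ASSUMED pseudo-count k added to both terms.
    Small counts are pulled toward 1; large counts are barely affected.
    A simple stand-in for Bayesian shrinkage, NOT the MGPS gamma-mixture model."""
    return (t.a + k) / (expected_count(t) + k)


def flagged_by_prr(t: Table2x2, min_prr: float = 2.0,
                   min_chi2: float = 4.0, min_cases: int = 3) -> bool:
    """Threshold screen; every cut-off is an analyst's (subjective) choice."""
    return t.a >= min_cases and prr(t) >= min_prr and chi_squared(t) >= min_chi2


# Hypothetical counts, invented purely for illustration (n = 100,000 reports each).
SPARSE = Table2x2(a=3, b=97, c=100, d=99_800)
COMMON = Table2x2(a=60, b=940, c=4_000, d=95_000)

if __name__ == "__main__":
    for name, table in [("sparse", SPARSE), ("common", COMMON)]:
        print(f"{name}: PRR={prr(table):.2f}, "
              f"shrunk O/E={shrunk_ratio(table):.2f}, "
              f"flagged={flagged_by_prr(table)}, "
              f"flagged with min_cases=5: {flagged_by_prr(table, min_cases=5)}")
```

With these invented counts, the sparse pair has a PRR of about 30, but its shrunk ratio is only about 5.8. For the common pair, both measures are close to 1.5. The sparse pair passes the screen with a minimum of 3 cases but fails if the analyst raises the minimum to 5. The same data can therefore yield a different signal list from one judgment call.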
Acknowledgements
No sources of funding were used to assist in the preparation of this review. The authors have no conflicts of interest that are directly relevant to the content of this review.
Cite this article
Hauben, M., Patadia, V., Gerrits, C. et al. Data Mining in Pharmacovigilance. Drug Safety 28, 835–842 (2005). https://doi.org/10.2165/00002018-200528100-00001