Drug Safety, Volume 28, Issue 10, pp 835–842

Data Mining in Pharmacovigilance

The Need for a Balanced Perspective
  • Manfred Hauben
  • Vaishali Patadia
  • Charles Gerrits
  • Louisa Walsh
  • Lester Reich
Current Opinion


Data mining is receiving considerable attention as a tool for pharmacovigilance and is generating many perspectives on its uses. This paper presents four concepts that have appeared in various professional venues and that are potential sources of misunderstanding or warrant extended discussion: (i) data mining algorithms are unvalidated; (ii) data mining algorithms allow data miners to objectively screen spontaneous report data; (iii) mathematically more complex Bayesian algorithms are superior to frequentist algorithms; and (iv) data mining algorithms are not just for hypothesis generation. Key points for a balanced perspective are that: (i) validation exercises have been done, but they lack a gold standard for comparison and are complicated by numerous nuances and pitfalls in the deployment of data mining algorithms, whose performance is likely to be highly situation dependent; (ii) the subjective nature of data mining is often underappreciated; (iii) simpler data mining models can be supplemented with ‘clinical shrinkage’, preserving sensitivity; and (iv) applications of data mining beyond hypothesis generation are risky, given the limitations of the data. These extended applications tend to ‘creep’, not pounce, into the public domain, leading to potential overconfidence in their results. Most importantly, in the enthusiasm generated by the promise of data mining tools, users must keep in mind the limitations of the data and the importance of clinical judgment and context, regardless of statistical arithmetic. In conclusion, we agree that contemporary data mining algorithms are promising additions to the pharmacovigilance toolkit, but the level of verification required should be commensurate with the nature and extent of the claimed applications.


Keywords: Data Mining; Data Mining Algorithm; Spontaneous Reporting System; Proportional Reporting Ratio; Validation Exercise



No sources of funding were used to assist in the preparation of this review. The authors have no conflicts of interest that are directly relevant to the content of this review.



Copyright information

© Adis Data Information BV 2005

Authors and Affiliations

  • Manfred Hauben (1, 2, 3)
  • Vaishali Patadia (4)
  • Charles Gerrits (5)
  • Louisa Walsh (6)
  • Lester Reich (1)

  1. Risk Management Strategy, Pfizer Inc, New York, USA
  2. Department of Medicine, New York University School of Medicine, New York, USA
  3. Departments of Pharmacology and Community and Preventive Medicine, Valhalla, USA
  4. Global Drug Safety, Amylin Pharmaceuticals, San Diego, USA
  5. Department of Pharmacoepidemiology and Outcomes Research, Takeda Global Research and Development, Lincolnshire, USA
  6. Clinical Drug Safety, AstraZeneca LP, Wilmington, USA
