Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data

Al-Azzawi, Fahed; Mahmoud, Israa; Haguinet, François; Bate, Andrew; Sessa, Maurizio

doi:10.1007/s40264-023-01317-0

Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data

Original Research Article
Open access
Published: 10 June 2023

Volume 46, pages 743–751, (2023)
Cite this article

Download PDF

You have full access to this open access article

Drug Safety Aims and scope Submit manuscript

Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data

Download PDF

Fahed Al-Azzawi¹,
Israa Mahmoud¹,
François Haguinet²,
Andrew Bate^3,4,5 &
…
Maurizio Sessa¹

2420 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

Introduction

Time- and resource-demanding activities related to processing individual case safety reports (ICSRs) include manual procedures to evaluate individual causality with the final goal of dismissing false-positive safety signals. Eminent experts and a representative from pharmaceutical industries and regulatory agencies have highlighted the need to automatize time- and resource-demanding procedures in signal detection and validation. However, to date there is a sparse availability of automatized tools for such purposes.

Objectives

ICSRs recorded in spontaneous reporting databases have been and continue to be the cornerstone and the most important data source in signal detection. Despite the richness of this data source, the incessantly increased amount of ICSRs recorded in spontaneous reporting databases has generated problems in signal detection and validation due to the increase in resources and time needed to process cases. This study aimed to develop a new artificial intelligence (AI)-based framework to automate resource- and time-consuming steps of signal detection and signal validation, such as (1) the selection of control groups in disproportionality analyses and (2) the identification of co-reported drugs serving as alternative causes, to look to dismiss false-positive disproportionality signals and therefore reduce the burden of case-by-case validation.

Methods

The Summary of Product Characteristics (SmPC) and the Anatomical Therapeutic Chemical (ATC) classification system were used to automatically identify control groups within and outside the chemical subgroup of the proof-of-concept drug under investigation, galcanezumab. Machine learning, specifically conditional inference trees, has been used to identify alternative causes in disproportionality signals.

Results

By using conditional inference trees, the framework was able to dismiss 20.00% of erenumab, 14.29% of topiramate, and 13.33% of amitriptyline disproportionality signals on the basis of purely alternative causes identified in cases. Furthermore, of the disproportionality signals that could not be dismissed purely on the basis of the alternative causes identified, we estimated a 15.32%, 25.39%, and 26.41% reduction in the number of galcanezumab cases to undergo manual validation in comparison with erenumab, topiramate, and amitriptyline, respectively.

Conclusion

AI could significantly ease some of the most time-consuming and labor-intensive steps of signal detection and validation. The AI-based approach showed promising results, however, future work is needed to validate the framework.

Evaluating performance of electronic healthcare records and spontaneous reporting data in drug safety signal detection

Article 09 December 2014

Towards Automating Adverse Event Review: A Prediction Model for Case Report Utility

Article 07 January 2020

A Method to Combine Signals from Spontaneous Reporting Systems and Observational Healthcare Data to Detect Adverse Drug Reactions

Article Open access 08 July 2015

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

FormalPara Key Points

We developed a new framework on the basis of both AI and non-AI approaches to ease some of the most time-consuming and labor-intensive steps of signal detection and validation, such as the selection of controls and the identification of co-reported drugs that may serve as alternative causes for the reported adverse events in signals of disproportional reporting.
Future work is needed to further assess the extent of impact of our framework for other drugs and spontaneous reporting databases when performing routine signal detection and validation activities.

1 Introduction

Individual case safety reports (ICSRs) recorded in spontaneous reporting databases have been and continue to be the cornerstone and the most important data source in signal detection. In recent years, regulatory agencies could identify more than 80% of potential safety signals leading to regulatory action using ICSRs [1]. Despite the richness of this data source, the incessantly increased amount of ICSRs recorded in spontaneous reporting databases has generated problems in signal detection and validation due to the increase in resources and time needed to process cases. The number of ICSRs recorded in the US Food and Drug Administration Adverse Event Reporting System (FAERS) in the last 10 years (2011–2021) has increased threefold, and a similar trend is seen in Europe (i.e., EudraVigilance) [2].

Time- and resource-demanding activities related to processing ICSRs include manual procedures to evaluate individual causality with the final goal of dismissing false-positive safety signals. One of the most time- and resource-demanding procedures for dismissing safety signals is the identification of alternative causes for the reported adverse events (AEs) in ICSRs after disproportionality signals have been identified. This case assessment/signal validation procedure is currently normally performed manually by evaluating variables such as the co-reported drugs in ICSRs [3]. Co-reported drugs are screened to identify identifying alternative potential causes for the newly identified drug–event pair. Additionally, co-reported drugs are screened to evaluate whether the disease for which these drugs are used (i.e., the therapeutic indication) may itself represent an alternative cause for the AEs. Naturally, care must be taken, as listed drugs that were used to treat the AE may have been listed by the reporter as well. For the latter, recent research has shown that the appropriate selection of controls in disproportionality analysis can mitigate the need for such activity, especially when drugs used for the same therapeutic indication are chosen as controls [4, 5]. Activities to identify alternative causes or appropriate controls in disproportionality analysis are time- and resource-demanding [3] when performed manually. However, these activities are crucial to dismiss false-positive signals, as shown in the 2020 annual report on EudraVigilance from the European Parliament, in which 80% of all potential safety signals in 2018–2020 ended up being not validated, or rather being false positives [2].

Eminent experts and a representative from pharmaceutical industries and regulatory agencies have highlighted the need to automatize time- and resource-demanding procedures in signal detection and validation [6]. However, to date there is a sparse availability of automatized tools for such purposes.

Therefore, this study aimed at developing automatized procedures for the identification of alternative causes for reported AEs in disproportionality signals originating from spontaneous reporting databases, with the final goal of dismissing false-positive disproportionality signals. For this purpose, FAERS was used as a data source and galcanezumab as a proof-of-concept drug, as this is a drug recently approved for which no studies have been conducted using spontaneous reporting databases.

2 Methods

2.1 Automatic Selection of Controls

In this article, we present a dual approach combining the Anatomical Therapeutic Chemical (ATC) classification system code and the approved therapeutic indication in the US Summary of Product Characteristics (SmPC) of galcanezumab for automatizing the selection of controls for disproportionality analysis when using FAERs. The ATC code is a unique code assigned to all active ingredients and consists of five levels: (1) the anatomical main group, (2) the therapeutic subgroup, (3) the pharmacological subgroup, (4) the chemical subgroup, and (5) the chemical substance [7]. By using the 4th level of the ATC code, or rather the chemical subgroup, we identified all active ingredients with the same therapeutic target (i.e., CGRP antagonists) as galcanezumab. Additional controls were identified by selecting active ingredients outside the chemical subgroup of galcanezumab but with the same approved therapeutic indication to avoid masking due to drug class effect [8] and confounding by indication. Drug class effect masking is a phenomenon that can occur when only drugs within the same chemical subgroup are used as controls in disproportionality analysis [8]. Confounding by indication is a type of bias that can occur when drugs used for different therapeutic indications are compared in disproportionality analysis [9].

To identify controls with the same approved therapeutic indication but outside the chemical subgroup of galcanezumab, DrugBank was used. DrugBank is a comprehensive online database containing information on active pharmaceutical ingredients and their respective pharmacological targets. In DrugBank it is possible to obtain a list of all active ingredients approved for a certain disease by searching a structured field known as “associated condition.” An “associated condition” is a term found in the drug indication section of DrugBank and refers to any specific state or medical condition for which the drug is indicated. By searching this field in DrugBank it is possible to obtain a list of drugs with an approved therapeutic indication for the medical condition under investigation which, in the case of galcanezumab, is “migraine prophylaxis.” To test the reliability of our approach for selecting controls in disproportionality analysis, we computed the proportion of drug classes correctly retrieved among drug classes mentioned in two recently published articles [10, 11]: (number of drug classes retrieved with our approach / number of drug classes mentioned in the two recently published reviews) × 100%. For each active ingredient, drug classes were obtained from the 4th level of the ATC code.

2.2 Data Sources and Management

ICSRs in which galcanezumab and the selected control groups (see Sect. 2.1) were reported as the suspected drugs were extracted from FAERS. FAERS is the spontaneous reporting database maintained by the FDA. The goal of FAERS is to support the FDA’s post-marketing safety surveillance. The database consists of spontaneous reports of AEs and product quality complaints, and medication errors reported by health care professionals and consumers are collected. ICSRs recorded in FAERS use the structure provided by the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use (ICH) E2B guideline to collect ICSRs. This structure is used in all spontaneous reporting databases from the countries participating in the World Health Organization’s Programme for International Drug Monitoring, which accounts for 99% of countries worldwide [12]. All available zipped ASCII quarterly data extract files from the fourth quartile of 2012 to the third quartile of 2021 from FAERS were downloaded from the OpenFDA website. A local database was set up in R (version 4.1.2, R Development Core Team) by using the primary identifier of each ICSR as the key to linkage based on the relational structure described by Kass-Hout and colleagues [13].

2.3 Statistical Analysis

2.3.1 Comparability of Controls

For descriptive purposes, the comparability of galcanezumab cases and the selected controls were examined by comparing their distribution of sex, age, weight (kg), reporter’s qualification, number of co-reported drugs, and the year of reporting, as these variables have been previously identified as potential sources of confounding in disproportionality analysis [14]. Additionally, as a measure of comparability, the density functions of the propensity score of being in a report having galcanezumab as the suspected drug of these variables was computed for assessing the overlap in density functions among galcanezumab cases and selected controls [15, 16]. The propensity score was calculated using a logistic regression model with the following covariates: sex, age, weight (kg), reporter’s qualification, number of co-reported drugs, and the year of reporting. For the calculation of the propensity score, we replaced the missing values with an index value (e.g., for age we assigned the value 9999) to perform the computation for all cases. The default imputed value for age is convenient for categorization but may be suboptimal when included as a continuous explanatory variable in a model. Therefore, a subsequent sensitivity analysis was performed for the calculation of the propensity score where instead of using 9999, we imputed with the mode age and added another variable for whether age is missing or not.

2.3.2 Dismissal of False-Positive Disproportionality Signals

In the pool of the potential controls (see Sect. 2.1), two drugs within the chemical subgroup of galcanezumab and two drugs outside the chemical subgroup with the same therapeutic indication of galcanezumab were randomly chosen for disproportionality analyses. The reporting odds ratio (ROR) was used as a disproportionality measure and the 95% confidence interval (CI) of all AEs recorded for galcanezumab and the selected controls in FAERS with a frequency of reporting greater or equal to 3 was calculated according to the recommendations described by the European Medicine Agency (EMA) [17]. Of note, in the context of this framework, the ROR is defined as the ratio of spontaneous reports for a given drug that report a specific AE divided by the corresponding ratio of the drug used as a control in disproportionality analysis. Heatmaps of RORs for disproportionality analysis were plotted. Disproportionality signals for which the lower limit of the 95% CI was greater than 1 were further analyzed by using conditional inference trees (R Package: partykit; Version: 1.2–16; function: ctree) [18] to identify alternative cause co-reported drugs.

The use of a conditional inference tree required data management, as information stored in FAERS was not immediately usable for such a purpose. In particular, all the information on co-reported drugs in the ICSRs were stored as binary variables. Before using the conditional inference trees, binary variables were filtered on the basis of variance, and variables having ≥ 95% identical values across cases and controls were removed. The idea of this filter was to get rid of variables that only consist of noise and therefore have very little variation [19]. Pre-filtering has been performed to reduce the size of the dataset and increase speed in computational power while reducing RAM resources needed to store in R the highly dimensional dataset. The pre-processing is automatized and there is no human time involved in this procedure. For each disproportionality signal, conditional inference trees were used to identify disproportionally co-reported drugs among those variables kept after filtering based on variance. The steps for fitting the conditional inference trees were the following:

1.
We first filtered all cases with the adverse event of interest;
2.
We generated a highly dimensional set of co-reported drugs for cases identified in step 1);
3.
We fitted the conditional inference tree using as the class a binary variable named outcome, which included information on the drugs for which we performed the head-to-head comparison and co-reported drugs as predictors. For example, if we compared galcanezumab versus erenumab, the outcome variable will have two levels with labels “galcanezumab” and “erenumab.” The main reason for doing so is that we wanted disproportionally co-reported drugs only for the adverse event under investigation;
4.
All the co-reported drugs with a p-value < 0.05 were considered disproportionally co-reported and underwent validation.

The SmPC of disproportionally co-reported drugs was screened to identify those drugs that listed in the SmPC the AE in disproportionality signal mimicking procedures performed during signal validation [3, 20,21,22,23,24,25]. The disproportionality analysis was conducted again by removing cases with co-reported drugs for which the AE under investigation was listed in the SmPC as these cases had alternative causes for the AE. Finally, we calculated the fraction of disproportionality signals dismissed by using this procedure. A disproportionality signal, that is, a statistical alert, was considered dismissed if there was no statistical significance, according to the threshold set by EMA, after dismissing cases with co-reported drugs for which the AE under investigation was listed in the SmPC of the co-reported drugs [17].

A flowchart showing the implementation of automatic selection of controls and dismissal of false positive signals using conditional inference tree is shown in Fig. 1.

2.4 The Rationale for Using Conditional Inference Trees

A conditional inference tree is a method to build up a prediction model via classifying characteristics of cases and controls (i.e., predictors) to the outcome (i.e., the AE). In a decision tree, the characteristics of the cases/controls and outcomes under investigation can be binary: true or false, yes or no.

A conditional inference tree consists of a root, branches, and leaves. The predictor that has a higher degree of association with the outcome represents the root node of the tree. To determine the strength of association of individual predictors and therefore determine the hierarchy of the conditional inference trees, impurity measures are used (e.g., Gini Impurity Index) and statistical tests are used to determine disproportionality (i.e., chi-squared test). Impurity measures assess how good a predictor is in classifying individuals as having/not having the outcome. Each time before adding a new set of predictors for the outcome, the impurity of the whole tree is computed. The smaller impurity the better the classification is. Where to draw the line to divide the numerical value depends on the impurity as well.

In this study, conditional inference trees were used to identify co-reported drugs that were disproportionally reported between cases and controls reporting the AE under investigation. The advantage of using decision trees in this setting relies on the fact that this technique, if compared with the current approach used in disproportionality analysis (i.e., stratification), can investigate preemptive non-linear relationships between predictors and the outcome. The rationale for using conditional inference trees is also epidemiological. If cases and controls have statistically significant differences in the proportion of a co-reported drug, the latter may represent a potential confounder for the investigated signals if we subset cases and controls with the AE under investigation.

2.5 Framework Validation Using Simulated Data

The objective of the simulation was to show some generalizability of the conditional inference trees for the identification of confounders among co-reported drugs in disproportionality signals. The method was able, in three different scenarios, to correctly identify the disproportionally co-reported drugs serving as confounders for the disproportionality signals.

Scenario 1: We have simulated 1000 cases of a hypothetical AE for galcanezumab (500 cases) and a control (500 cases) from the multivariate normal distribution of three Gaussians [26] as follows: age~N(40,1), weight~N(75,1), and number of co-reported drugs~N(2,1). The correlation between age and weight was −0.8, the correlation between age and the number of co-reported drugs was 0.7, and the correlation between weight and the number of co-reported drugs was −0.9. We have then generated three binary variables (i.e., co-reported drug 1, co-reported drug 2, and co-reported drug 3) with marginal probabilities of 0.25–0.75, 0.75–0.25, and 0.5–0.5 with a binary variable named outcome. The variables co-reported drug 1, co-reported drug 2, and co-reported drug 3 were generated by thresholding a normal distribution. The threshold was the median of the numerical variable generated with a normal distribution as shown in Electronic Supplementary Material (ESM) 1. The correlations of the components were specified as a correlation matrix of the binary distribution [27]. Co-reported drug 1 was correlated with co-reported drug 2 (i.e., 0.514) while co-reported drug 3 was not correlated with co-reported drug 1 and co-reported drug 2 (i.e., −0.020 and 0.048). The variable outcome was correlated with co-reported drug 1 (i.e., −0.628) and co-reported drug 2 (−0.370) but not with co-reported drug 3 (i.e., −0.026).

This setup accounts for a scenario where co-reported drug 1 and co-reported drug 2 are correlated with the outcome, as they are disproportionally co-reported between galcanezumab and the control cases having a hypothetical AE. Additionally, in this scenario, co-reported drugs 1 and 2 are correlated variables with different strengths of correlation with the outcome. It is expected that the framework identified in the decision tree co-reported drug 1, as it is the co-reported drug with the strongest correlation with the outcome.

Scenario 2: We used the same setting as in scenario 1 but we reduced the sample size to 50 cases. This setup accounts for a scenario in which there is a limited amount of cases to perform disproportionality analysis. It is expected that the framework identified in the decision tree co-reported drug 1, as it is the co-reported medication with the strongest correlation with the outcome.

Scenario 3: We used the same setting as in scenario 1 but we dropped co-reported drug 1 from the dataset and changed the marginal probabilities of co-reported drug 2 to obtain a 20% disproportionality between galcanezumab and the control’s cases. This setup accounts for a scenario in which there is a relatively small disproportionality between cases and controls for co-reported drug 2. It is expected that the framework identified in the decision tree co-reported drug 2, as it is the co-reported medication with the strongest correlation with the outcome.

In all scenarios, we performed a descriptive analysis by presenting frequencies and percentages of binary variables and plotting the kernel density distribution of age, weight, and number of co-reported drugs and correlation matrix of variables in the analytical datasets. Additionally, we modeled a conditional inference tree to identify co-reported drugs that were disproportionally reported between galcanezumab and the control reporting the AE under investigation.

3 Results

3.1 Automatic Selection of Controls and Comparability

Table 1 provides an overview of drugs that were identified as potential control groups for galcanezumab. Erenumab, fremanezumab, amitriptyline, and topiramate were finally used as control groups for disproportionality analysis. In total, 86% of drug classes among those eligible for disproportionality analysis were identified. Supplementary Figs. 1, 2, 3, and 4 and Supplementary Table 1 presents the comparability of erenumab, fremanezumab, topiramate, and amitriptyline cases, respectively, in comparison with galcanezumab cases for density distributions of age, sex, weight (kg), reporter’s qualification, number of co-reported drugs, the year of reporting, and propensity score. The results of the sensitivity analysis for the computation of propensity score did not show major differences from the main analysis (ESM 2).

Table 1. Potential control groups of galcanezumab identified within and outside the chemical subgroup

Full size table

3.2 Dismissal of False-Positive Disproportionality Signals

In total, 399 signals of disproportionate reporting were identified for galcanezumab when compared with fremanezumab (n = 22), erenumab (n = 111), topiramate (n = 130), and amitriptyline (n = 136). Supplementary Figs. 5, 6, 7, and 8, show heatmaps of RORs of all identified disproportionality signals when fremanezumab, erenumab, topiramate, and amitriptyline were used as controls, respectively.

On the basis of purely co-reporting, it was possible to dismiss 22 (20.00%), 19 (14.29%), and 18 (13.33%) of the disproportionality signals identified for galcanezumab in comparison with erenumab, topiramate, and amitriptyline, respectively. In all, 340 disproportionality signals were not dismissed. Among disproportionality signals that cannot be dismissed on the basis of purely co-reported drugs, we estimated a further 0%, 15.32%, 20.88%, and 23.35% reduction in the number of galcanezumab cases to undergo manual validation in comparison with fremanezumab, erenumab, topiramate, and amitriptyline, respectively, as these proportions of cases have co-reported drugs as alternative causes.

3.3 Framework Validation

Descriptive analysis of cases simulated in scenarios 1, 2, and 3 are presented in Supplementary Tables 2–4 and Supplementary Figs. 10–12. The results of the conditional inference trees are plotted in Supplementary Figs. 13–15. In all scenarios, the framework was able to correctly identify the co-reported drug simulated as an alternative cause for the disproportionality signal.

4 Discussion

4.1 Automatic Selection of Control Groups and Comparability

At the moment, neither the EMA nor the FDA have described any guidelines concerning the selection of controls in disproportionality analysis, however, it is well known and acknowledged that the control group influences the results of a disproportionality analysis [28, 29]. This is in part as disproportionality analyses are frequently conducted against a background of the rest of the database [30]. It has to also be admitted, however, that choice of an appropriate control can sometimes be nearly impossible in a given dataset and the subjectivity associated with selection of controls. Nevertheless, this approach seems promising in some circumstances, in particular when the automatic process proposed in this study for the selection of controls within and outside the chemical subgroup of galcanezumab showed an 86% success rate, as 6 of 7 chemical subgroups indicated for acute and/or preventative migraine were identified. The selection of controls was based on pharmacoepidemiological principles of trying to increase the similarity in case characteristics among compared drugs in disproportionality analysis by including controls within the same chemical subgroup and controls outside the chemical subgroup with the same therapeutic indication. To assess similarity among cases and controls, we compared the propensity score of being in a report having galcanezumab as the suspected drug built using covariates that were known to introduce confounding in disproportionality analysis [31]. As expected, the drugs within the same chemical subgroup (i.e., fremanezumab and erenumab) were the most comparable to the galcanezumab group as shown by higher overlapping between propensity score density functions. Topiramate and amitriptyline were not as comparable, as they had a smaller proportion of female participants, a fewer number of co-reported drugs, a difference in years of reporting, as topiramate and amitriptyline have been marketed for a longer period, and lastly a higher median age. Such differences may be explained by the use of topiramate and amitriptyline for other therapeutic indications (i.e., depression), which is known to affect older adults more than those receiving galcanezumab for migraine prophylaxis [32, 33]. An interesting observation that was made during this study was the correlation between the comparability of cases and the control groups and the number of disproportionality signals identified in the statistical analysis, as shown in Supplementary Table 1. In this regard, we hypothesized that the number of disproportionality signals is influenced by the comparability of the selected control group to the drug under investigation. Furthermore, we believe using the overlap of the propensity score to prioritize controls in disproportionality analysis will be problematic, as most drugs within the 4th level of the ATC code will be retained as controls in disproportionality analysis. Finally, we believe that other approaches may be more effective in computing propensity score, such as the use of co-reported drugs as individual covariates rather than the number of co-reported drugs and the restriction of the study period. However, further studies are needed to validate these hypotheses, and while theoretically this is the case, non-random reporting patterns may mean that this is not the case in practice.

4.2 Automatic Identification of Alternative Causes in Disproportionality Signals

Our AI-based approach introduced automation in the identification of confounders during signal validation by identifying co-reported drugs that were disproportionately co-reported among galcanezumab cases and control groups. The rationale for conducting this analysis relies on pharmacoepidemiological principles. To be an alternative cause in disproportionality analysis, a co-reported drug needs to be associated with the occurrence of the AE under investigation and needs to be disproportionally co-reported between cases and controls. Our analytical approach performs this evaluation by using a conditional inference tree to identify the disproportionality of co-reported drugs among galcanezumab and controls in ICSRs with the AE under investigation. Only in one analysis, or rather, when fremanezumab was used as the control group to galcanezumab, were we not able to dismiss any of the disproportionality signals using conditional inference trees. We believe that a plausible cause for this result is that fremanezumab was the control group with the highest comparability to galcanezumab (i.e., 68.49% overlap of density functions of the propensity scores).

4.3 Limitations

The necessary step of checking the presence of an AE in the SmPC of co-reported drugs is a major component of the aforementioned time-limiting activities in signal validation, and it is not addressed in the automatic process proposed in our framework. However, we anticipate that soon there will be ad hoc tools for this task. Of note, the commercial version of DrugBank already includes this data.

Other machine learning approaches other than conditional inference trees could have been tested to assess the disproportionality of co-reported drugs but, for this pilot project, we limited our choice to conditional inference trees on the basis of the reasons provided in Sect. 2.3.

Currently, we do not have a clear cutoff to define “good” or “bad” regarding the number of drug classes that should be correctly identified by our framework during the selection of viable controls for disproportionality analysis. Therefore, future work is needed to identify the optimal cutoff.

We have arbitrarily decided to use four controls in disproportionality analysis because we are aware that the number of disproportionality signals generated in the framework will proportionally increase with the number of controls chosen (Supplementary Fig. 9). However, future work is needed to determine the optimal number of controls in our framework. In this regard, a challenge will be whether the optimal number would vary by drug under consideration, but evaluation of this issue and developing a systematic framework is extensive and challenging. It should be emphasized that increasing the number of controls also increases the number of disproportionality signals. However, this is also true because the same threshold is used to define a disproportionality signal irrespective of the number of tests. In other words, the problem of multiplicity is ignored. We hypothesized that the increase in the number of alerts largely depends on which control is being progressively added. Hence, the result is not general and further studies are necessary to clarify this result.

Finally, it is important to highlight that 340 out of 399 disproportionality signals were not dismissed, and that still represents an important number of signals to validate.

5 Conclusion

We developed a new framework that includes AI and non-AI approaches to ease some of the most time-consuming and labor-intensive steps of signal detection and validation, such as the selection of controls and the identification of co-reported drugs that may serve as alternative causes for the reported AEs listed in disproportionality signals. Future work is needed to further assess the extent of use of our framework for other drugs and spontaneous reporting databases when performing routine signal detection and validation activities.

References

Medicines Agency E. Guideline on good pharmacovigilance practices (GVP) Module IX_Signal management (Rev 1) Date of coming into effect of first version. EMA Web Page. 2017. p. 1–12.
European Medicine Agency. 2020 Annual Report on EudraVigilance for the European Parliament, the Council and the Commission. 2020 [cited 2022 Feb 17]; EMA Web Page2. 2022. p. 1–100.
European Medicine Agency. Validation of signals from the review of individual cases. EMA Web Page. 2009. p. 1–10.
Montastruc J-L, Sommet A, Bagheri H, Lapeyre-Mestre M. Benefits and strengths of the disproportionality analysis for identification of adverse drug reactions in a pharmacovigilance database. Br J Clin Pharmacol. 2011;72:905–8.
Article CAS PubMed PubMed Central Google Scholar
Caster O, Aoki Y, Gattepaille LM, Grundmark B. Disproportionality analysis for pharmacovigilance signal detection in small databases or subsets: recommendations for limiting false-positive associations. Drug Saf. 2020;43:479–87.
Article PubMed PubMed Central Google Scholar
Bate A, Stegmann J-U. Safety of medicines and vaccines–building next generation capability. Trends Pharmacol Sci Elsevier. 2021;42:1051–63.
Article CAS Google Scholar
WHO. ATC classification. WHO website. 2022.
Maignen F, Hauben M, Hung E, Van Holle L, Dogne J-M. Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases. Pharmacoepidemiol Drug Saf England. 2014;23:195–207.
Article Google Scholar
Kyriacou DN, Lewis RJ. Confounding by indication in clinical research. JAMA US. 2016;316:1818–9.
Article Google Scholar
de Vries T, Villalón CM, MaassenVanDenBrink A. Pharmacological treatment of migraine: CGRP and 5-HT beyond the triptans. Pharmacol Ther. 2020;211: 107528.
Article PubMed Google Scholar
Mayans L, Walling A. Acute migraine headache: treatment strategies. Am Fam Physician. 2018;97:243–51.
PubMed Google Scholar
Lindquist M. VigiBase, the WHO global ICSR database system: basic facts. Drug Inf J. SAGE Publications Sage CA: Los Angeles, CA; 2008;42:409–19.
Kass-Hout TA, Xu Z, Mohebbi M, Nelsen H, Baker A, Levine J, et al. OpenFDA: an innovative platform providing access to a wealth of FDA’s publicly available data. J Am Med Informatics Assoc. 2016;23:596–600.
Article Google Scholar
Wisniewski AFZ, Bate A, Bousquet C, Brueckner A, Candore G, Juhlin K, et al. Good signal detection practices: evidence from IMI PROTECT. Drug Saf New Zealand. 2016;39:469–90.
Article Google Scholar
Haukoos JS, Lewis RJ. The propensity score. JAMA. 2015;314:1637–8.
Article CAS PubMed PubMed Central Google Scholar
Meredith M, Ridout M. Overlap: estimates of coefficient of overlapping for animal activity patterns. R Packag version 02. 2014;4.
Guideline on the use of statistical signal detection methods on the EudraVigilance data anlysis system. http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2009/11/WC500011437.pdf.
Hothorn T, Zeileis A. partykit: A modular toolkit for recursive partytioning in R. J Mach Learn Res JMLR org. 2015;16:3905–9.
Google Scholar
Kuhn M, Johnson K. Applied predictive modeling. Berlin: Springer; 2013.
Book Google Scholar
Sessa M, Rafaniello C, Sportiello L, Mascolo A, Scavone C, Maccariello A, et al. Campania Region (Italy) spontaneous reporting system and preventability assessment through a case-by-case approach: a pilot study on psychotropic drugs. Expert Opin Drug Saf England. 2016;15:9–15.
Article Google Scholar
Sessa M, Sullo MG, Mascolo A, Cimmaruta D, Romano F, Puca RV, et al. A case of figurate urticaria by etanercept. J Pharmacol Pharmacother. 2016;2:106–8.
Article Google Scholar
Sessa M, Sportiello L, Mascolo A, Scavone C, Gallipoli S, di Mauro G, et al. Campania Preventability Assessment Committee (Italy): a focus on the preventability of non-steroidal anti-inflammatory drugs’ adverse drug reactions. Front Pharmacol. 2017;8:305.
Article PubMed PubMed Central Google Scholar
Sessa M, Mascolo A, Callréus T, Capuano A, Rossi F, Andersen M. Direct-acting oral anticoagulants (DOACs) in pregnancy: new insight from VigiBase(®). Sci Rep. 2019;9:7236.
Article PubMed PubMed Central Google Scholar
Sessa M, Rossi C, Mascolo A, Grassi E, Fiorentino S, Scavone C, et al. Suspected adverse reactions to contrast media in Campania Region (Italy): results from 14 years of post-marketing surveillance. Expert Opin Drug Saf England. 2015;14:1341–51.
Article Google Scholar
Sessa M, Rossi C, Rafaniello C, Mascolo A, Cimmaruta D, Scavone C, et al. Campania preventability assessment committee: a focus on the preventability of the contrast media adverse drug reactions. Expert Opin Drug Saf Engl. 2016;15:51–9.
Article Google Scholar
Tong YL, Tong YL. The bivariate normal distribution. Multivar Norm Distrib. Berlin: Springer; 1990. p. 6–22.
Google Scholar
Tian S, Qin D, Zhou Y. Genetic data generation method based on multivariate binary random variables. Proc 6th Int Conf Comput Sci Appl Eng. 2022. p. 1–5.
Sessa M, Andersen M. New insight on the safety of erenumab: an analysis of spontaneous reports of adverse events recorded in the US Food and Drug Administration Adverse Event Reporting System Database. BioDrugs. New Zealand; 2021;
Zeinoun Z, Seifert H, Verstraeten T. Quantitative signal detection for vaccines: effects of stratification, background and masking on GlaxoSmithKline’s spontaneous reports database. Hum Vaccin. 2009;5:599–607.
Article CAS PubMed Google Scholar
Bate A, Lindquist M, Edwards IR, Orre R. A data mining approach for signal detection and analysis. Drug Saf. 2002;25:393–7.
Article CAS PubMed Google Scholar
Agency EM. Screening for adverse reactions in EudraVigilance. EMA/849944/2016. 2016;3–33.
Topamax (Topiramate): uses, dosage, side effects, interactions, warning.
Levate (amitriptyline), dosing, indications, interactions, adverse effects, and more.

Download references

Acknowledgements

Not applicable.

Funding

Open access funding provided by Royal Library, Copenhagen University Library.

Author information

Authors and Affiliations

Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen, Denmark
Fahed Al-Azzawi, Israa Mahmoud & Maurizio Sessa
GSK, Wavre, Belgium
François Haguinet
GSK, London, UK
Andrew Bate
London School of Hygiene and Tropical Medicine, University of London, London, UK
Andrew Bate
New York University, New York, NY, USA
Andrew Bate

Authors

Fahed Al-Azzawi
View author publications
You can also search for this author in PubMed Google Scholar
Israa Mahmoud
View author publications
You can also search for this author in PubMed Google Scholar
François Haguinet
View author publications
You can also search for this author in PubMed Google Scholar
Andrew Bate
View author publications
You can also search for this author in PubMed Google Scholar
Maurizio Sessa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fahed Al-Azzawi.

Ethics declarations

Author contribution

MS and FA had the idea for the project. All authors drafted and commented on the manuscript. FA performed data management and analysis. MS developed the framework. All authors read and approved the final version of this manuscript.

Funding

Not applicable.

Conflicts of interest

The authors declare that they have no conflict of interest in this research.

Ethics approval

Not applicable.

Consent to participate

Not applicable

Consent for publication

Not applicable.

Availability of data and material

We used publicly available data from FAERS that can be accessed at: https://open.fda.gov/data/faers/. The code of our simulations is provided in ESM 1.

Code availability

Code availability can be requested upon reasonable request to the corresponding author.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 36 KB)

Supplementary file2 (PDF 206 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits any non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc/4.0/.

Reprints and permissions

About this article

Cite this article

Al-Azzawi, F., Mahmoud, I., Haguinet, F. et al. Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data. Drug Saf 46, 743–751 (2023). https://doi.org/10.1007/s40264-023-01317-0

Download citation

Accepted: 07 May 2023
Published: 10 June 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s40264-023-01317-0

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Developing an Artificial Intelligence-Guided Signal Detection in the Food and Drug Administration Adverse Event Reporting System (FAERS): A Proof-of-Concept Study Using Galcanezumab and Simulated Data

Abstract

Introduction

Objectives

Methods

Results

Conclusion

Similar content being viewed by others

Evaluating performance of electronic healthcare records and spontaneous reporting data in drug safety signal detection

Towards Automating Adverse Event Review: A Prediction Model for Case Report Utility

A Method to Combine Signals from Spontaneous Reporting Systems and Observational Healthcare Data to Detect Adverse Drug Reactions

1 Introduction

2 Methods

2.1 Automatic Selection of Controls

2.2 Data Sources and Management

2.3 Statistical Analysis

2.3.1 Comparability of Controls

2.3.2 Dismissal of False-Positive Disproportionality Signals

2.4 The Rationale for Using Conditional Inference Trees

2.5 Framework Validation Using Simulated Data

3 Results

3.1 Automatic Selection of Controls and Comparability

3.2 Dismissal of False-Positive Disproportionality Signals

3.3 Framework Validation

4 Discussion

4.1 Automatic Selection of Control Groups and Comparability

4.2 Automatic Identification of Alternative Causes in Disproportionality Signals

4.3 Limitations

5 Conclusion

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Author contribution

Funding

Conflicts of interest

Ethics approval

Consent to participate

Consent for publication

Availability of data and material

Code availability

Supplementary Information

Supplementary file1 (PDF 36 KB)

Supplementary file2 (PDF 206 KB)

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation