To establish an unbiased dataset for comparing clinical classifications within and between public databases, we performed a retrospective database query of BRCA1 and BRCA2 variants identified among consecutive patients who were referred to Myriad Genetic Laboratories, Inc., for sequencing and large rearrangement testing in November and December of 2013 (Fig. 1). Using this snapshot of data as of November 6, 2013, we then cross-referenced each variant in the dataset with its classification in five publicly accessible databases: (1) the Breast Cancer Information Core (BIC), an online, open-access breast cancer mutation database maintained by the National Human Genome Research Institute at the National Institutes of Health (NIH) (Szabo et al. 2000); (2) the Leiden Open Variation Database 2.0 (LOVD, chromium.liacs.nl/LOVD2/cancer/home.php?select_db=BRCA1, chromium.liacs.nl/LOVD2/cancer/home.php?select_db=BRCA2), maintained by the Leiden University Medical Center, the Netherlands (Fokkema et al. 2011); (3) ClinVar, a freely accessible public archive of reported relationships between human variations and phenotypes, maintained by the National Center for Biotechnology Information (NCBI) at the NIH (Landrum et al. 2014); (4) the BRCA1 and BRCA2 Universal Mutation Database (UMD, http://www.umd.be/BRCA2/), which contains published and unpublished information on BRCA1 and BRCA2 mutations reported by a network of 16 French diagnostic laboratories (Beroud et al. 2000); and (5) the Human Gene Mutation Database (HGMD), a paid-subscription database maintained by the Institute of Medical Genetics in Cardiff (Stenson et al. 2009).
Variants and their respective classifications in the LSDBs were compiled to analyze discrepancy rates between and within databases. To facilitate comparison between different classification schemes, we grouped the classifications within each database into three major categories: pathogenic (pathogenic and likely pathogenic), benign (benign and likely benign), and variants of uncertain clinical significance (VUS). The criteria used to assign each database's classifications to these groups are listed in Table 1. Multiple instances of the same variant within the same database were considered "conflicting" if they were assigned both a pathogenic and a benign classification. Classifications within the same database were not considered conflicting if a variant was classified as pathogenic and VUS, or as benign and VUS; in these cases, the pathogenic or benign classification was used in cross-database comparisons. Variants whose classifications fell into the "Other Classifications" category were excluded from the comparison.
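The grouping and within-database resolution rules above can be expressed as a small function. The following is a minimal sketch, not the pipeline used in this study: the label strings in `LABEL_TO_CATEGORY` are hypothetical stand-ins for the per-database criteria in Table 1, and treating unmapped labels as "Other Classifications" that are dropped entry by entry (rather than excluding the whole variant) is an assumption.

```python
from enum import Enum
from typing import Iterable, Optional


class Category(Enum):
    PATHOGENIC = "pathogenic"
    BENIGN = "benign"
    VUS = "VUS"
    CONFLICTING = "conflicting"


# Hypothetical label map for illustration only; the actual per-database
# groupings are those listed in Table 1. Any label not in this map is
# treated as "Other Classifications" and dropped.
LABEL_TO_CATEGORY = {
    "pathogenic": Category.PATHOGENIC,
    "likely pathogenic": Category.PATHOGENIC,
    "benign": Category.BENIGN,
    "likely benign": Category.BENIGN,
    "uncertain significance": Category.VUS,
}


def resolve_entries(labels: Iterable[str]) -> Optional[Category]:
    """Collapse all entries for one variant within one database."""
    cats = {LABEL_TO_CATEGORY.get(label.strip().lower()) for label in labels}
    cats.discard(None)  # assumption: "Other Classifications" entries are dropped
    if not cats:
        return None  # nothing left to compare: variant excluded
    if Category.PATHOGENIC in cats and Category.BENIGN in cats:
        return Category.CONFLICTING  # pathogenic and benign in one database
    if Category.PATHOGENIC in cats:
        return Category.PATHOGENIC  # pathogenic + VUS -> pathogenic
    if Category.BENIGN in cats:
        return Category.BENIGN  # benign + VUS -> benign
    return Category.VUS
```

For example, `resolve_entries(["Pathogenic", "Uncertain significance"])` yields `Category.PATHOGENIC`, whereas `["Likely benign", "Pathogenic"]` yields `Category.CONFLICTING`.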
For quality assurance, after the classifications in the dataset were recorded, a blinded review was performed by two independent reviewers who were given a list of 100 variants randomly selected from the overall dataset to provide a representative subset of the variants analyzed here. Each reviewer queried all five databases and noted the classification of each variant, if available. Once the review was complete, the reviewers' classifications for this subset were compared with the initially recorded classifications to verify their accuracy. This approach allowed us to detect and correct systematic errors, which arose primarily from inconsistent variant nomenclature across databases.
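The sampling-and-comparison step of this review can be sketched as follows. This is an illustration under stated assumptions: the record layout (a mapping from `(variant_id, database)` to a classification, with no key when the variant is absent) and the helper names `draw_qa_subset` and `find_discrepancies` are hypothetical, and the optional seed is added only to make the draw reproducible.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple

DATABASES = ("BIC", "LOVD", "ClinVar", "UMD", "HGMD")

# Hypothetical record layout: (variant_id, database) -> classification.
# A missing key means the variant was not found in that database.
Record = Dict[Tuple[str, str], str]


def draw_qa_subset(variant_ids: Iterable[str], k: int = 100,
                   seed: Optional[int] = None) -> List[str]:
    """Randomly select k distinct variants for blinded review."""
    rng = random.Random(seed)
    return rng.sample(sorted(variant_ids), k)


def find_discrepancies(initial: Record, reviewer: Record,
                       subset: Iterable[str]) -> List[tuple]:
    """List (variant, database, initial, reviewer) wherever the records differ."""
    mismatches = []
    for variant in subset:
        for db in DATABASES:
            before = initial.get((variant, db))
            after = reviewer.get((variant, db))
            if before != after:
                mismatches.append((variant, db, before, after))
    return mismatches
```

Running `find_discrepancies` once per reviewer surfaces every variant-database pair to re-check; a cluster of mismatches sharing one database is the kind of signal that would point to a nomenclature problem rather than a one-off transcription error.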
Agreement across databases
To investigate the degree to which LSDBs agree on variant classifications, we stratified the 2,017 unique variants by the number of databases in which they were present (e.g., all variants seen in four databases formed one group). For each database in which a variant occurred, its classification was noted. Agreement was defined as all LSDBs in which a variant occurred assigning it the same classification. The frequency of agreement (%) was calculated as the number of variants for which agreement was found divided by the number of variants assigned the specified classification in at least one database (for sample sizes in the different categories, see Fig. 2). Thus, a variant was included in the "pathogenic" subset if at least one database assigned it that classification. We recorded agreement separately for the pathogenic, benign, and VUS subsets.
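The agreement calculation can be sketched in a few lines. This sketch assumes the input already holds one grouped category per database per variant (for example, after within-database resolution), and it skips variants seen in only one database via a hypothetical `min_databases` parameter, since such variants agree trivially; whether the study reported that stratum is not stated here.

```python
from collections import defaultdict
from typing import Dict, Tuple


def agreement_by_stratum(
    classifications: Dict[str, Dict[str, str]], min_databases: int = 2
) -> Dict[Tuple[int, str], float]:
    """Percent agreement per (number of databases, category) stratum.

    classifications: {variant_id: {database: "pathogenic" | "benign" | "VUS"}}
    A variant enters a category's subset if at least one database assigns
    that category; it counts as agreeing only if every database matches.
    """
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for calls in classifications.values():
        n = len(calls)
        if n < min_databases:
            continue  # assumption: single-database variants agree trivially
        distinct = set(calls.values())
        for category in distinct:
            totals[(n, category)] += 1
            if len(distinct) == 1:
                agreed[(n, category)] += 1
    return {key: 100.0 * agreed[key] / totals[key] for key in totals}
```

A variant called pathogenic in one database and VUS in another is counted in both the pathogenic and VUS denominators of its stratum, and in neither numerator.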
Discrepancies within databases
We also examined the potential for conflicting variant classifications to occur within the same database. BIC, HGMD, and UMD provide a single "master" clinical classification per unique variant on the primary variant report page; however, ClinVar and LOVD currently do not provide a single "master" classification and instead list the conflicting entries. The variants with conflicting entries in these databases are listed in the Supplemental Material. We tallied the number of variants assigned each classification (pathogenic, benign, VUS) in each of the five LSDBs and recorded the conflicting classifications whenever multiple instances of the exact same variant were observed in the same database.
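The per-database tally can be sketched as below. This assumes one input row per database entry, already mapped to a grouped category, so that a variant listed several times in ClinVar or LOVD appears as several rows. Counting a variant with two different categories under both of them, and flagging any variant with more than one distinct category, are illustrative choices; the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_databases(
    rows: Iterable[Tuple[str, str, str]],
) -> Tuple[Dict[str, Counter], Dict[str, Dict[str, List[str]]]]:
    """Tally classifications and within-database discrepancies.

    rows: (database, variant_id, category) tuples, one per entry.
    Returns (counts, discrepancies): counts[db] counts unique variants per
    category; discrepancies[db] maps each variant with more than one
    category in that database to its sorted list of categories.
    """
    seen = defaultdict(lambda: defaultdict(set))  # db -> variant -> categories
    for db, variant, category in rows:
        seen[db][variant].add(category)  # repeated identical entries collapse
    counts = {}
    discrepancies = {}
    for db, variants in seen.items():
        counts[db] = Counter(c for cats in variants.values() for c in cats)
        discrepancies[db] = {
            v: sorted(cats) for v, cats in variants.items() if len(cats) > 1
        }
    return counts, discrepancies
```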
Analysis of additional evidence utilized by databases
The use of literature and unique empirical data is an important facet of variant classification. However, the clinical utility of such information may be subject to debate when, e.g., the raw data supporting a reported conclusion are unavailable or a biochemical assay reports an intermediate functional effect. To assess the use of literature and unique empirical data by LSDBs, we first selected, from the 124 unique variants found in all five LSDBs, a subset that could conservatively be classified as VUS. The following criteria were used to select this subset: (1) the variant is listed in all five databases, (2) the variant receives a default classification of VUS when literature and other empirical evidence are excluded, and (3) the variant is definitively classified as pathogenic (or likely pathogenic) by at least one of the five databases. The draft of the updated guidelines from the ACMG/AMP/CAP Interpretation of Sequence Variations Workgroup was used for this analysis (Table 3). To establish a default VUS classification for criterion 2, the following types of variants were considered: missense variants, intronic variants located more than two nucleotides into the intron from the native RNA splice acceptor or donor site, in-frame insertions/deletions, and variants within the 5' untranslated region (UTR). We then noted the reported classification of these variants in the five LSDBs and plotted all pathogenic and benign classifications (Fig. 3). Any variant that already held a VUS classification in a given database was excluded from further analysis for that database.
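The three selection criteria can be sketched as a filter. Everything in this sketch beyond the criteria themselves is an assumption: the `Variant` record, the consequence labels such as `"inframe_indel"` and `"5_prime_utr"`, and the signed `intron_offset` field (distance in nucleotides from the nearest splice acceptor or donor) are hypothetical, and `"pathogenic"` here is the grouped category, so it already includes likely pathogenic calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# Hypothetical consequence labels that default to VUS absent other evidence.
DEFAULT_VUS_TYPES = {"missense", "inframe_indel", "5_prime_utr"}


@dataclass
class Variant:
    name: str
    consequence: str        # hypothetical label, e.g. "missense", "intronic"
    intron_offset: int = 0  # signed nt distance from splice site; 0 if exonic
    calls: Dict[str, str] = field(default_factory=dict)  # db -> grouped category


def default_vus(v: Variant) -> bool:
    """Criterion 2: VUS by default when literature and empirical data are set aside."""
    if v.consequence == "intronic":
        return abs(v.intron_offset) > 2  # beyond the canonical +/-1,2 positions
    return v.consequence in DEFAULT_VUS_TYPES


def select_subset(variants: List[Variant],
                  databases: Sequence[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Apply criteria 1-3, then drop database calls that are already VUS."""
    subset = []
    for v in variants:
        in_all = all(db in v.calls for db in databases)      # criterion 1
        any_pathogenic = "pathogenic" in v.calls.values()   # criterion 3
        if in_all and default_vus(v) and any_pathogenic:
            kept = {db: c for db, c in v.calls.items() if c != "VUS"}
            subset.append((v.name, kept))
    return subset
```

The remaining pathogenic and benign calls in each returned mapping are the values that would be plotted per database.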
We also evaluated the content of the BIC, ClinVar, LOVD, and UMD databases to determine whether they provided sufficient, verifiable supporting data for an independent reviewer to concur with each database's pathogenic classification. Because HGMD had the highest discrepancy rate when all databases were compared, it was excluded from this analysis. To perform this analysis, we chose a set of "challenging to classify" variants, defined as variants present in all four databases (BIC, ClinVar, LOVD, and UMD) that would be conservatively classified as VUS by the criteria described above but were classified as pathogenic in one or more databases. The evidence listed in each database was then evaluated using the evidence-based criteria released in draft form by the ACMG/AMP/CAP Interpretation of Sequence Variations Workgroup, which are soon to be published (Lyon et al. 2013; Richards et al. 2014). Sufficient supporting data were defined as verifiable data, contained or referenced in the database, that met the minimal requirements for a "Likely Pathogenic" classification (note: for the purposes of this evaluation, a variant's listing in a public database was not used as a supporting line of evidence for classification).
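The "minimal requirements for Likely Pathogenic" test can be sketched as a rule check over evidence codes. Note that this swaps in the combining rules from the final, published ACMG/AMP guidelines (2015); the draft criteria used in this analysis may differ. Modeling the note above (database listings not counted as evidence) by excluding the PP5 code ("reputable source reports variant as pathogenic") is likewise an assumption, as are the function names. Evidence meeting the Pathogenic rules is treated as also exceeding the Likely Pathogenic minimum.

```python
from typing import Dict, Iterable


def tally_strengths(codes: Iterable[str],
                    exclude: Iterable[str] = ("PP5",)) -> Dict[str, int]:
    """Count pathogenic evidence codes by strength (benign codes are ignored)."""
    excluded = set(exclude)
    counts = {"very_strong": 0, "strong": 0, "moderate": 0, "supporting": 0}
    for code in codes:
        if code in excluded:
            continue  # assumption: PP5 stands in for "listed in a database"
        if code.startswith("PVS"):
            counts["very_strong"] += 1
        elif code.startswith("PS"):
            counts["strong"] += 1
        elif code.startswith("PM"):
            counts["moderate"] += 1
        elif code.startswith("PP"):
            counts["supporting"] += 1
    return counts


def meets_likely_pathogenic_minimum(codes: Iterable[str],
                                    exclude: Iterable[str] = ("PP5",)) -> bool:
    """True if the evidence reaches at least Likely Pathogenic (2015 rules)."""
    c = tally_strengths(codes, exclude)
    vs, s, m, p = c["very_strong"], c["strong"], c["moderate"], c["supporting"]
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    return likely_pathogenic or pathogenic
```

For instance, `["PS3", "PP3", "PP5"]` falls short once PP5 is set aside (one strong plus one supporting), but would pass if the database listing were counted.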