Introduction

Measurement of autoantibodies against islet beta cell antigens is performed in many laboratories throughout the world. Autoantibodies to glutamic acid decarboxylase (GADA) and islet antigen-2 (IA-2A) are no longer solely research tools, but are beginning to be used in clinical practice. They have an important role in the clinical classification of diabetes [1], prediction of the need for insulin treatment [2], identification of individuals at risk of developing type 1 diabetes [3] and as endpoints in observational studies [4]. Assays are therefore increasingly run in clinical laboratories as well as in the research units where they were first established.

The Diabetes Autoantibody Standardization Program (DASP), a collaboration of the Immunology of Diabetes Society (IDS) and the US Centers for Disease Control and Prevention, was established on the basis of a series of IDS workshops to improve and standardise the measurement of the autoantibodies predictive of type 1 diabetes [5–8]. The major goals of DASP are to assist laboratories in improving methods by providing technical support, training and information, to organise workshops for harmonisation of antibody testing among laboratories, and to provide reference materials for the development of new measurement technologies. The aims of the first DASP evaluation were to assess and improve comparability of islet autoantibody measurements among laboratories and to undertake extended evaluation of the new WHO international reference reagent for antibodies to GAD and IA-2 [8, 9]. To achieve those aims, we established the format that has been applied in the subsequent workshops, that is, the circulation of aliquots of coded sera from relatively large numbers of unselected patients with newly diagnosed type 1 diabetes and healthy controls [9]. This differs from other autoantibody standardisation programmes that generally rely on pre-selection of autoantibody positive and negative sera [10–13].

We now report the GADA and IA-2A results of the follow-up workshops run in 2002, 2003 and 2005, and compare them with those of DASP 2000. The main aims of these workshops were to allow participating laboratories to assess the sensitivity and specificity of their assays and to assess concordance between laboratories if GADA and IA-2A results were expressed in WHO units/ml derived from previously calibrated in-house standards. In addition we have been able to evaluate changes in GADA and IA-2A measurement that have been implemented in participating laboratories since the first workshop.

Methods

Study design

For each of the four workshops, participating laboratories received uniquely coded sets of sera from 50 patients with newly diagnosed type 1 diabetes and 100 healthy control individuals. These included the 50 control sera circulated in DASP 2000. Sera were prepared and frozen in 100 μl aliquots. Coded sera were distributed to 60 laboratories in 17 countries in DASP 2002, to 50 laboratories in 17 countries in DASP 2003, and to 46 laboratories in 18 countries in DASP 2005 (see Electronic supplementary material [ESM] for list of participating laboratories). Laboratories requested between one and five uniquely coded sets of sera depending on the volume needed for their assays. The different sets could not be combined; therefore, only assays with a maximum requirement of 100 μl could be included in the workshop.

Sera from patients were obtained within 13 days of diabetes diagnosis (median 2 days). The median patient age was 18 years (range 9–35 years). Eight patient sera were included in all four workshops, 13 in three workshops, and 41 in two workshops. Control sera were obtained from 100 US blood donors (median age 20 years; range 18–28 years). The same 100 control sera were circulated in the 2002, 2003 and 2005 workshops. Laboratories were asked to test the sera with any assays currently in use for detection of autoantibodies. Laboratories could also report results of additional assays for evaluation, including multiple assays for a single autoantibody if desired.

Data analysis

Sensitivity and specificity

Laboratory-defined sensitivity for each assay was calculated as the percentage of sera from patients reported as positive using the laboratory’s own cut-off. Laboratory-assigned specificity was calculated as the percentage of healthy control sera reported as negative using the same cut-off.
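As an illustration of the two definitions above, the following is a minimal sketch in Python. The function name, the sample values and the rule that a serum is positive when its value is at or above the cut-off are assumptions made for the example; each laboratory applied its own cut-off and positivity rule.

```python
def sensitivity_specificity(patient_values, control_values, cutoff):
    """Return (sensitivity %, specificity %) for one assay at a given cut-off.

    Assumption: a serum is called positive when its value is >= cutoff.
    """
    true_pos = sum(1 for v in patient_values if v >= cutoff)
    true_neg = sum(1 for v in control_values if v < cutoff)
    sensitivity = 100.0 * true_pos / len(patient_values)
    specificity = 100.0 * true_neg / len(control_values)
    return sensitivity, specificity


# Hypothetical values in arbitrary units, for illustration only.
patients = [120.0, 45.0, 3.0, 80.0, 0.5]
controls = [0.2, 1.0, 0.7, 12.0, 0.4, 0.9, 0.3, 0.1, 0.6, 0.8]
sens, spec = sensitivity_specificity(patients, controls, cutoff=10.0)
print(sens, spec)
```

In this made-up example three of five patient sera and nine of ten control sera fall on the expected side of the cut-off.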

Receiver operator characteristic curves

Receiver operator characteristic (ROC) curves were used to evaluate the performance of each assay in discriminating disease from non-disease. The area under the ROC curve (AUC) with 95% CI was calculated assuming a non-parametric distribution of results. An AUC of 1.0 would indicate that the assay achieved 100% accuracy in identifying disease, and an AUC of 0 would indicate that the assay gave a positive result for control individuals and a negative result for patients. Accordingly, an AUC of 0.5 would indicate that the assay gave a random assignment of disease and non-disease. In order to facilitate further comparisons among laboratories, the coordinates of the ROC curve were used to define the level of sensitivity that corresponded to a specificity of 95%, which was defined as the adjusted sensitivity95 (AS95).
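The AUC and AS95 described above could be computed along the following lines. This is a sketch only: it uses the rank-based (Mann–Whitney) estimate of the AUC, omits the 95% CI, and assumes that a serum is positive when its value is strictly above the threshold. The function names and data are hypothetical and are not taken from the workshop analysis.

```python
import math


def auc_nonparametric(patient_values, control_values):
    """AUC as the proportion of patient/control pairs in which the patient
    value is higher; tied pairs count as one half."""
    wins = 0.0
    for p in patient_values:
        for c in control_values:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_values) * len(control_values))


def adjusted_sensitivity(patient_values, control_values, specificity=0.95):
    """Sensitivity at the lowest threshold that keeps at least the requested
    proportion of control sera negative (assumed rule: positive if > threshold)."""
    controls = sorted(control_values)
    # Number of control sera that must stay negative (small epsilon guards
    # against floating-point noise in the product).
    needed_negative = math.ceil(specificity * len(controls) - 1e-9)
    threshold = controls[needed_negative - 1]
    positives = sum(1 for p in patient_values if p > threshold)
    return positives / len(patient_values)


# Hypothetical data: 20 control values and 4 patient values.
controls = [float(v) for v in range(1, 21)]
patients = [25.0, 30.0, 19.0, 5.0]
auc = auc_nonparametric(patients, controls)
as95 = adjusted_sensitivity(patients, controls)
print(auc, as95)
```

With 100 control sera, as in the workshops, a specificity of 95% under these assumptions allows at most five control sera above the threshold.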

Use of the WHO standard in antibody quantification

The WHO international reference reagent for GADA and IA-2A (97/550) was defined as containing 100 WHO units/vial (250 WHO units/ml) [8]. This standard had been used in DASP 2000 and allows laboratories to express GADA and IA-2A levels in common units. Participating laboratories were asked to calibrate their local standards in DASP 2002, assigning values in WHO units/ml on the basis of the median of five assays. These calculations were checked centrally and laboratories were not asked to re-calibrate for subsequent workshops if the format of their assay was unchanged. Reference reagent was provided to laboratories participating in the programme for the first time or if there had been substantial changes in the assay between workshops.

Laboratories used between one and 11 standards to calculate their results. The instructions provided for the calculation of WHO units/ml were as follows: (1) single standard: units = [(cpm test sample − cpm negative control)/(cpm local WHO calibrated standard − cpm negative control)] × WHO units of local standard; and (2) multiple standards: a regression curve was constructed for cpm (or optical density) vs the assigned WHO units/ml for each of the standards. This regression curve was used to convert cpm (or optical density) of the proficiency samples into WHO units/ml. High values for sera (above 250 WHO units/ml) were replaced with 250 and low values (below 1) were replaced with 1 for further comparisons.
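The single-standard calculation and the truncation of extreme values can be sketched as follows. The function and parameter names and the cpm values are hypothetical. The multiple-standard case is not shown because the form of the regression curve was left to each laboratory.

```python
def who_units_single_standard(cpm_sample, cpm_negative, cpm_standard, standard_units):
    """Convert cpm to WHO units/ml using one locally calibrated standard."""
    return (cpm_sample - cpm_negative) / (cpm_standard - cpm_negative) * standard_units


def truncate_units(units, low=1.0, high=250.0):
    """Replace values below 1 with 1 and values above 250 with 250."""
    return min(max(units, low), high)


# Hypothetical counts: negative control 200 cpm, local standard 10,200 cpm,
# with the local standard assigned 100 WHO units/ml.
for cpm in (150.0, 5200.0, 40200.0):
    raw = who_units_single_standard(cpm, 200.0, 10200.0, 100.0)
    print(cpm, raw, truncate_units(raw))
```

A sample counting below the negative control gives a negative raw value, which is why the lower bound of 1 is applied before any comparison between laboratories.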

The Spearman rank correlation coefficient (r_S) was used to test for correlation between variables with a non-normal distribution of values. Non-parametric tests were used to compare antibody levels in patient and control individual samples, and for comparisons between laboratories, workshops and assay methods. Assays for which results were missing on more than 10% of samples were not included in inter-laboratory comparisons. These are indicated in ESM Tables 1 and 2. Two-tailed p values <0.05 were considered significant.
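For readers unfamiliar with the Spearman coefficient, a minimal sketch is given below: values are replaced by their ranks (tied values share the mean rank) and the Pearson correlation of the ranks is taken. The function names and data are illustrative assumptions; the statistical software used for the workshop analysis is not specified here, and the p value is not computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical median levels for five samples in two assay formats.
format_a = [10.0, 20.0, 30.0, 40.0, 50.0]
format_b = [1.0, 3.0, 2.0, 5.0, 4.0]
rs = spearman_rs(format_a, format_b)
print(rs)
```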

Results

GADA assays

The results of GADA determination in DASP 2002 (55 assays), 2003 (52 assays) and 2005 (52 assays) are summarised in Table 1. All assays differentiated between patient and control sera with higher levels in patients compared with controls (all p < 0.001). There were no overall differences in AUC or AS95 between the workshops (Table 1). Full results for individual laboratories are given in ESM Table 1.

Table 1 Results of GADA and IA-2A determinations in DASP 2002, 2003 and 2005

Assay format

In the first workshop the highest sensitivity, specificity, AUC and AS95 for GADA were achieved in laboratories using in-house RIA. In DASP 2003, a commercial ELISA kit tested in two different laboratories achieved the highest sensitivity, specificity, AUC and AS95 (ESM Table 1). In DASP 2005, the sensitivity and specificity were higher using ELISA (n = 8; median sensitivity 89%; interquartile range 85–92%; median specificity 98%; interquartile range 96–99%) than RIA (n = 40; median sensitivity 80%; interquartile range 74–82%, p = 0.002; median specificity 96%; interquartile range 93–98%, p = 0.038). The median AUCs according to assay format in the three workshops are shown in Fig. 1a. In DASP 2005, ten assays (laboratories 109c, 110, 116, 126, 128, 132g, 133c, 145, 150 and 200) reported values for both AUC and AS95 in the upper quartile. Six of these laboratories (109c, 128, 132g, 133c, 145 and 200) used a commercial ELISA with recombinant human GAD65 with biotin–streptavidin–peroxidase for detection. Median GADA levels for each patient and control sample calculated as WHO units/ml from RIA (n = 40) in DASP 2005 correlated with median levels from ELISA (n = 8; r_S = 0.81; p < 0.001; Fig. 1b). Of 50 patient samples, two were reported as positive in the majority of the RIA and were negative in ELISA, and one was reported as positive in the majority of the ELISA and negative in RIA.

Fig. 1

The effects of GADA assay format. a Differences in GADA AUC between in-house RIA (circles), commercial RIA kits (triangles) and ELISA (squares) in the three workshops. Horizontal bars represent median AUC. b The correlation between median GADA levels in samples from 50 patients and 100 control individuals in DASP 2005 as reported by RIA (40 assays) and ELISA (eight assays; r_S = 0.81; p < 0.001)

Concordance of measurements

In DASP 2005, serum samples from 30 patients and none of the control samples were reported positive in >95% of assays; an additional five patient samples and one control sample were reported positive in >90% of assays; an additional six patient samples were reported positive in >50% of assays and finally a further three patient samples and five control samples were positive in >25% of assays (ESM Fig. 1). Agreement for either positivity or negativity in >95% of assays was observed for 122 samples (34 patient samples and 88 control samples). The ranges of GADA levels expressed as WHO units/ml for the 50 patient samples and 100 control samples are shown in ESM Fig. 2. As expected, the CV in GADA assays for patient samples was inversely related to titre.
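The concordance figures above amount to computing, for each sample, the fraction of assays that reported it positive, and then counting the samples on which more than 95% of assays agreed in either direction. A minimal sketch follows; the data layout and function names are assumptions for illustration, and missing results are not handled.

```python
def fraction_positive_per_sample(calls_by_assay):
    """calls_by_assay[a][s] is True when assay a reported sample s as positive."""
    n_assays = len(calls_by_assay)
    n_samples = len(calls_by_assay[0])
    return [
        sum(1 for assay in calls_by_assay if assay[s]) / n_assays
        for s in range(n_samples)
    ]


def count_concordant(fractions, agreement=0.95):
    """Count samples called positive, or called negative, by more than
    `agreement` of the assays."""
    return sum(1 for f in fractions if f > agreement or (1.0 - f) > agreement)


# Hypothetical calls from four assays on three samples.
calls = [
    [True, False, True],
    [True, False, False],
    [True, False, True],
    [True, False, True],
]
fractions = fraction_positive_per_sample(calls)
concordant = count_concordant(fractions)
print(fractions, concordant)
```

In this toy example the first sample is positive in every assay, the second is negative in every assay, and the third is positive in three of four assays, so two samples count as concordant.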

Combined ROC curve

The median values for each sample compiled from all measurements were used to construct a combined ROC curve with AUC 0.95 (95% CI 0.91–1.00). Using this curve, a cut-off value of 18 WHO units/ml corresponded to a specificity of 98% and a sensitivity of 88% (Fig. 2).

Fig. 2

Generalised ROC curves for GADA (upper line) and IA-2A (lower line) in DASP 2005. These were compiled from all measurements of GADA and IA-2A in 50 assays. For GADA, the AUC was 0.95 (95% CI 0.91–1.00), and a cut-off value of 18 WHO units/ml corresponded to 98% specificity (arrow) and 88% sensitivity. For IA-2A, the AUC was 0.86 (95% CI 0.78–0.94), and a cut-off value of 2.5 WHO units/ml corresponded to 98% specificity and 74% sensitivity

IA-2A assays

The results of IA-2A determination in DASP 2002 (53 assays), 2003 (52 assays) and 2005 (52 assays) are summarised in Table 1. All assays differentiated between patient and control sera with higher levels in patients compared with controls (all p < 0.001, Mann–Whitney U test). The performance of IA-2A assays improved over the course of the workshop programme. The median AUC in DASP 2002 was 0.81 (interquartile range 0.79–0.83), in DASP 2003 was 0.82 (0.78–0.84), and in DASP 2005 was 0.85 (0.82–0.87; p < 0.0001; Table 1). Full results for individual laboratories are given in ESM Table 2.

Assay format

In the first two workshops the AUC and AS95 for IA-2A were similar using in-house and kit RIA, but lower using ELISA. In DASP 2005, the AUCs were similar using RIA and ELISA, but the sensitivity was lower in IA-2A ELISA (n = 6; median 65%; interquartile range 62–67%) compared with RIA (n = 43; median 70%; interquartile range 68–72%; p = 0.002). The specificity was similar in ELISA (n = 6; median 99%; interquartile range 98.8–99.2%) and in RIA (n = 43; median 100%; interquartile range 97–100%; p = NS). Median AUC according to assay format in the three workshops is shown in Fig. 3a. Median IA-2A levels for each patient and control sample reported by RIA (n = 43) correlated with those by ELISA (n = 6; r_S = 0.54; p < 0.001; Fig. 3b).

Fig. 3

The effects of IA-2A assay format. a Differences in AUC between in-house RIA (circles), commercial RIA kits (triangles) and ELISA (squares) in the three workshops. Horizontal bars represent median AUC. b The correlation between median IA-2A levels in 50 patient samples in DASP 2005 as reported by RIA (43 assays) and ELISA (six assays; r_S = 0.54; p < 0.001)

IA-2 clone

All three workshops included RIA for IA-2A that used IA-2ic/full length and the shorter version IA-2bdc. As observed in DASP 2000 [9], a small number of sera had clearly dichotomous results in assays using different clones. In DASP 2005, the sensitivity of IA-2bdc RIA was lower (n = 5; median 64%; interquartile range 61–65%) compared with IA-2ic/full length RIA (n = 38; median 70%; interquartile range 69–72%; p < 0.001), as was the AUC (p = 0.004, Fig. 4a). Specificities for IA-2bdc and IA-2ic/full length RIA were similar (n = 5; median 99%; interquartile range 94–100% and n = 38; median 100%; interquartile range 98–100%; p = NS). There was a correlation between median levels found using IA-2ic/full length assays (n = 38) and IA-2bdc assays (n = 5) (r_S = 0.62; p < 0.001) but four outlying samples were reported as median >90 WHO units/ml in IA-2ic/full length assays but <10 WHO units/ml in assays using IA-2bdc (Fig. 4b).

Fig. 4

The effects of IA-2 clone. a The AUC obtained using IA-2ic or full length IA-2 compared with IA-2bdc in in-house RIA in DASP 2005. Horizontal bars represent median AUC. b The correlation between median levels of IA-2A in samples from 50 patients and 100 control individuals in 38 assays using IA-2ic/full length and in five assays using IA-2bdc (r_S = 0.62; p < 0.001)

Concordance of measurements

In DASP 2005, serum samples from 27 patients were reported positive in >95% of assays; an additional three patient samples in >90% of the assays, an additional five patient samples in >50% of assays and finally one additional patient sample in >25% of assays (ESM Fig. 3). None of the control samples was reported positive in more than 10% of assays. Agreement for either positivity or negativity in >95% of assays was observed for 121 samples (35 patient samples and 86 control individuals). The ranges of IA-2A levels expressed as WHO units/ml for the 50 patient samples and 100 control samples are shown in ESM Fig. 4. As expected, CV in IA-2A assays for patient samples was inversely related to titre.

Combined ROC curve

The median values compiled from all measurements were used to construct a combined ROC curve with AUC 0.86 (95% CI 0.78–0.94). Using this curve, a cut-off value of 2.5 WHO units/ml corresponded to a specificity of 98% and a sensitivity of 74% (Fig. 2).

Other assays

In DASP 2005, three laboratories reported results for combined assays for antibodies to GAD65 and IA-2ic and one for islet cell antibodies using ELISA technology. The RIA for antibodies to GAD65/IA-2ic achieved 98% sensitivity and 64% specificity. Two ELISA for antibodies to GAD65/IA-2ic participated in the workshop: one achieved 96% sensitivity with 98% specificity (Laboratory 132) and the other achieved 70% sensitivity with 81% specificity (Laboratory 204). The ICA ELISA using autoantigens derived from highly purified primate material achieved 96% sensitivity with 98% specificity (Laboratory 145).

Discussion

Both GADA and IA-2A assays show remarkable concordance among laboratories in categorisation of samples and quantification of antibody levels in spite of the use of different assay formats. GADA assays generally maintained high levels of sensitivity with good discrimination between health and disease, while the performance of IA-2A assays progressively improved over the three workshops. Many laboratories participated using commercial assay kits, both RIA and ELISA, and we have shown that some of these achieved levels of sensitivity and specificity equivalent to in-house RIA. Over this series of workshops there has been a particular improvement in the performance of GADA ELISA, and in DASP 2005 a commercial ELISA performed very well in a number of laboratories. Although used in fewer laboratories, the ELISA for IA-2A also improved and achieved AUC equivalent to those of in-house RIA, although results of the two assay formats correlated less well than for GADA. A small number of combined assays for GADA and IA-2A were included in DASP 2005 and, as expected, some achieved high sensitivity, confirming the value of this approach for initial screening [14–16], particularly since one of these assays used the ELISA format. In addition, the IA-2 clone used was a determinant of differences in assay performance; the relatively small number of RIA using IA-2bdc gave significantly lower sensitivity and AUC than those using IA-2ic.

The design of DASP proficiency evaluations provides blinded testing of large numbers of unselected sera from health and disease in a wide variety of academic and clinical laboratories, allowing assessment of performance in terms of true disease sensitivity and specificity, and as closely as possible mimicking the ‘real life’ situation. In this it differs substantially from most quality assurance schemes, which generally involve the testing of relatively few sera pre-selected for antibody positivity [10–13]. There are some caveats. First, because of the volume of sera required, very young children are under-represented and the control individuals are not precisely matched with the cases for age or country of origin. Second, the control samples are from blood donors and we have had to rely on reports that neither they nor their family members have diabetes rather than direct assessment of disease status. Third, the methods used to quantify antibodies are not entirely comparable among laboratories because they used local standards that had been calibrated to the international reference material at different times: some had performed the calibration exercise in 2002 or 2003 and others did this as part of the 2005 workshop.

In the past, solid-phase assays have proved less useful than liquid-phase assays in identifying diabetes-associated antibodies [9, 17], but in both DASP 2003 and 2005 an ELISA for GADA and a combined GADA/IA-2A assay from the same company achieved high levels of sensitivity and specificity both in the hands of the manufacturer and in a number of other laboratories. This has important implications for testing in clinical laboratories, allowing high throughput testing using a less technically challenging format and avoiding the use of radioactivity. This will facilitate the more widespread testing for these autoantibodies and will limit the need for the use of reference laboratories. It should, however, be noted that the performance of different ELISA kits included in the 2002–2005 workshops varied widely and that the improvement was related to kit design. It is therefore important that potential users seek details of the performance of any kit in the context of a blinded sample exchange such as DASP. The disadvantages of current ELISA kits are that they are generally more costly than in-house RIA and, of particular importance to studies in young children or using stored samples, that they require larger serum volumes than RIA based on 35S-labelled antigens.

We identified clone-related differences in the sensitivity of IA-2A assays in DASP 2000 [9] and have confirmed this in the subsequent proficiency evaluations. Comparison of the median levels obtained in assays using IA-2ic and IA-2bdc clearly demonstrates that there are a small number of patient sera that are identified as strongly IA-2A positive in the former but are negative or weakly positive in the latter (Fig. 4b), supporting the notion that IA-2bdc lacks at least one disease-related epitope. The proportion of laboratories using the IA-2bdc clone has progressively decreased. Our data suggest that this may, at least in part, explain the improvement in IA-2A sensitivity, and imply that laboratories using the IA-2bdc clone should consider switching to IA-2ic.

In summary, the DASP proficiency evaluations have demonstrated that current assays for GADA and IA-2A generally achieve high levels of sensitivity and specificity, although human errors in data handling still have the potential to undermine this. In spite of substantial differences in assay methods, there was good concordance between laboratories when GADA and IA-2A levels were expressed in common units, but some differences persist. This suggests that in large studies requiring directly comparable results from different laboratories, further harmonisation towards use of common antigens, standards and protocol will be required, and on the basis of the results of DASP 2005, standard assay protocols for GADA and IA-2A for use in the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) type 1 diabetes consortia are being developed and validated. Although relatively demanding in terms of sample volumes, organisation, reagents and time, the workshop format highlights systematic differences related to assay method and allows full evaluation of novel methods, such as the GADA ELISA, in laboratories with different demands and levels of experience. Also, of increasing importance in recent workshops, the addition of substudies to the core workshop allows for rapid evaluation of new ways to discriminate disease-relevant antibodies such as insulin autoantibody affinity [18]. The programme of autoantibody workshops in type 1 diabetes provides a model for other autoimmune diseases.