Introduction

Bacterial canker of kiwifruit caused by Pseudomonas syringae pv. actinidiae (Psa) was first described in Japan (Takikawa et al. 1989) and subsequently in Italy and in Korea (Koh et al. 1984; Scortichini 1994). Whereas the disease caused severe economic losses in Japan and Korea, in Italy it remained sporadic and of low incidence for 20 years. However, economic losses began to be observed in Italy in 2007/2008 and, in 2010/2012, in all the main kiwifruit-growing areas of the world (https://gd.eppo.int/taxon/PSDMAK/reporting).

Four different Psa populations (named biovars), characterised by different virulence, have been described (Chapman et al. 2012). Biovar 1 (also named Psa1) includes strains associated with the first epidemics of bacterial canker in Japan and in Italy. Biovar 2 strains (Psa2) have been reported only from Korea. Biovars 1 and 2 have not been detected since 1998. Biovar 3 is currently reported from Chile, Argentina, China, Italy and other European countries, Japan, Korea and New Zealand. Biovar 3, also referred to as PsaV or Psa3, is the population responsible for the global pandemic first reported in Italy in 2008. Biovar 4 comprised low-virulence strains (Chapman et al. 2012; Vanneste et al. 2013). This population, also referred to as Psa LV, was reported by Ferrante and Scortichini (2014) to differ from pathovar actinidiae and was subsequently classified as a new pathovar, P. syringae pv. actinidifoliorum (Pfm) (Cunty et al. 2014).

Given the high impact of Psa on kiwifruit industries around the world, controlling this pandemic became an urgent issue. However, the application of control strategies requires reliable detection of the causal agent (e.g. in propagation material or in a new outbreak area) using harmonised diagnostic protocols based on high-performance methods. Guidance on the validation process is given in EPPO Standard PM 7/98 (2) (European Plant Protection Organization 2014a), which states: “A test is considered fully validated when it provides data for the following performance criteria: analytical sensitivity, analytical specificity, reproducibility and repeatability”. For Psa, the causal agent of bacterial canker of Actinidia spp., an inter-laboratory comparative study of detection methods was performed among Italian laboratories in 2011 (Loreti et al. 2014). This study showed that, among the media tested for isolation, the modified King’s B medium (KBC) (Mohan and Schaad 1987) was better for Psa isolation than the modified Nutrient Sucrose Agar (mNSA) (Mohan and Schaad 1987), King’s B medium (KB) or Nutrient Sucrose Agar (NSA). In addition, the PCR-based assays (the simplex-PCR of Rees-George et al. (2010) and the duplex-PCR of Gallelli et al. (2011)), used directly on infected matrices (wood, leaf, pollen), were more reliable than isolation. Finally, the simplex-PCR of Rees-George et al. (2010) and the duplex-PCR of Gallelli et al. (2011) were the most inclusive and the most exclusive identification methods, respectively (Loreti et al. 2014). More recently, EPPO Standard PM 7/120 (European Plant Protection Organization 2014b) was published as a formal guide to procedures for Psa diagnosis. This standard includes isolation on agar plates and two PCRs: the simplex-PCR of Rees-George et al. (2010) and the duplex-PCR of Gallelli et al. (2011).
This standard can be applied to different matrices (budwood, shoots, twigs, pollen, in vitro micropropagated plants).

New molecular methods have been developed by several research groups for the detection and identification of all Psa biovars (nested-PCR, Biondi et al. 2013; multiplex-PCR, Balestra et al. 2014) and specifically for Psa biovar 3 (simplex-PCR-C and real-time PCR, Gallelli et al. 2014), but these methods had not been validated as Psa detection and/or identification methods. The purpose of this study was to provide validation data for these new molecular tests in comparison with the previously assessed tests.

Inter-laboratory test performance studies (TPS) are an essential part of the validation process of analytical methods: they are used to determine the performance of different tests among laboratories, to establish their comparability and, consequently, to provide objective evidence that the tests are suitable for a specific intended use (https://ec.europa.eu/jrc/en/ /interlaboratory-comparisons). Therefore, an inter-laboratory test performance study was conducted to select the most efficient detection methods for Psa.

This paper presents the results of this inter-laboratory comparison, which involved nine laboratories from Europe, two from New Zealand and one from Turkey (Table S1). Because plants are one of the main pathways for the introduction and spread of the bacterium (European Plant Protection Organization 2012), and bark canker is a typical symptom of Psa on woody tissue, woody extracts were used as the matrix to compare the detection tests.

Isolation of Psa on mNSA and KBC (Mohan and Schaad 1987) and the simplex, duplex, nested, multiplex and real-time PCR-based methods (Rees-George et al. 2010; Gallelli et al. 2011; Biondi et al. 2013; Balestra et al. 2014; Gallelli et al. 2014) were tested on thirteen woody extracts of Actinidia deliciosa cv. ‘Hayward’ spiked with bacterial suspensions of different concentrations. These methods were evaluated using the performance criteria defined in EPPO Standards PM 7/98 (2) and PM 7/122 (1) (European Plant Protection Organization 2014a, c). Because each laboratory processed an identical set of samples under its own conditions, the study could evaluate the benefits and disadvantages of each method. The inter-laboratory test was complemented by a study on bacterial suspensions performed by a subgroup of four laboratories, in order to provide validation data on the capacity of the tests to identify Psa.

Based on the results of this work (TPS and further study), different methods are proposed for the screening or the identification of Psa. Similarly, one or two detection tests are recommended depending on the level of Psa acceptable in a given situation (e.g. certification of propagation material, or low or high disease prevalence). The performance criteria obtained for each method should be taken into account when revising the existing EPPO Standard on Psa detection and identification.

Materials and methods

Study design

The study comprised two parts, assessing on the one hand the performance of screening (detection) methods and on the other hand that of strain identification methods on pure bacterial cultures. The first part evaluated the Psa detection methods for screening woody plant material through an inter-laboratory study involving twelve laboratories. As the amount of data collected allowed statistical analysis, the results were analysed to compare the accuracy, diagnostic specificity, diagnostic sensitivity, repeatability and reproducibility of the methods. The results were also compared using a Bayesian approach. The second part, conducted outside the inter-laboratory study, aimed at evaluating the capacity of the methods to identify Psa-like colonies by assessing analytical specificity (inclusivity, exclusivity) on a collection of pure cultures. The aggregated results of four laboratories are presented. This second part was not intended to be an inter-laboratory study per se, since each laboratory prepared its own bacterial suspensions; therefore, only descriptive statistics are reported for it.

Participant laboratories

The following 13 laboratories were candidates for the TPS: Instituto Valenciano de Investigaciones Agrarias (IVIA), Centro de Proteccion Vegetal y Biotechnologia-Spain; Deputación de Pontevedra, Estación Fitopatolóxica Areeiro-Spain; The French Agency for Food, Environmental and Occupational Health & Safety, Plant Health Laboratory (ANSES-LSV)-France; Università degli Studi di Modena e Reggio Emilia (UniMoRe)-Italy; Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria, Centro di Ricerca Difesa e Certificazione, Sede di Roma (CREA-DC)-Italy; Università della Tuscia, Department of Agriculture and Forest Sciences (DAFNE), Viterbo-Italy; Laboratorio Fitopatologico Regione Lombardia, Servizio Fitosanitario/Fondazione Minoprio-Italy; Benaki Phytopathological Institute (BPI), Department of Phytopathology, Laboratory of Bacteriology-Greece; Instituto Nacional de Investigação Agrária e Veterinária (INIAV), UEIS-SAFSV Laboratório de Fitobacteriologia-Portugal; Austrian Agency for Health and Food Safety, Institute for Sustainable Plant Production (AT-AGES)-Austria; Ministry of Agriculture and Forestry, Plant Health and Environment Laboratory, Diagnostic and Surveillance Services, Biosecurity New Zealand (MPI-PHEL)-New Zealand; Plant and Food Research (PFR)-New Zealand; Plant Protection Central Research Institute (PPCRI)-Turkey. One of these candidates decided not to participate; the final number of participating laboratories was therefore 12.

Part 1: inter-laboratory study

The samples used for the test performance study

Twenty-three identical sets, each comprising thirteen samples, were prepared by ANSES-LSV. Details of the sample composition are provided in Table 1. The samples consisted of extracts of Actinidia deliciosa cv. Hayward obtained from homogenised woody tissues (canes) by crushing twig pieces in PBS-Tween, as recommended in EPPO Standard PM 7/120 (2014b), and spiked (or not) with suspensions of Psa strain ISF 8.43 (biovar 3) containing 10⁷ CFU ml−1 (D7), 10⁵ CFU ml−1 (D5), 10⁴ CFU ml−1 (D4), 10³ CFU ml−1 (D3) or 0 CFU ml−1 (D0). Bacterial suspensions were prepared from a loopful of a 24–48 h bacterial culture in 0.5 ml of sterile distilled water, and bacterial concentrations were determined spectrophotometrically (A660 = 0.1 corresponding to 5 × 10⁷ CFU ml−1). The samples with the highest Psa concentration (D7) and with no Psa (D0) were prepared in duplicate; the other samples (D5, D4 and D3) were prepared in triplicate. Samples were randomised within each set and the sets were randomly assigned to the participants. Although the order of the samples was randomised, the preparation and composition of the samples within each set were identical, thus maximising sample homogeneity. After randomisation, each sample was labelled with a code. Each laboratory checked that samples and materials were in appropriate condition on arrival. A protocol detailing the detection procedures was sent to each laboratory.
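The panel composition and randomisation described above can be sketched as follows. This is a minimal illustration, not the procedure used by ANSES-LSV: it assumes that A660 scales linearly with cell density around the calibration point quoted in the text (A660 = 0.1 ≈ 5 × 10⁷ CFU ml−1), that each level is reached by simple dilution of a single stock, and it uses a hypothetical sample-coding scheme (S01 to S13) and seed.

```python
import random

# Calibration quoted in the text: A660 = 0.1 corresponds to about 5e7 CFU/ml.
# Assumption: the relationship is linear over the range of interest.
CFU_PER_ML_AT_A660_0_1 = 5e7

# Target concentration (CFU/ml) and number of replicates per set, as in the text.
PANEL_DESIGN = {
    "D7": (1e7, 2),
    "D5": (1e5, 3),
    "D4": (1e4, 3),
    "D3": (1e3, 3),
    "D0": (0.0, 2),
}


def cfu_per_ml_from_a660(a660: float) -> float:
    """Estimate cell density from absorbance at 660 nm (linear assumption)."""
    return a660 / 0.1 * CFU_PER_ML_AT_A660_0_1


def dilution_factor(stock_cfu_per_ml: float, target_cfu_per_ml: float) -> float:
    """Fold dilution of the stock needed to reach the target (inf for D0)."""
    if target_cfu_per_ml == 0:
        return float("inf")  # negative sample: no bacteria added
    return stock_cfu_per_ml / target_cfu_per_ml


def build_panel(seed: int) -> list[tuple[str, str]]:
    """Return (code, level) pairs for one randomised 13-sample set."""
    samples = [level for level, (_, reps) in PANEL_DESIGN.items() for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(samples)
    # Hypothetical coding scheme: S01..S13 assigned in randomised order.
    return [(f"S{i:02d}", level) for i, level in enumerate(samples, start=1)]
```

With a stock read at A660 = 0.2 (about 10⁸ CFU ml−1 under the linear assumption), the D3 level requires a 10⁵-fold dilution. Each set contains 11 spiked and 2 unspiked samples, which is why accuracy is weighted later in the analysis.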

Table 1 Samples used to evaluate the different performance criteria in the TPS

Isolation on mNSA/King’s B medium

Fifty μl of each wood extract sample and of its 10-fold and 100-fold dilutions were plated onto the semi-selective media KBC (King et al. 1954) or mNSA (Oxoid nutrient agar supplemented with 5% w/v sucrose), as described by Mohan and Schaad (1987), and incubated at 25–27 °C for 72 h. Psa strain CRA-FRU 8.43 was used as a reference to assist the selection and purification of putative Psa colonies (i.e. colonies with a morphology similar to that of Psa) on each medium.
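As a rough guide to what this plating scheme can detect, the expected number of Psa colonies per plate follows from the 50 μl plated volume and the dilution. The sketch below assumes that every viable Psa cell in the aliquot forms a colony and ignores competition from other bacteria in the extract; the function names are illustrative.

```python
PLATED_VOLUME_ML = 0.05  # 50 μl plated per dish, as described in the text


def expected_colonies(cfu_per_ml: float, dilution_factor: int) -> float:
    """Expected Psa colonies on one plate, assuming every viable cell forms a colony."""
    return cfu_per_ml * PLATED_VOLUME_ML / dilution_factor


def estimated_cfu_per_ml(colonies: int, dilution_factor: int) -> float:
    """Back-calculate CFU/ml in the undiluted extract from a single plate count."""
    return colonies * dilution_factor / PLATED_VOLUME_ML


# Expected counts for each spiked level on the undiluted, 10-fold and 100-fold plates.
for level, conc in [("D7", 1e7), ("D5", 1e5), ("D4", 1e4), ("D3", 1e3)]:
    counts = [expected_colonies(conc, d) for d in (1, 10, 100)]
    print(level, [round(c, 1) for c in counts])
```

For example, an extract spiked at 10³ CFU ml−1 (D3) is expected to give about 50, 5 and 0.5 Psa colonies on the undiluted, 10-fold and 100-fold plates, respectively, so at this level only the less diluted plates are informative, and only if Psa colonies are not masked by other bacteria.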

DNA extraction

Bacterial cells were concentrated from the wood extract samples by centrifuging 500 μl of each sample at 12,000 × g for 10 min and resuspending the pellet in 400 μl of Buffer AP1 of the DNeasy Plant Mini Kit (Qiagen, Germany). DNA was extracted according to the manufacturer’s instructions, with the following modification: after washing with Buffer AW, the samples were air-dried for 10 min and the DNA was eluted in 100 μl of Buffer AE. The extracted DNA was then analysed by PCR.

PCR based methods

The molecular methods, referred to as M1 to M11, are detailed in Table 2. Participating laboratories strictly followed the methods as reported in Table S2. When negative results were obtained with undiluted samples, decimal dilutions were tested. All laboratories performed the molecular methods following the procedures described in the original papers (Rees-George et al. 2010; Gallelli et al. 2011; Biondi et al. 2013; Balestra et al. 2013; Gallelli et al. 2014). The choice of reagents (e.g. enzymes) was left to each laboratory, following the suggestions of the original papers.

Table 2 Methods evaluated during the interlaboratory test performance study

Evaluation of performance criteria

Performance criteria and the validation procedure were established following EPPO Standards PM 7/76 (4) and PM 7/98 (2) (European Plant Protection Organization 2014a, 2017) and International Organization for Standardization ISO 16140:2003 (2003). In particular, accuracy (AC), diagnostic specificity (DSP), diagnostic sensitivity (DSE), analytical sensitivity (ASE), reproducibility (CO), repeatability (DA) and the concordance odds ratio (COR) were assessed.

The definitions and the calculations of these performance criteria (except accuracy) and all statistical tests used are detailed in Chabirand et al. (2017).
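For qualitative results, repeatability (accordance, DA) and reproducibility (concordance, CO) are commonly expressed as pairwise agreement probabilities: DA is the chance that two replicates of the same sample tested in the same laboratory give the same result, and CO the chance for two replicates tested in different laboratories. The sketch below assumes these pairwise definitions and the usual concordance odds ratio COR = DA(1 − CO)/[CO(1 − DA)]; the calculations actually used are those of Chabirand et al. (2017), and the function names are hypothetical.

```python
import math


def _agreeing_pairs(positives: int, total: int) -> int:
    """Number of replicate pairs giving the same result (both + or both -)."""
    negatives = total - positives
    return positives * (positives - 1) // 2 + negatives * (negatives - 1) // 2


def accordance_concordance(positives_per_lab: list[int], replicates: int) -> tuple[float, float, float]:
    """Return (DA, CO, COR) for one sample level.

    positives_per_lab[i] is the number of positive results obtained by
    laboratory i on `replicates` replicates of the same sample level.
    """
    n_labs = len(positives_per_lab)
    if n_labs < 2 or replicates < 2:
        raise ValueError("need at least two laboratories and two replicates")
    total = n_labs * replicates
    # Pairs of replicates tested in the same laboratory (accordance, DA).
    within_agree = sum(_agreeing_pairs(k, replicates) for k in positives_per_lab)
    within_pairs = n_labs * replicates * (replicates - 1) // 2
    # Pairs tested in different laboratories = all pairs minus within-lab pairs (CO).
    between_agree = _agreeing_pairs(sum(positives_per_lab), total) - within_agree
    between_pairs = total * (total - 1) // 2 - within_pairs
    da = within_agree / within_pairs
    co = between_agree / between_pairs
    # Concordance odds ratio: values above 1 point to a laboratory effect.
    if co == 1.0:
        cor = 1.0  # perfect agreement everywhere (DA is then also 1)
    elif da == 1.0 or co == 0.0:
        cor = math.inf
    else:
        cor = da * (1 - co) / (co * (1 - da))
    return da, co, cor
```

For three hypothetical laboratories obtaining 2, 3 and 1 positive results out of three replicates, the sketch gives DA = 5/9 ≈ 0.56, CO = 13/27 ≈ 0.48 and COR ≈ 1.35.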

Likelihood ratios were also calculated to compare the methods using the Bayesian approach, as explained in Chabirand et al. (2017).
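The likelihood ratios and the pre-/post-test relationship plotted in Fig. 2 can be sketched as below, using the standard definitions LR+ = DSE/(1 − DSP) and LR− = (1 − DSE)/DSP and Bayes’ theorem in odds form. The exact procedure followed is that of Chabirand et al. (2017); the function names and the example values (DSE = 0.9, DSP = 0.8) are hypothetical, not results of this study.

```python
import math


def likelihood_ratios(dse: float, dsp: float) -> tuple[float, float]:
    """Return (LR+, LR-) from diagnostic sensitivity and specificity (0-1 scale)."""
    if not (0 <= dse <= 1 and 0 <= dsp <= 1):
        raise ValueError("dse and dsp must be proportions between 0 and 1")
    lr_pos = dse / (1 - dsp) if dsp < 1 else math.inf
    lr_neg = (1 - dse) / dsp if dsp > 0 else math.inf
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Post-test odds = pre-test odds x LR, converted back to a probability."""
    if not 0 <= pre_test < 1:
        raise ValueError("pre_test must be in [0, 1)")
    if math.isinf(lr):
        return 1.0 if pre_test > 0 else 0.0
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical method with DSE = 0.9 and DSP = 0.8, at 50% prevalence.
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
after_positive = post_test_probability(0.5, lr_pos)
after_negative = post_test_probability(0.5, lr_neg)
```

At 50% prevalence the pre-test odds equal 1, so the post-test probability after a negative result is simply LR−/(1 + LR−); this is a convenient check when reading values off Fig. 2.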

Evaluation of accuracy

Following ISO 5725-1 (International Organization for Standardization), accuracy (AC) was defined as the closeness of agreement between a test result obtained with a method and the accepted reference value (for a qualitative method, the sample’s real status).

Accuracy was evaluated for all results as the ratio of the number of positive and negative agreements between a method and the samples’ real status to the number of tested samples. However, because the numbers of positive and negative samples were unequal (11 positive vs two negative samples per panel), each observation was weighted so that positive and negative samples contributed equally to the assessment of accuracy. 95% confidence intervals for AC were calculated using the likelihood method (Rao and Scott 1984). The equality of AC (weighted data) between methods, and between each method and the samples’ real status, was tested using the adjusted Wald test based on the differences between observed cell counts and those expected under independence (“survey” package in R) (Koch et al. 1975; Thomas and Rao 1990).
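The text does not give the exact weights, so the sketch below assumes one natural choice: each class (positive, negative) receives half of the total weight, which makes weighted AC equal to the mean of DSE and DSP on the panel. It does not reproduce the likelihood-based confidence intervals or the adjusted Wald tests; the function name and the example panel are hypothetical.

```python
def weighted_accuracy(true_status: list[bool], calls: list[bool]) -> float:
    """Accuracy in which positive and negative samples carry equal total weight."""
    if len(true_status) != len(calls):
        raise ValueError("true_status and calls must have the same length")
    n_pos = sum(true_status)
    n_neg = len(true_status) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Assumption: each class receives half of the total weight.
    weight = {True: 0.5 / n_pos, False: 0.5 / n_neg}
    return sum(weight[t] for t, c in zip(true_status, calls) if t == c)


# Hypothetical panel: 11 positive and 2 negative samples, as in the TPS.
status = [True] * 11 + [False] * 2
calls = [True] * 10 + [False] + [True, False]  # 1 false negative, 1 false positive
ac = weighted_accuracy(status, calls)
```

For this hypothetical panel the weighted accuracy is about 0.70, whereas the unweighted ratio would be 11/13 ≈ 0.85: without weighting, a false positive on one of the only two negative samples would barely affect the score.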

Indeterminate results

Indeterminate results obtained by some laboratories were processed under two hypotheses (H1 and H2), as reported in Chabirand et al. (2017), so that only binary results (positive or negative) were used in the calculations. Under H1, the laboratory was assumed to have made the right decision for the indeterminate results given the samples’ real status (i.e. indeterminate results were counted as positive for positive samples and negative for negative samples); under H2, the opposite was assumed.
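The two hypotheses amount to a simple mapping from a three-valued result to a binary call. The sketch below is illustrative; the result labels and the function name are assumptions.

```python
def binary_call(result: str, truly_positive: bool, hypothesis: str = "H1") -> bool:
    """Convert 'positive' / 'negative' / 'indeterminate' to a binary call."""
    if result == "positive":
        return True
    if result == "negative":
        return False
    if result != "indeterminate":
        raise ValueError(f"unknown result: {result!r}")
    if hypothesis == "H1":
        # H1: the laboratory is assumed to have made the right decision.
        return truly_positive
    if hypothesis == "H2":
        # H2: the laboratory is assumed to have made the wrong decision.
        return not truly_positive
    raise ValueError(f"unknown hypothesis: {hypothesis!r}")
```

Because H1 counts every indeterminate result as correct and H2 counts it as wrong, the two scenarios give the most and least favourable performance estimates, bracketing the true performance of each method.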

Outlier results

The ISO 16140 standard (International Organization for Standardization) stipulates that the organising laboratory shall determine which results are suitable for use in calculations and which are outliers. Consequently, the results of a laboratory for a given method were excluded as outliers when all three of the following conditions were met: (i) the statistical analysis showed a significant difference between the number of indeterminate results obtained by this laboratory and by the others; (ii) the indeterminate results of this laboratory represented more than 50% of the indeterminate results obtained for the method; and (iii) they represented more than 50% of the results obtained for the panel of samples (i.e. seven or more indeterminate results).

Results of a laboratory for a given method were also excluded (i) when the expected result was not obtained for at least one control, or (ii) when the false results (false positives (FP) + false negatives (FN)) obtained by this laboratory represented more than 50% of the false results obtained for the method and more than 50% of the results obtained for the panel of samples (i.e. FP + FN ≥ 7).
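The exclusion rules above can be written as a single decision function for one laboratory and one method. This is a sketch under stated assumptions: the panel size of 13 comes from the text, but the outcome of the statistical test on indeterminate rates (`indeterminate_rate_differs`) must be computed separately and passed in, and all names are hypothetical.

```python
PANEL_SIZE = 13  # samples per set in this TPS


def is_outlier(
    *,
    controls_ok: bool,
    lab_indeterminate: int,
    method_indeterminate: int,
    indeterminate_rate_differs: bool,
    lab_false: int,
    method_false: int,
) -> bool:
    """Apply the exclusion rules described in the text for one lab and one method."""
    if not controls_ok:
        return True  # expected result not obtained for at least one control
    half_panel = PANEL_SIZE / 2  # "more than 50%" of 13 results means >= 7
    too_many_indeterminate = (
        indeterminate_rate_differs
        and lab_indeterminate > 0.5 * method_indeterminate
        and lab_indeterminate > half_panel
    )
    too_many_false = lab_false > 0.5 * method_false and lab_false > half_panel
    return too_many_indeterminate or too_many_false
```

With the figures reported in the Results for laboratory L23 and method M2 (3 of the 3 indeterminate results for the method, i.e. 3 of 13 panel results, with a significant difference in indeterminate rates), and assuming no false results for that laboratory, the function returns False, consistent with the decision to retain these data.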

Data analysis

Statistical tests were performed using the R statistical software package (version 3.3.1; R Development Core Team, Vienna, Austria). Statistical tests were considered significant for a calculated p-value lower than 5%.

Not all methods were implemented by all participants; Table S2 summarises which methods were implemented by which participant. Thus, depending on the method, the performance assessment was based on the results of three to eleven laboratories, so the precision of the assessment differs between methods. All data were analysed with this caveat, keeping in mind that a non-significant statistical test does not demonstrate the absence of differences, only that no difference was detected.

Part 2: analytical specificity

Bacterial strains and cell lysis

To determine the analytical specificity of the different molecular methods, a loopful of a 24–48 h culture grown on KB or NSA was taken from each of the following strains: Psa strains (NCPPB 3739 (bv. 1); ISPAVE 019 (bv. 1); KN2 (bv. 2); ISPAVE 020 (bv. 1); OMP-BO 1875,1 (bv. 3); OMP-BO 8581,1 (bv. 3); OMP-VE 4136 (bv. 3); CRA-PAV 1625 (bv. 3); CRA-PAV 1530 (bv. 3); CRA-PAV 1699 (bv. 3); CFBP 8025 (bv. 3); CFBP8047 (bv. 3); SFR-TO 242a (bv. 3); CFBP8036; CRA-FRU 8.43; CFBP8053; CFBP8062; CFBP8065; CFBP8066; CFBP8092; CFBP8097; CFBP8108; BPI A1; BPI B1; BPI D1–1; BPI E3; BPI G1; BPI 10; BPI 17a; BPI 22); strains phylogenetically closely related to Psa or other Pseudomonas (P. syringae pv. morsprunorum NCPPB 2995; P. syringae pv. tomato NCPPB 1106, NCPPB 2563, IVIA 2650–1; P. syringae pv. theae NCPPB 2598, CFBB 4097; P. avellanae NCPPB3487 (GR), NCPPB 3873 (IT), ISPaVe 1267 (IT); P. syringae pv. syringae CFBP4702, IVIA 3840); and strains associated with kiwifruit, namely P. syringae pv. actinidifoliorum (Pfm) (CFBP 8038, CFBP 8051, CFBP 7812, CFBP 7951), Pseudomonas spp. (LSV 28.72) and P. syringae (LSV 37.27, LSV 37.28, LSV 40.35, LSV 43.31) (Table 3). Each loopful was resuspended in 0.5 ml of sterile distilled water to a density of approximately 5 × 10⁷ CFU ml−1, and the suspensions were tested by CRA-PAV, ANSES-LSV, IVIA and BPI using different molecular methods. Each participant heated a 100 μl aliquot of each bacterial suspension at 95 °C for 10 min, cooled it on ice, centrifuged it at 6000 × g for 1 min and used the lysate (2–5 μl) as template in the PCR assays. The lysate can be stored at −20 °C for subsequent analyses. Psa strain CRA-FRU 8.43 was used as the positive template control.

Table 3 Bacterial strains used to evaluate analytical specificity: inclusivity (tested on pure culture of several P. syringae pv. actinidiae strains) and exclusivity (tested on pure culture of non-target strains of several Pseudomonas spp.)

Evaluation of analytical specificity

This performance criterion was assessed in the second part of the study to evaluate the methods for strain identification, in particular their ability to identify all target strains (inclusivity) and not to give false positives with non-target strains (exclusivity). A set of 30 target strains and 20 non-target strains, the latter either phylogenetically related to Psa (Gardan et al. 1999; Sarkar and Guttman 2004) or associated with the host material, was tested (Table 3).
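Inclusivity and exclusivity are simple proportions over the strain panel in Table 3 (30 target and 20 non-target strains). In the sketch below, indeterminate results are assumed to count against the method, since they neither confirm a target strain nor exclude a non-target one; the result labels and function names are hypothetical.

```python
def inclusivity(target_results: list[str]) -> float:
    """Share of target (Psa) strains giving a positive result."""
    if not target_results:
        raise ValueError("no target strains tested")
    return sum(r == "positive" for r in target_results) / len(target_results)


def exclusivity(non_target_results: list[str]) -> float:
    """Share of non-target strains giving a negative result.

    Assumption: indeterminate results count against exclusivity,
    in the same way as false positives.
    """
    if not non_target_results:
        raise ValueError("no non-target strains tested")
    return sum(r == "negative" for r in non_target_results) / len(non_target_results)
```

For example, a method that is negative for 19 of the 20 non-target strains has an exclusivity of 95%, the value reported for M2 in the Results.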

Results

Part 1: inter-laboratory study

Indeterminate results

Depending on the method, the rate of indeterminate results (Table 2) ranged from 0% (methods M1, M7, M8, M9, M10 and M11) to 5.1% (method M3). Fisher’s exact test identified no significant differences in the rate of indeterminate results between methods, either for the overall results, for positive samples only or for negative samples only (p = 0.427, 0.444 and 0.595, respectively).

In contrast, significant differences in the rates of indeterminate results were identified between laboratories, both for the overall results and for positive samples only (p = 0.011 and p = 7.36 × 10⁻⁴, respectively). When laboratory L23 was excluded from the analysis, no significant differences remained between laboratories. The indeterminate results obtained by L23 represented more than 50% of the indeterminate results obtained for method M2 (3/3) and for M4 (3/4); however, they represented less than 50% of the results obtained for the panel of samples (3/13, i.e. 23%, for both methods). Thus, although indeterminate rates differed between laboratories, the outlier criteria were not all met, and the results of L23 for methods M2 and M4 were used for the performance assessment of the methods.

Owing to the small number of indeterminate results, the performance assessment of the methods did not differ significantly between scenarios H1 and H2. Only the first scenario, which better reflects reality, is therefore presented.

Outlier results

For some laboratories the controls did not give the expected results, and the corresponding datasets were excluded from the analysis: the results of laboratory L03 for method M7 and of laboratory L15 for method M5. Regardless of how indeterminate results were counted (scenario H1 or H2), no laboratory obtained, for a given method, both more than 50% false results on the panel of samples and more than 50% of the false results recorded for that method. Thus, no other outliers were identified and no other datasets were excluded from the analysis.

Fig. 2

Relationship between pre- and post-test probabilities of Pseudomonas syringae pv. actinidiae (Psa) infection, according to the results obtained during the inter-laboratory test performance study for each evaluated method and for the combination of methods M2 and M5. Pre-test probability (prevalence) was defined as the proportion of plants infected by Psa in a particular population at a specific time. Post-test probability was calculated as post-test odds/(1 + post-test odds), where post-test odds = [pre-test probability/(1 − pre-test probability)] × likelihood ratio. For each method, the solid line represents the post-test probability of Psa infection after a positive test result for different prevalence rates, and the broken line the post-test probability after a negative test result. For a given method, the closer the solid curve is to the vertical axis and the broken curve to the horizontal axis, the higher the overall method performance.

Accuracy, diagnostic sensitivity and diagnostic specificity

The performance criteria of the different methods evaluated in the TPS are summarised in Table 4 and Fig. 1. Detailed results obtained by each laboratory for each sample are available in Table S3. The best overall performance was obtained with methods M5 and M2, each with an AC of 93.2%. According to the adjusted Wald test, the AC of M5 and M2 did not differ significantly from that of methods M4 and M1, but was significantly higher than that of methods M12, M6, M8, M9, M7, M10 and M11; the AC of M5, but not that of M2, was also significantly higher than that of M3.

Table 4 Comparison of the performance criteria accuracy (AC), diagnostic sensitivity (DSE) and diagnostic specificity (DSP) obtained during the collaborative study for the different methods
Fig. 1

Diagram summarising the performance of the different methods evaluated in the inter-laboratory test performance study. The figure gives an overview of method performance (for detailed comparison of percentages, see the tables): the larger the area of the polygon, the better the overall performance of the method. The figure also shows, for a given method, which performance criteria are weak.

Diagnostic sensitivity varied from 68.6% for M12 to 100% for M7, M8 and M9, and M3 also showed a low DSE (72.7%). Isolation gave the lowest value (68.6%) because of the high number of false negatives (38/121) (Table 4). According to Fisher’s exact test, the DSE of methods M7, M8 and M9 did not differ significantly from that of M5, M6, M10 and M2, but was significantly higher than that of methods M4, M1, M11, M3 and M12.

Diagnostic specificity ranged from 16.7% for M7 and M11 to 100% for M3. Low values of diagnostic specificity were due to false positive results: DSP ranged from 16.7 and 20% (M10 and M11) to 50% (M6) and remained low after restriction analysis (16.7% with AluI (M7) and 37.5% with BclI (M8) and BfmI (M9)). The diagnostic specificity of the other molecular tests ranged from 87.5% (M1) to 100% (M3) (Table 4).

Using Fisher’s exact test, the DSP results for M3 were not significantly different from the results for methods M2, M12, M4, M5 and M1 but were significantly better than results obtained with methods M6, M7, M8, M9, M10, and M11.

Only method M5 showed no significant deviation from the theoretically expected results for all criteria (AC, DSE and DSP). Methods M1, M2, M3, M4 and M12 showed no significant deviation from the theoretically expected results for DSP, whereas methods M6, M7, M8, M9 and M10 showed none for DSE. Method M11 deviated significantly from the theoretically expected results for all criteria. Note that, as DSP was assessed from fewer samples than DSE, the power of the statistical test (i.e. the probability that the test rejects a false null hypothesis) is much lower for DSP than for DSE; consequently, differences (if any) were less likely to be identified in the DSP assessment than in the DSE assessment.
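The point about statistical power can be made concrete by comparing interval widths for the same observed proportion. The sketch uses the Wilson score interval, a technique swapped in here for illustration (the study itself used likelihood-based intervals for AC and Fisher’s exact tests for DSE and DSP), and assumes 11 participating laboratories, so that DSP rests on 2 × 11 = 22 negative observations and DSE on 11 × 11 = 121 positive observations; the counts 20/22 and 110/121 are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same observed proportion (about 91%) estimated from 22 vs 121 observations.
dsp_low, dsp_high = wilson_interval(20, 22)
dse_low, dse_high = wilson_interval(110, 121)
```

With the same observed proportion, the interval based on 22 observations is more than twice as wide as the one based on 121, which is why a real difference in DSP is harder to detect than the same difference in DSE.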

Analytical sensitivity

The analytical sensitivity results for the different methods are summarised in Table 5.

Table 5 Results for analytical sensitivity (ASE), repeatability (DA), reproducibility (CO) and concordance odds ratio (COR) obtained during the collaborative study for the different methods

Although some results appeared inconsistent with the dilution series (method M1 at D5, method M11 at D5 and method M12 at D4 or D3), no evidence of outliers could be identified, so all data were included in the statistical analysis.

The best analytical sensitivity was obtained with methods M8, M9, M7 and M5, for which the target could be reliably detected down to the D3 level (10³ CFU ml−1). For methods M1, M2, M3, M4, M6 and M10, this level corresponded to D4 (10⁴ CFU ml−1), and for methods M11 and M12 to D7 (10⁷ CFU ml−1).

Repeatability, reproducibility and odds ratio

The overall repeatability (DA) of the PCR protocols (Table 5) was above 90%, and the overall reproducibility (CO) varied from 77% (M6) to 93% (M7). The CO was above 90% only for methods M3, M5 and M7. For isolation on semi-selective media, DA was 89% and CO was 68%.

While repeatability remained good for all methods (above 80%), reproducibility was poor for some methods (68% and 77% for M12 and M6, respectively, and 79% for M1).

The concordance odds ratio did not differ significantly from 1.00 at any dilution for methods M3, M5, M7, M8 and M9 (Fisher’s exact test), meaning that no significant differences between laboratories were found with these methods. Significant variations between laboratories were identified for methods M2, M4, M6 and M10 only at the lowest dilution. Significant variations between laboratories were also identified at D5 for method M5, at D5 and D3 for method M1, and at D5 and D4 for M12.

Method comparison by Bayesian approach

Likelihood ratios are shown in Table 6. The LR+ values of methods M2, M3, M4 and M12 are high, indicating that these methods generate a large change from pre- to post-test probability. The reliability of a positive test result is therefore higher for these methods than for M1 and M5 (moderate change) and particularly than for methods M6 to M11 (small change). The LR- of M2, M5, M7, M8 and M9 is very close to or equal to zero, indicating that these methods also generate a large change from pre- to post-test probability. The reliability of a negative test result is therefore much higher for these methods than for methods M1, M4 and M6 (moderate change) and particularly than for methods M3, M10, M11 and M12 (small change).

Table 6 Comparison of likelihood ratios obtained during the collaborative study for the different methods

Only method M2 combines a high LR+ with a low LR- (large change from pre- to post-test probability for both positive and negative results). Method M5 combines a low LR- with a moderate LR+, whereas method M4 combines a high LR+ with a moderate LR-.

The post-test probabilities of Psa infection (i.e. the probability of Psa infection established after a test result) can be displayed graphically (Fig. 2) as a function of the pre-test probability (i.e. Psa prevalence) and the likelihood ratio, for each evaluated method and for the combination of the two most reliable methods (M5 and M2). Consider a population with a prevalence of 50%. For the solid curves (i.e. post-test probabilities of Psa infection after a positive test result), the probability that a tested individual is really infected after a positive result is higher than 90% for methods M2, M3, M4, M5 and M12, between 80 and 90% for method M1, and lower than 65% for methods M6, M7, M8, M9, M10 and M11. For the broken curves (i.e. post-test probabilities of Psa infection after a negative test result), the probability that the plant is infected by Psa is 0.0% when tested with methods M7, M8 and M9, only 3.9% for method M5, and 8.7%, 10.5% and 12.5% for methods M2, M4 and M6, respectively. Conversely, relatively high probabilities of infection remain for samples testing negative with M3, M12, M10 and particularly M11 (52.2%).
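The text does not state how the M2 + M5 combination shown in Fig. 2 was computed. One common approach, assumed here for illustration only, treats the results of the two tests as independent given the true infection status, so that their likelihood ratios multiply. PCR tests run on the same extract are often correlated, in which case this assumption overstates the gain. The function name and the example likelihood ratios are hypothetical.

```python
def combined_post_test_probability(pre_test: float, likelihood_ratios: list[float]) -> float:
    """Post-test probability after several test results.

    Assumption: results are conditionally independent given infection status,
    so the likelihood ratios of the individual results multiply.
    """
    if not 0 <= pre_test < 1:
        raise ValueError("pre_test must be in [0, 1)")
    odds = pre_test / (1 - pre_test)
    for lr in likelihood_ratios:
        odds *= lr  # valid only under the conditional-independence assumption
    return odds / (1 + odds)


# Hypothetical example: two positive results with LR+ = 4 and LR+ = 2 at 50% prevalence.
p_both_positive = combined_post_test_probability(0.5, [4.0, 2.0])
```

Under this assumption, two positive results with LR+ values of 4 and 2 raise a 50% pre-test probability to 8/9 (about 89%), compared with 80% and about 67% for each test alone.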

Part 2: analytical specificity

Analytical specificity was assessed through inclusivity and exclusivity. No false negatives were obtained when several Psa strains were tested by the PCR-based methods (inclusivity of 100% for all molecular methods). Data on the analytical specificity of the real-time PCR were previously reported by Gallelli et al. (2014); note that M4 (simplex-PCR-C) and M5 (real-time PCR) do not detect strains of Psa bv. 1 and 2 because, as reported by Gallelli et al. (2014), these methods are specific to Psa biovar 3 (the virulent population that has caused bacterial canker outbreaks worldwide since 2008) (Table 3).

A high risk of false positive results was observed when testing cultures of closely related Pseudomonas spp. or of kiwifruit-associated bacteria (Table 3). The highest numbers of false positive or inconclusive results were observed with the following methods, which had low exclusivity: M1 (2/11, i.e. 18%), M3 (3/18; 13%), M6 (8/14; 43%) and M10 (1/12; 8%). The highest exclusivity was confirmed for M5 (100%) and M2 (19/20; 95%), followed by M4 (14/18; 78%). The bacterial species giving false positives with each method are reported in Table 3. Of the four atypical bacterial strains isolated from kiwifruit, three gave false positives and one an indeterminate result with M3, whereas M4, M7, M8 and M9 each produced only one false positive. All Pfm strains gave indeterminate results with M3 and false positives with M1, M6, M7, M8, M9 and M10; M11 gave false positives with two of the four Pfm strains and M4 with one (Table 3).

Discussion

In recent years, the high economic impact of bacterial canker on kiwifruit production has prompted the scientific community to study the epidemiology, control, plant-pathogen interaction and diagnostic methods of Psa in order to manage this destructive pathogen. Because no foolproof control strategy has been developed for Psa, special attention needs to be paid to disease monitoring and to the certification of the sanitary status of propagation material and other kiwifruit plant material. The availability of reliable and highly sensitive diagnostic methods is therefore of great importance. The study presented in this paper reports the results of 12 international laboratories. It aimed to gather comparative data on several diagnostic methods in order to provide an objective measure of their performance and to provide input for improving the EPPO diagnostic protocol by including some of these newly validated methods.

Despite the several advantages of PCR-based methods, a potential limitation of these assays is the occurrence of false-positive results. The suitability of such tests for accurately assessing the phytosanitary status of plant material is measured by diagnostic sensitivity (DSE) and diagnostic specificity (DSP) (Jacobson 1997). Simplex-PCR, duplex-PCR, simplex-PCR-C and real-time PCR (M1, M2, M4 and M5, respectively) showed acceptable values of DSE and DSP (88 to 96%), making them suitable as preliminary screening methods.
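
DSE and DSP are proportions computed on samples whose true status is known: DSE is the share of infected (or spiked) samples that test positive, and DSP is the share of Psa-free samples that test negative. The Python sketch below uses hypothetical counts; overall accuracy is also computed, on the assumption that this is what the AC criterion used in this study denotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # infected samples that tested positive
    fn: int  # infected samples that tested negative (false negatives)
    tn: int  # Psa-free samples that tested negative
    fp: int  # Psa-free samples that tested positive (false positives)


def diagnostic_performance(c: Counts) -> dict[str, float]:
    """Return DSE, DSP and overall accuracy as proportions."""
    return {
        "DSE": c.tp / (c.tp + c.fn),
        "DSP": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp),
    }


# Hypothetical results pooled over laboratories for one method.
perf = diagnostic_performance(Counts(tp=92, fn=8, tn=44, fp=6))
for name, value in perf.items():
    print(f"{name}: {value:.1%}")
```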

Conversely, M12 and M3 gave the highest values of DSP (95.5 and 100%, respectively) but a low DSE (68 and 72%). The low DSE was predictable for isolation, which is notoriously of low analytical sensitivity, as reflected by the high number of false-negative results (38/121). For multiplex-PCR, the result may have been influenced by the low number of participating laboratories (3), which detected the pathogen in only one out of nine samples spiked with 10³ CFU mL−1 and with 10⁴ CFU mL−1.

Despite high diagnostic sensitivity (82 to 100%), the nested-PCR of Biondi et al. (2013) showed very low diagnostic specificity (50% for M6; 16.7 to 20% for M7 to M9; and 16.7 to 37.5% for M10 and M11). These results are due to false-positive responses obtained by amplification of contaminants, endophytes or epiphytes associated with the infected kiwifruit woody tissues. The presence of such contaminants was observed on semi-selective mNSA (Mohan and Schaad 1987), on which the number of colonies with a morphology similar to that of Psa (i.e. levan positive on mNSA medium) was higher than the spiked concentration of Psa. Therefore, additional identification tests need to be performed on presumptive Psa colonies. The high risk of obtaining false-positive results with the methods described by Biondi et al. (2013) can be explained by the fact that they were developed for testing bleeding sap samples, although extracts from kiwifruit cuttings artificially contaminated with Psa were tested as well (Biondi et al. 2013). Thus, despite its high sensitivity, the lack of specificity makes this method unsuitable as a rapid screening method for kiwifruit woody tissues. Nested-PCR has been reported to increase detection sensitivity and to reduce the effect of PCR inhibitors (Kuchta et al. 2008; Zimmermann et al. 2004). However, the introduction of a second PCR step and the handling of previously amplified products increase the risk of false positives due to cross-contamination of reaction mixtures in routine analysis (Roberts et al. 1996). A realistic way to avoid opening the PCR tubes between the first and second rounds of amplification is single-tube nested-PCR followed by identification of the amplified fragment by restriction analysis (Llop et al. 2000; Bertolini et al. 2003).

Only the real-time PCR method M5 showed no significant deviation from the theoretically expected results for all criteria (AC, DSE and DSP). Methods M1, M2, M3, M4 and M12 showed no significant deviation from the theoretically expected results for DSP, whereas methods M6, M7, M8, M9 and M10 showed no significant deviation for DSE. Conversely, the nested-PCR of Biondi et al. (2013) as implemented in M11 showed significant deviation from the theoretically expected results for all criteria considered.

For analytical sensitivity, the best results were obtained with methods M7, M8, M9 and M5, for which the target could be reliably detected down to the D3 dilution (10³ CFU mL−1), with no significant difference from the theoretical detection level of 95%. For methods M1, M2, M3, M4, M6 and M7, this level corresponded to the D4 dilution (10⁴ CFU mL−1). For methods M11 and M12, it corresponded to the D7 dilution (10⁷ CFU mL−1).
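
This section does not state which test was used to compare detection rates with the 95% level. As one possible reading, the Python sketch below applies a one-sided exact binomial test (H0: detection rate ≥ 95%) to each dilution and assumes, as the text indicates, that dilution Dn corresponds to 10^n CFU mL−1; the replicate counts are hypothetical.

```python
from math import comb
from typing import Optional


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def consistent_with_level(positives: int, n: int, level: float = 0.95,
                          alpha: float = 0.05) -> bool:
    """One-sided exact test of H0: detection rate >= level; True if H0 is not rejected."""
    return binom_cdf(positives, n, level) >= alpha


def detection_limit(results: dict[int, tuple[int, int]]) -> Optional[int]:
    """Lowest n (dilution Dn = 10**n CFU/mL) that is still detected reliably.

    Dilutions are scanned from the most to the least concentrated, and the scan
    stops at the first one whose detection rate is significantly below 95%.
    """
    limit = None
    for log_conc in sorted(results, reverse=True):
        positives, n = results[log_conc]
        if not consistent_with_level(positives, n):
            break
        limit = log_conc
    return limit


# Hypothetical pooled results: {n of Dn: (positive replicates, total replicates)}.
data = {7: (24, 24), 6: (24, 24), 5: (24, 24), 4: (23, 24), 3: (22, 24), 2: (12, 24)}
lod = detection_limit(data)
print(f"reliable detection down to D{lod} (10^{lod} CFU/mL)")
```

Scanning from the most concentrated dilution downwards means that an isolated positive at a very low concentration does not, on its own, lower the reported detection level.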

The first outcome of this work was the confirmation that isolation on semi-selective media (KBC or mNSA) performs worse than the majority of molecular methods. This was not surprising, since the previous inter-laboratory test had already shown that direct PCR analysis of latently infected plant material was superior for the detection of Psa (Loreti et al. 2014). This was also confirmed by the analysis of repeatability and reproducibility, which were lower for isolation on agar plates (89 and 68%, respectively) than for the PCR-based methods. For the PCR-based methods, repeatability was higher than 90%, and reproducibility was above 90% for M3, M5 and M7. These results highlight that direct isolation requires highly skilled personnel, able to recognise and select putative Psa colonies on agar plates, where the growth of saprophytes may be intense and fast.
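
The formulas behind these repeatability and reproducibility percentages are not given in this section. For qualitative tests, one widely used option is accordance (the probability that two replicates tested in the same laboratory give the same result) and concordance (the probability that two replicates tested in different laboratories agree). The Python sketch below implements these two statistics under that assumption, with hypothetical per-laboratory counts.

```python
def accordance(lab_counts: list[tuple[int, int]]) -> float:
    """Mean within-laboratory probability that two replicates give the same result.

    lab_counts holds one (positive replicates, total replicates) pair per laboratory.
    """
    values = []
    for k, n in lab_counts:
        if n < 2:
            raise ValueError("each laboratory needs at least two replicates")
        agreeing_pairs = k * (k - 1) + (n - k) * (n - k - 1)  # ordered pairs
        values.append(agreeing_pairs / (n * (n - 1)))
    return sum(values) / len(values)


def concordance(lab_counts: list[tuple[int, int]]) -> float:
    """Probability that two results obtained in different laboratories agree."""
    if len(lab_counts) < 2:
        raise ValueError("concordance needs at least two laboratories")
    pos = sum(k for k, _ in lab_counts)
    total = sum(n for _, n in lab_counts)
    neg = total - pos
    # Agreeing ordered pairs among all results (self-pairs included), minus those
    # within the same laboratory, leaves the agreeing between-laboratory pairs.
    agreeing = pos**2 + neg**2 - sum(k**2 + (n - k) ** 2 for k, n in lab_counts)
    between_lab_pairs = total**2 - sum(n**2 for _, n in lab_counts)
    return agreeing / between_lab_pairs


# Hypothetical: three laboratories, four replicates of one spiked sample each.
labs = [(4, 4), (4, 4), (2, 4)]
print(f"repeatability (accordance): {accordance(labs):.1%}")
print(f"reproducibility (concordance): {concordance(labs):.1%}")
```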

The significant variations between laboratories identified for the lowest detected dilution (i.e. D5 for method M5, D5 and D3 for M1, and D5 and D4 for M12) showed that such variation is laboratory-dependent, thus confirming that skill and experience are needed for identification on agar plates.

The comparison of methods according to the Bayesian approach shows that methods M2 (duplex-PCR, Gallelli et al. 2011), M5 (real-time PCR, Gallelli et al. 2014) and M4 (simplex-PCR-C, Gallelli et al. 2014) combine good reliability of the test result for both positive and negative responses.

The Bayesian approach provides an overview of method performance that supplements the traditional statistical approach and helps to choose the most appropriate detection scheme (i.e. combination of methods) according to the epidemiological context (Chabirand et al. 2017). The more data are available and the more balanced they are across methods (a large number of participants and, if possible, the same number of participants per method), the more precise and reliable the performance assessment. In practice, however, these recommendations can be difficult to reconcile with practical constraints, and the compromises that are often made can limit the generalisation of the results (e.g. only 3 laboratories implemented methods M3 and M11).

For Psa detection, disease prevalence is usually low. In this context, routine analyses should be performed using one of the best PCR-based methods (M5, M2 or M4). In the context of certification of healthy material (which requires an accurate determination of Psa-free status), the use of two detection methods (e.g. M2 and M5) should be favoured. Indeed, the accuracy of a negative (or positive) result is higher when both detection tests are used instead of only one. For instance, the post-test probability of infection is lower than 1% if a negative result is obtained with both M2 and M5 on a plant sampled from a population with up to 72% prevalence of infection (vs. up to 19% if M5 is used alone). The risk of releasing infected material is minimised when both test results are negative, which is essential for the certification of Actinidia spp. plants to ensure that the plant material does not present a risk of introducing or spreading Psa. Similarly, confirming a positive result with two detection tests can be relevant when the presence of Psa might lead to an official decision to uproot and destroy material suspected of being infected. Thus, the post-test probability of infection is higher than 90% when a positive result is obtained with both M2 and M5 on a plant sampled from a population with at least 5% prevalence (vs. at least 32% prevalence if M2 is used alone). The methods proposed by Biondi et al. (2013) (in particular M7, M8 and M9) can be reliable in the case of a negative result (0.0% probability that the plant is infected by Psa), but they must necessarily be used in combination with another method, particularly when disease prevalence is low (10–25%), because the probability that an individual is really infected after a positive result is then lower than 15–35%, so the risk of false positives is very high.
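
The gain from combining two tests can be sketched with the odds form of Bayes' theorem. The Python example below assumes that the two tests are conditionally independent given infection status, so that their likelihood ratios multiply; this is a simplifying assumption, and the DSE/DSP values of the hypothetical tests A and B are illustrative, not the estimates obtained for M2 and M5. It solves for the prevalence at which the post-test probability reaches a threshold (1% after negative results, 90% after positive results).

```python
import math


def likelihood_ratios(dse: float, dsp: float) -> tuple[float, float]:
    """(LR+, LR-) for a test with 0 < DSE < 1 and 0 < DSP < 1."""
    if not (0.0 < dse < 1.0 and 0.0 < dsp < 1.0):
        raise ValueError("DSE and DSP must lie strictly between 0 and 1")
    return dse / (1.0 - dsp), (1.0 - dse) / dsp


def prevalence_at(threshold: float, lr: float) -> float:
    """Pre-test prevalence at which the post-test probability equals `threshold`.

    Solves prevalence / (1 - prevalence) * lr = threshold / (1 - threshold).
    """
    if lr <= 0.0 or math.isinf(lr):
        raise ValueError("lr must be positive and finite")
    pre_odds = threshold / (1.0 - threshold) / lr
    return pre_odds / (1.0 + pre_odds)


# Hypothetical tests A and B (illustrative values only).
lr_pos_a, lr_neg_a = likelihood_ratios(0.90, 0.95)
lr_pos_b, lr_neg_b = likelihood_ratios(0.95, 0.90)

# Conditional independence assumption: the likelihood ratios of both results multiply.
both_neg = math.prod([lr_neg_a, lr_neg_b])
both_pos = math.prod([lr_pos_a, lr_pos_b])

print(f"post-test < 1% after negative: A alone up to {prevalence_at(0.01, lr_neg_a):.1%}, "
      f"A and B up to {prevalence_at(0.01, both_neg):.1%}")
print(f"post-test > 90% after positive: A alone from {prevalence_at(0.90, lr_pos_a):.1%}, "
      f"A and B from {prevalence_at(0.90, both_pos):.1%}")
```

With these illustrative values, two concordant negative results keep the post-test probability below 1% at a much higher prevalence than test A alone, and two concordant positive results exceed 90% at a much lower prevalence, which is the pattern described above for M2 and M5.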

The high specificity of duplex-PCR (M2) (Gallelli et al. 2011) is ensured by the simultaneous amplification of two targets. The false-positive results obtained with strains of Pfm by the simplex-PCR of Rees-George et al. (2010) make this method (M1) less reliable than simplex-PCR-C (M4). The occurrence of false-positive or indeterminate results with Pfm (but also with atypical strains from kiwifruit) is also a crucial issue for the methods based on Biondi et al. (2013) (M6–M11) and for the multiplex-PCR of Balestra et al. (2013) (M3). This aspect should be taken into consideration both for the identification of putative Psa colonies and for the preliminary screening of infected plant material. Pfm is a pathogen of kiwifruit that induces leaf symptoms similar to those induced by Psa, but it does not cause canker, has a low economic impact and is not regulated. With method M1, plants could therefore be destroyed unnecessarily. Real-time PCR (M5) could be an alternative method offering high sensitivity, specificity and rapidity since, unlike conventional PCR, it does not require gel electrophoresis. As previously mentioned, this method is specific for the detection of Psa biovar 3, which is considered to cause more serious disease owing to its aggressiveness and rapid spread (Scortichini et al. 2012; Young 2012). This method can therefore be used in routine analysis, either as a first screening assay to exclude the presence of this dangerous population or as an identification test to confirm the identity of suspected colonies. Moreover, its use in combination with another test can be recommended for the diagnosis of critical or symptomless samples.

Finally, the experience reported in this paper provides new information for the revision and implementation of official diagnostic protocols (i.e. EPPO Standard PM 7/120; European Plant Protection Organization 2014b).