Introduction

Targeted therapy against human epidermal growth factor receptor 2 (HER2)-overexpressing tumours represents a major breakthrough in cancer therapy [1]. The identification of cancer patients who are suited for anti-HER2 therapy depends on the analysis of cancer tissue by immunohistochemistry (IH) or in situ hybridisation (ISH), which is usually performed by pathology departments. Central retesting within the framework of therapy trials has revealed considerable interlaboratory variation [2–4]. Testing inaccuracy was identified as a major issue with both assays, IH and ISH [5]. Proficiency testing by round robin tests was launched in several countries as a potential remedy [6–10]. Although useful and indispensable, proficiency testing surveys render only an incomplete and ephemeral assessment of testing performance and do not necessarily reflect lasting reliability. Furthermore, they rely on artificial systems such as tissue microarrays or cell lines [9, 10]. Usually, they do not cover the whole process and omit decisive steps such as tissue fixation and processing [11].

From regular proficiency tests, it became obvious that inaccurate results were not haphazardly distributed but followed a systematic pattern [7, 8]. Participating pathologists who were unsuccessful in most instances failed either because of systematic false-positive or false-negative staining [8]. In a central review of 1,459 cases from Germany which had been enrolled in an international therapy trial and tested locally as HER2 3+, only 1,167 could be confirmed by central testing (80%) (results not published). The 1,459 cases were derived from 116 different centres, of which a small number (6%) were responsible for 23% of discrepant cases with an average discordance rate of 50% (unpublished data). These observations led us to the conclusion that surveillance of positivity rates in HER2 testing may help identify laboratories with insufficient testing assays and a high yield of false-positive or false-negative results. Consequently, pathologists were offered the opportunity to compare their positivity rates with those of others in Germany, Austria and Switzerland. Because German guidelines require that every case of invasive breast cancer is tested for HER2, there are more than 40,000 HER2 tests of breast cancer in Germany per year [12]. From published results, it is difficult to calculate the proportion of positive cases to be expected when optimal testing circumstances are present. Initial studies suggested overexpression in as many as 30% of cases [13]. Larger series recently revealed lower positivity rates with either ISH or IH ranging from 18% to 22.7% [14, 15]. Therefore, a second aim of the study was to obtain an estimate of the positivity rate which has to be expected among a population of breast and gastric cancers in central Europe.

Material and methods

In 2010, all pathology departments in Germany, Austria and Switzerland were offered the opportunity by the German Society of Pathology and the Association of German Pathologists to enter their positivity rates for HER2 testing on breast and gastric cancer into a central web page. Institutes willing to participate received an access code to guarantee confidentiality. The individual figures could be entered on a weekly or monthly basis. The figures which were entered comprised the number of cases in each of the following categories: HER2 0; HER2 1+; HER2 2+; HER2 2+ and amplified in ISH; and HER2 3+. For those laboratories which only perform ISH, the numbers of cases not amplified, amplified or falling into the equivocal grey zone [5] were entered.

The individual result of each institute was compared with the average positivity rate of all other institutes, corrected for the number of cases entered into the survey. A traffic light system indicated to institutes whether they lay outside the 95% confidence interval (yellow) or the 99.5% confidence interval (red).

Statistics

Differences between institutions were analysed by using the χ² test. Before determining the rate of HER2-positive cases, the data were checked for outliers. The data from every institution were compared to the pooled data from the other institutions, and the data from those institutions differing highly significantly from the other institutions (p < 0.0005) were excluded in a stepwise manner. The exclusion procedure was stopped when none of the remaining institutions differed highly significantly (p < 0.0005) from the pooled data of the other laboratories that had not been excluded. Institutions excluded by this procedure were regarded as outliers and therefore not taken into consideration when determining the rate of HER2-positive cases.
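The stepwise exclusion described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes a Pearson χ² test on a 2×2 table (institution versus pooled remainder, positive versus negative) without continuity correction, and it assumes that only the single most deviant institution is removed per step; the text does not specify either detail. The function names and the example counts are hypothetical.

```python
import math


def chi2_p_value_2x2(pos_a, n_a, pos_b, n_b):
    """P value of a Pearson chi-square test (1 df, no continuity correction)
    comparing pos_a/n_a against pos_b/n_b."""
    neg_a, neg_b = n_a - pos_a, n_b - pos_b
    pos, neg = pos_a + pos_b, neg_a + neg_b
    if n_a == 0 or n_b == 0 or pos == 0 or neg == 0:
        return 1.0  # degenerate table: no evidence of a difference
    total = n_a + n_b
    chi2 = total * (pos_a * neg_b - pos_b * neg_a) ** 2 / (n_a * n_b * pos * neg)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def exclude_outliers(counts, alpha=0.0005):
    """counts maps institute -> (positives, cases).
    Assumption: one institute (the smallest p value) is removed per step."""
    kept = dict(counts)
    excluded = []
    while len(kept) > 2:
        worst_name, worst_p = None, 1.0
        for name, (pos, n) in kept.items():
            # Pool all other institutes that are still in the reference set.
            rest_pos = sum(p for k, (p, _) in kept.items() if k != name)
            rest_n = sum(m for k, (_, m) in kept.items() if k != name)
            p_val = chi2_p_value_2x2(pos, n, rest_pos, rest_n)
            if p_val < worst_p:
                worst_name, worst_p = name, p_val
        if worst_name is None or worst_p >= alpha:
            break  # nobody differs highly significantly any more
        excluded.append(worst_name)
        del kept[worst_name]
    return kept, excluded


# Illustrative counts only, not data from the survey.
example = {"A": (170, 1000), "B": (160, 1000), "C": (165, 1000),
           "D": (80, 1000), "E": (175, 1000)}
kept, excluded = exclude_outliers(example)
```

With these made-up counts, only institute D (8% positive against roughly 17% in the pooled remainder) falls below the p < 0.0005 threshold and is excluded; the reference rate would then be computed from the four remaining institutes.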

On the basis of the rate of HER2-positive cases determined by the results from the laboratories not excluded as outliers, 95% and 99.5% confidence intervals were calculated for the number of HER2-positive cases applying the binomial distribution for n ≤ 500 and approximating the binomial distribution by the normal distribution for n > 500.
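As a hedged sketch of how such intervals and the traffic light classification could be computed: the code below uses the exact binomial distribution for n ≤ 500 and the normal approximation for n > 500, as stated above. The use of equal-tailed quantile bounds, the absence of a continuity correction, and all function names are assumptions made for illustration; the text does not describe these details.

```python
import math
from statistics import NormalDist


def binomial_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p), 0 < p < 1."""
    cumulative = 0.0
    for k in range(n + 1):
        # Log of the binomial probability mass, computed via lgamma for stability.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        cumulative += math.exp(log_pmf)
        if cumulative >= q:
            return k
    return n


def positive_count_interval(n, p, level):
    """Equal-tailed interval for the number of positives among n cases,
    given the expected positivity rate p (assumed interval construction)."""
    alpha = 1.0 - level
    if n <= 500:
        return (binomial_quantile(n, p, alpha / 2.0),
                binomial_quantile(n, p, 1.0 - alpha / 2.0))
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (mean - z * sd, mean + z * sd)


def traffic_light(positives, n, p):
    """'red' outside the 99.5% interval, 'yellow' outside the 95% interval,
    otherwise 'green'."""
    low, high = positive_count_interval(n, p, 0.995)
    if positives < low or positives > high:
        return "red"
    low, high = positive_count_interval(n, p, 0.95)
    if positives < low or positives > high:
        return "yellow"
    return "green"
```

For example, assuming an expected rate of 16.7%, an institute reporting about 41 positive cases among 491 assessments (roughly 8.35%) would be classified as red by `traffic_light(41, 491, 0.167)`, because about 82 positive cases would be expected and the 99.5% interval does not extend that far down.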

Results

Within 1 year, 42 institutes of pathology (9 in academic institutions, 17 in community hospitals and 16 in private practice) entered the results of their HER2 testing in breast cancer into the system. Test results on 18,081 breast cancers were communicated. The average number of cases per institute was 430.5 ranging from 4 to 2,733 cases. Seven institutes entered results of fewer than 50 cases of breast cancer. With regard to gastric cancer, 3 institutions communicated more than 50 assessments during the period under study. Positivity rates for HER2 in breast cancer ranged from 7.6% to 31.6%. The average positivity rate of all 42 institutes corrected for the number of cases was 14.61 ± 4.55%. In order to exclude regional differences, the data were screened for a potential association with postal codes; no such association was found (data not shown). Statistically, the results from six institutions were considered to be outliers (p < 0.000005). Therefore, the results of these institutes were not included when the expected rate of HER2-positive cases per institute and the number of assays were determined. Of the remaining 10,916 assessments, the mean proportion of positive cases was 16.7% (99% confidence interval 16.6–16.8). Six institutions were outside of the 99.5% confidence interval (Fig. 1). The number of HER2 assessments performed by the institutes outside the 99.5% confidence interval ranged from 189 to 3,287 cases. There were two institutes outside the 99.5% confidence interval which had entered more than 2,500 cases. Two institutes assessed HER2 exclusively by in situ hybridisation and did not rely on immunohistochemistry. One of these institutions had performed 491 assessments and proved to be outside the 99.5% confidence interval with 8.35% unequivocally amplified cases.

Fig. 1

The number of HER2-positive breast cancer cases (HER2 3+, HER2 2+/amplified, amplified) per institute of pathology in relation to the number of cases investigated was plotted on a logarithmic scale. The 99.5% confidence interval is indicated by red lines. In institutions with a low number of assessments, the confidence interval is broader. The expected rate calculated from the mean value of 36 institutions within the 99.5% confidence interval is demonstrated by a blue line. There are six institutions outside the 99.5% confidence interval (indicated by red crosses). Four of these potentially underestimate HER2 and two have a higher positivity rate than could have been expected. Institutes with a positivity rate within the 99.5% confidence interval are represented by white circles

Of the remaining 36 participating institutes, 6 institutions lay between the 95% and 99.5% confidence intervals (p < 0.0005) (Fig. 1). All of these institutions had communicated between 153 and 567 HER2 assays in breast cancer cases. No correlation to the type of institute (academic, community hospital or private practice) could be observed.

The proportion of cases tested immunohistochemically as HER2 2+ ranged from 0% to 60.1% of all assessments (mean 16.5 ± 15.5%). With regard to the 36 reference institutes within the 99.5% confidence interval, the mean percentage of HER2 2+ cases was 18.7 ± 14.0% (Table 1). Of the HER2 2+ cases which were further analysed by in situ hybridisation, 17.9 ± 17.0% were amplified (range 0.0–75.0%) (Table 1). Two of the six institutes outside the 99.5% confidence interval rendered a HER2 2+ assessment on 2.8% and 7.6% of cases, respectively. Institutes which had lower numbers of HER2-positive cases also revealed a low percentage of 2+ assessments. There was a highly significant correlation between low HER2 positivity rates and a low proportion of cases within the 2+ category (p < 0.000005).

Table 1 HER2 positivity rates in breast and gastric cancer

With regard to gastric cancer, 15 institutes of pathology took part and entered 982 results of their assays. The average positivity rate was 24.11 ± 7.35%. After correction for one outlier, the mean positivity rate was 23.2 ± 5.7% (Table 1). Because the number of cases per institute was rather small, there was a broad range of positivity rates which fell into the 99.5% confidence interval (Fig. 2). Of the 15 participating institutes, only one institute was outside the 99.5% confidence interval, and no further institute was outside the 95% interval (Fig. 2). The percentage of cases tested as HER2 2+ was 28.7 ± 12.7% (range 0.0–71.4%). Of these, 30.5 ± 12.1% were amplified by in situ hybridisation (range 0–52.2%) (Table 1).

Fig. 2

The number of HER2-positive gastric cancer cases (HER2 3+, HER2 2+/amplified) per institute of pathology in relation to the number of cases investigated was plotted on a logarithmic scale. The 99.5% confidence interval is indicated by red lines. In institutions with a low number of assessments, the confidence interval is broader. The expected rate calculated from the mean value of 14 institutions within the 99.5% confidence interval is demonstrated by a blue line. There is one institution outside the 99.5% confidence interval (indicated by a black cross). Institutes with a positivity rate within the 99.5% confidence interval are represented by black points. The confidence interval might narrow over time when more assessments are available for consideration

Discussion

HER2 testing provides the prototype of a new field in pathology, which has been termed predictive pathology. The results of clinical trials demonstrated a significant benefit of HER2-targeted therapy for early and late stages of breast cancer [1, 16] and, more recently, for gastric cancer [17]. Interlaboratory variation in HER2 testing became obvious from trials with central retesting [3, 4]. Although regular participation in proficiency testing significantly improved the performance of individual institutes [8], there are doubts that the current quality assurance methods are sufficient to reduce testing variation. In order to improve the reliability of testing, several efforts have been undertaken. Guideline recommendations have been published which set standards for thresholds between positive and negative HER2 test results and define algorithms [5, 18]. Furthermore, regular and predominantly tissue microarray-based proficiency tests are organised in Europe and the USA [6, 9, 10].

Proficiency tests take place once or twice a year and do not reflect the permanent accuracy of HER2 assessment in routine practice. An auxiliary instrument to compensate for this limitation and to assure quality of HER2 testing is presented here. By monitoring positivity rates in HER2 testing, institutes of pathology which lay outside the 99.5% confidence interval of expected results were identified. The exact frequency of HER2-overexpressing or amplified cancers was not known and had to be determined in order to define a reference value. The positivity rates reported in the literature range from 18% to 30% [13–15]. On the basis of 10,916 assessments in 36 institutes of pathology, a mean positivity rate of 16.7% was determined (Table 1). Because HER2 testing is performed on every breast cancer in Germany, there is no selection bias in this study as might be the case in therapy trials. Six institutes were outside of the 99.5% confidence interval (Fig. 1). These outliers were informed that a systematic error in the methodology of HER2 assessment in their laboratory might cause over- or underestimation of HER2 in cancer. Of the six institutes which were outside the 99.5% confidence interval, five had participated in round robin tests on HER2 assessment offered in Germany. Three of the institutes with low positivity rates had received the information that the sensitivity of their detection method might be too low in at least one of the annual quality assurance trials. Interestingly, a high frequency of assessments did not protect from potential systematic errors. Two institutes which revealed positivity rates outside the 99.5% confidence interval had entered more than 2,500 cases (Fig. 1). It remains to be determined by further studies whether the traffic light system is efficient in improving HER2 assessments in underperforming institutes.

Diversity of positivity rates was highest when the HER2 2+ category was considered (Table 1). This finding indicates that the HER2 2+ category might be limited by subjectivity and poor reproducibility [19] (Fig. 3). In a recent meta-analysis on 17 studies encompassing 8,410 patients, the mean proportion of the HER2 2+ category was 23.2% with a broad range from 2.0% to 87.5% [20]. Only a slight enrichment for amplified cases was found (26.5% vs. 21.1%) [20]. When compared with ISH results in this study, there was no significant enrichment of amplified cases in the HER2 2+ group (Table 1).

Institutes which rely completely on ISH instead of IH to assess HER2 positivity were too few to allow for comparison (Fig. 3). Whether ISH or IH is more reliable and reproducible is a matter of debate [19]. In this study, one of the two institutes which exclusively performed ISH was outside the 99.5% confidence interval.

Fig. 3

A high degree of variability between institutes was observed with regard to the HER2 2+ category. In particular, the differentiation between HER2 2+ and HER2 1+ might be handled differently. This case illustrates the borderline between the 2+ category (intraductal carcinoma, double arrow) and the 1+ category (invasive carcinoma, single arrow) (immunohistochemistry with 4B5 anti-HER2 monoclonal antibody, ×200)

In order to keep the entering of data into the HER2 monitor as simple as possible and not to reduce the compliance of participants, no detailed information on methods or composition of cases was requested from the participants. It cannot be excluded that an abnormal proportion of low-grade cancers or other specific conditions may be responsible for an aberrant positivity rate. Therefore, a positivity rate outside the 99.5% confidence interval does not necessarily imply that the HER2 assessment method in use is inadequate. Such a finding should, however, urge pathologists to consider this possibility. The primary aim of monitoring HER2 positivity rates is to alert institutes of pathology to potential systematic errors which require further measures to assure quality of testing. As a consequence of abnormal positivity rates, tests in use could be validated or participation in proficiency tests could take place with higher frequency. Only if methodological problems have been excluded should secondary influences such as abnormal composition of the set of samples in which HER2 has been assessed be taken into consideration. Two institutes with a high number of tests and a low positivity rate outside the 99.5% confidence interval also documented extremely low HER2 2+ rates. Unlike the 3+ category, the 2+ category is not related to histological grade. Therefore, it appears highly unlikely that in these two institutes, which together performed more than 6,000 HER2 assessments in breast cancer, a selection bias towards grade 1 and 2 cases might be responsible for the low total positivity rate.

Most therapy trials on targeted HER2 therapy require central retesting of samples which were locally assessed as HER2 positive. As a consequence, central retesting in trials alerts pathologists to false-positive but not to false-negative assessments. This inherent tendency might explain why there are twice as many institutes which potentially underestimate HER2 positivity as institutes with potential overestimation (n = 4; 99.5% confidence interval). Thus, without eliminating outliers, the mean rate of HER2-positive cases was lower (14.61 ± 4.55% in 18,221 breast cancers). An almost identical rate was found by questionnaires on 4,940 breast cancer samples in Sweden [21] and a slightly higher rate in Australia (17.1%, 6,512 cases) [22]. In contrast to the HER2 monitor presented here, in both studies, outliers had not been eliminated from the calculation of the expected positivity rate.

There are several measures which institutes of pathology can take to assure quality of HER2 testing. Besides on-slide controls [23], participation in proficiency tests and adherence to guidelines [5–10, 18, 24], a further instrument is proposed here. Monitoring of positivity rates and comparison with an expected value will help identify potential errors in HER2 assessment, which lead to systematic over- or underestimation of HER2 in cancer.