Introduction

The use of multigene diagnostic testing to guide cancer treatment has increased dramatically since the clinical introduction of MammaPrint, the first multigene diagnostic signature cleared by the US Food and Drug Administration (FDA) [1]. These genomic tests interrogate gene expression levels in tissue to provide information about disease state and/or prognosis, and they have led to new diagnostic approaches that are recognized by international clinical guidelines [2, 3]. MammaPrint determines the expression of 70 signature genes and helps physicians make adjuvant treatment decisions for patients with early-stage breast cancer [4, 5]. The test is suitable for both fresh or fresh frozen tissues (hereafter “fresh”) and formalin-fixed paraffin-embedded (FFPE) tissues [4, 6, 7]. MammaPrint is part of Agendia’s In Vitro Diagnostic (IVD) Test Suite, which also includes BluePrint, a test that identifies the breast cancer molecular subtype [8].

MammaPrint was developed and validated on fresh frozen tissues and cleared by the FDA [1, 4, 9, 10]. However, shipping frozen tissue for diagnostic purposes is not always practical for community hospitals. Therefore, RNAretain (for room-temperature sample stabilization during shipment) was validated, FDA-cleared, and implemented [11, 12], and MammaPrint was subsequently validated and cleared for use with FFPE tissues [6, 13]. To date, MammaPrint is one of only two FDA-cleared [15], clinically implemented breast cancer multigene expression tests (the other being Prosigna [14]); it has received six FDA clearances and the EU IVD CE mark for both fresh and FFPE tissues [1, 11, 13, 16–18]. Current claims of diagnostic clinical utility include its well-established prognostic value for assessing metastasis risk in early-stage breast cancer patients [5, 9, 10, 19–22].

Since 2004, MammaPrint has been used for over 90,000 patients to determine their risk of distant recurrence. The activity of the 70 signature genes is compiled into an index that establishes the qualitative MammaPrint result: Low Risk or High Risk of distant recurrence. Patients designated clinically low risk with hormone receptor-positive disease could well be treated with hormonal therapy in the adjuvant setting [3], whereas clinically high risk patients are better suited to receive a combination of chemotherapy and hormonal therapy in the (neo)adjuvant setting [3]. Clinical studies have shown MammaPrint’s prognostic value in lymph node-negative [10, 22] and lymph node-positive [19] breast cancer, in women diagnosed at all ages [21], and irrespective of pathological grade [9], estrogen receptor status, or HER2 expression [20]. The RASTER study [12] provided the first prospective evidence of clinical utility for MammaPrint. It showed a 5-year metastasis-free survival of 98.9 % for patients who were clinically high risk based on standard clinical parameters, but who chose not to receive adjuvant chemotherapy because of their MammaPrint Low Risk status [5]. These results indicated that such patients could safely forgo chemotherapy [23]. Confirmation of MammaPrint’s prognostic value is expected in 2016, when results are projected for the large prospective randomized European Organisation for Research and Treatment of Cancer (EORTC) MINDACT trial (EORTC 10041; BIG 3-04) [24–26].

MammaPrint results can be obtained from both customized mini-arrays and customized whole-genome arrays. Both array types contain multiple replicates of the 70 signature probes, together with an identical large set of replicated normalization and control probes. Diagnostic MammaPrint results are generally obtained using customized mini-arrays (also referred to as 8-pack arrays), which accommodate 8 individual samples per array [4]. In large clinical trials such as MINDACT [26] and I-SPY 2 [27], MammaPrint expression analysis was performed on whole-genome arrays containing over 32,000 unique probes per array. This array type is currently used in the I-SPY 2 trial under an FDA Investigational Device Exemption (IDE) for MammaPrint. For MammaPrint diagnostic use in Europe, both the customized mini-array and the customized whole-genome array hold the EU IVD directive CE mark. Whole-genome arrays make it possible to provide MammaPrint results to clinical trial participants while also enabling whole-genome analysis for the early development of new gene expression signatures. For the MINDACT trial, this resulted in an unprecedented dataset representing almost 6700 early-stage breast cancer patients.

The aim of the current study is to evaluate the equivalence and robustness of the MammaPrint test across different settings. Microarray types [28, 29] and tissue preservation techniques [30] are known to potentially influence array expression profiles in general. We therefore evaluated the agreement of MammaPrint indices between customized mini-arrays and whole-genome arrays for 1897 array sample pairs, and between fresh and FFPE tissues for 552 tissue sample pairs. We also assessed the reproducibility of >11,000 control samples over the last 10 years, as well as the precision and repeatability of the test.

Materials and methods

Samples

Patient samples

Reported MammaPrint indices were collected for all processed patient tumor samples that were hybridized to both a customized mini-array and a customized whole-genome array (n = 1897 sample pairs, totaling 3794 sample hybridizations; 2005–2015), or for which both fresh (fresh frozen or RNAretain) and FFPE tissues were hybridized to individual arrays (n = 552 sample pairs, totaling 1104 sample hybridizations; 2011–2015). Data for the fresh versus FFPE comparison include the RASTER series [12] and a data series described previously by Sapino et al. [6]. Clinicopathological and clinical outcome data were available for the 345 RASTER patients with matched fresh and FFPE samples [5, 12]. Additionally, to assess the analytical performance of MammaPrint, indices were collected for fresh and FFPE samples representing MammaPrint High Risk and Low Risk results. A total of 7 samples were measured in duplicate on each of 20 days, totaling 280 measurements.

For this study, only data, not samples, were collected. All data and analyses used or performed for this study comply with the current ethical laws of the Netherlands. All patient sample data were anonymized in accordance with national ethical guidelines (“Code for Proper Secondary Use of Human Tissues,” Dutch Federation of Medical Scientific Societies), and the studies from which the samples originated had Institutional Review Board approval. For this type of study, formal consent is not required.

Control samples

High Risk and Low Risk MammaPrint control samples are routinely used as technical and experimental controls within each batch run of samples. Data from these control samples, each composed of a pool of clinically representative breast cancer tumors, are continuously monitored in the clinical diagnostic setting and were available for assessing the stability of MammaPrint test indices in this study [4, 6, 31]. All control samples were processed at one of Agendia’s core laboratory facilities in Amsterdam (The Netherlands) or Huntington Beach/Irvine (California, US) from August 2005 until March 2015, totaling n = 11,333 data points for analysis.

Statistical analysis

Analyses and visualization of data were performed in R (version 3.1.1) and RStudio (version 0.98.994). All samples had passed standard diagnostic quality control criteria [4, 6, 31].

Equivalence of MammaPrint indices was assessed using the Pearson correlation coefficient, to quantify the degree of linear correlation, and Passing–Bablok regression, to obtain the regression equation. Bland–Altman plots were used to visually examine the existence of any constant bias in the difference between measurements of paired samples. The Bland–Altman analysis allows visual identification of proportional error (if the data points show a sloped linear trend), systematic error (if the horizontal mean difference line is shifted parallel away from zero), or dependence of a method on the magnitude of the measurements (if the spread of the data points widens or narrows from left to right).
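The regression and agreement statistics named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ R code: `passing_bablok` and `bland_altman` are hypothetical helper names, pairs with tied x values are simply skipped in the Passing–Bablok slope set (the full method assigns them infinite slopes), the measurements are assumed to be positively correlated as in a method comparison, and confidence intervals are omitted.

```python
import statistics


def passing_bablok(x, y):
    """Simplified Passing-Bablok fit of y = a + b*x (sketch, no CIs)."""
    slopes = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[j] - x[i]
            if dx == 0:
                continue  # simplification: skip pairs with tied x values
            s = (y[j] - y[i]) / dx
            if s != -1:  # the method discards slopes of exactly -1
                slopes.append(s)
    slopes.sort()
    n = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset for the "shifted" median
    if n % 2 == 1:
        b = slopes[(n - 1) // 2 + k]
    else:
        b = 0.5 * (slopes[n // 2 - 1 + k] + slopes[n // 2 + k])
    a = statistics.median([yi - b * xi for xi, yi in zip(x, y)])
    return a, b


def bland_altman(x, y):
    """Return bias, 95 % limits of agreement, and trend of differences vs means."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    means = [(xi + yi) / 2 for xi, yi in zip(x, y)]
    bias = statistics.mean(diffs)  # constant (systematic) offset
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Least-squares slope of differences on means; a clearly nonzero
    # value hints at proportional error.
    mm, md = statistics.mean(means), statistics.mean(diffs)
    sxx = sum((m - mm) ** 2 for m in means)
    sxy = sum((m - mm) * (d - md) for m, d in zip(means, diffs))
    trend = sxy / sxx if sxx else 0.0
    return bias, limits, trend


# Hypothetical paired indices (mini-array x, whole-genome array y).
mini = [-0.52, -0.10, 0.05, 0.31, 0.64]
whole = [-0.50, -0.11, 0.07, 0.30, 0.66]
print(passing_bablok(mini, whole))
print(bland_altman(mini, whole))
```

The shifted median (offset `k`) is what makes the Passing–Bablok slope robust to the direction in which pairs are ordered; an intercept near 0 and slope near 1, as reported in the Results, indicate agreement.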

Clinicopathological and clinical outcome data of the 345 patient samples were analyzed using the statistical package SPSS 22.0 for Windows (SPSS Inc, Chicago, US). Survival analyses were performed to compare the clinical performance of fresh and FFPE tissues. Kaplan–Meier analyses were used to compare the survival distributions of the MammaPrint risk groups determined from fresh and from FFPE tissue for two endpoints: distant metastasis as first event (DMF) and distant recurrence-free interval (DRFI). DMF was defined as the time from surgery until the diagnosis of a distant metastasis as first event. For DMF, patients who presented with a local recurrence, regional recurrence, or second primary tumor before the diagnosis of a distant metastasis were censored at that event; patients were otherwise censored at death or at the end of follow-up. DRFI was defined as the time from surgery until the diagnosis of a first distant metastasis or breast cancer-related death.
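The DMF censoring rule and the Kaplan–Meier product-limit estimator can be sketched as follows. The study used SPSS; `dmf_endpoint`, `kaplan_meier`, and `survival_at` are hypothetical names, each patient is assumed to be already reduced to a few event times, and counting a distant metastasis diagnosed on the same date as a competing event as an uncensored DMF event is an assumption of this sketch.

```python
def dmf_endpoint(dm_time, competing_time, last_followup):
    """Hypothetical helper: (time, event) for distant metastasis as first event.

    dm_time: time to distant metastasis, or None
    competing_time: earliest of local/regional recurrence, second primary,
        or death, or None
    last_followup: time of last follow-up
    """
    if dm_time is not None and (competing_time is None or dm_time <= competing_time):
        return dm_time, 1          # distant metastasis came first: event
    if competing_time is not None:
        return competing_time, 0   # censored at the competing event
    return last_followup, 0        # censored at end of follow-up


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored  # censored at t still count as at risk at t
    return curve


def survival_at(curve, t):
    """Step-function lookup, e.g. a 5-year rate with t = 5."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return s


# Hypothetical follow-up in years for five patients.
endpoints = [dmf_endpoint(3.0, None, 10.0), dmf_endpoint(6.0, 2.0, 10.0),
             dmf_endpoint(None, None, 8.0), dmf_endpoint(4.5, None, 9.0),
             dmf_endpoint(None, 7.0, 9.0)]
curve = kaplan_meier([t for t, _ in endpoints], [e for _, e in endpoints])
print(survival_at(curve, 5.0))
```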

Reproducibility of the MammaPrint test was assessed by evaluating diagnostic control samples over a period of 10 years (n = 11,333), capturing nearly all sources of variation. Reproducibility was expressed as relative stability, calculated as 100 minus the relative standard deviation, where the standard deviation is expressed as a percentage of the total MammaPrint index range [31]. To assess the stability of repeated control sample measurements over time independently of the particular control sample, MammaPrint indices were centered on each control sample’s own mean per RNA reference type.
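The centering and relative-stability calculation can be sketched as follows. This assumes that the MammaPrint index spans −1 to +1, so that the total range used to express the standard deviation as a percentage is 2; that range is an assumption here, not stated above. The helper names are hypothetical, and taking a single standard deviation over the pooled centered values is a simplification.

```python
import statistics
from collections import defaultdict


def center_by_control(records):
    """(control_id, index) pairs -> deviation of each index from its control's mean."""
    groups = defaultdict(list)
    for control_id, index in records:
        groups[control_id].append(index)
    means = {cid: statistics.mean(vals) for cid, vals in groups.items()}
    return [index - means[cid] for cid, index in records]


def relative_stability(deviations, index_range=2.0):
    """100 minus the SD expressed as a percentage of the (assumed) index range."""
    sd = statistics.stdev(deviations)
    return 100.0 - 100.0 * sd / index_range


# Hypothetical runs of two control samples with different expected indices.
records = [("LR-1", 0.50), ("LR-1", 0.60), ("HR-1", -0.30), ("HR-1", -0.40)]
deviations = center_by_control(records)
print(round(relative_stability(deviations), 2))  # 97.11
```

Centering removes each control’s expected level, so a replacement control with a different mean does not show up as drift in the pooled series.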

The analytical performance of the current MammaPrint fresh and FFPE versions was assessed in a precision evaluation experiment according to the EP5-A2 guideline [32], as described by Delahaye et al. [31]. This analysis included precision and repeatability assessment of MammaPrint High Risk and Low Risk samples. Precision was determined as the relative precision of repeated measurements of High Risk and Low Risk breast cancer samples over a period of 20 days. Repeatability was determined as the relative stability between the duplicate runs performed each day. Both relative stability and relative precision are calculated as 100 minus the relative standard deviation.
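A simplified sketch of the two quantities for one sample measured in duplicate on each day. Repeatability is estimated from the within-day duplicate differences as s_r = sqrt(Σd² / 2n), and the plain standard deviation of all measurements stands in for total precision; the formal precision protocol referenced above derives these components from an analysis-of-variance model, so this is only an approximation. Function names and the index range of 2 are assumptions.

```python
import math
import statistics


def repeatability_sd(duplicates):
    """Within-day SD from (run1, run2) duplicates: sqrt(sum(d**2) / (2 * n_days))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in duplicates) / (2 * len(duplicates)))


def overall_sd(duplicates):
    """Simplification: plain SD of all measurements as a stand-in for total precision."""
    return statistics.stdev([v for pair in duplicates for v in pair])


def relative(sd, index_range=2.0):
    """100 minus the relative SD; the index range of 2 is an assumption."""
    return 100.0 - 100.0 * sd / index_range


# Hypothetical indices for one sample: one (run1, run2) pair per day.
days = [(0.50, 0.52), (0.48, 0.50), (0.51, 0.49)]
print(round(relative(repeatability_sd(days)), 2))  # repeatability
print(round(relative(overall_sd(days)), 2))        # precision
```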

Results

Equivalence of MammaPrint indices for mini-arrays versus whole-genome arrays

In large clinical trials, whole-genome gene expression arrays are sometimes used to enable discovery of expression signatures associated with a clinical endpoint of interest, alongside certified clinical diagnostic tests such as MammaPrint. To demonstrate the equivalence of the FDA-cleared, EU IVD-certified MammaPrint test between customized mini-array and customized whole-genome array types, we assessed the agreement between MammaPrint indices of breast cancer tissues hybridized to both array types (Fig. 1). Equivalence was measured using a set of 1897 samples with matching mini- and whole-genome arrays, spanning the MammaPrint index range generally seen in diagnostics. MammaPrint indices generated using the customized whole-genome array showed an almost perfect correlation with the matching indices from the diagnostic arrays (r = 0.99, 95 % CI 0.989–0.991). The scatterplot in Fig. 1 shows tight clustering of the indices around the 45° line of identity. This was confirmed by Passing–Bablok regression analysis (y = 0.002 + 1.00x) and by Bland–Altman analysis, which showed no bias in MammaPrint indices between mini-array and whole-genome array types (Online Resource 1).
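The reported confidence interval can be checked with the Fisher z-transformation, a standard way to put an approximate confidence interval on a Pearson r. Whether the authors computed their interval this way is an assumption, but the sketch reproduces the reported limits for both the array comparison (r = 0.99, n = 1897) and the fresh versus FFPE comparison (r = 0.93, n = 552).

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95 % CI for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)            # variance-stabilizing transform
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = pearson_ci(0.99, 1897)  # mini-array vs whole-genome array
print(f"{lo:.3f}-{hi:.3f}")      # 0.989-0.991
lo, hi = pearson_ci(0.93, 552)   # fresh vs FFPE
print(f"{lo:.2f}-{hi:.2f}")      # 0.92-0.94
```

The interval is asymmetric around r because it is computed on the z scale and transformed back, which matters for correlations this close to 1.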

Fig. 1

Equivalence of MammaPrint between customized mini- and whole-genome array types. Scatterplot showing equivalence of MammaPrint indices between customized mini-microarray used in diagnostics (x-axis) and customized whole-genome array (y-axis) hybridizations. Each dot represents a single female breast cancer sample for which labeled RNA was hybridized to both array types (n = 1897 sample pairs)

In conclusion, comparison of 1897 mini-array versus whole-genome array sample pairs showed equivalence and robustness of MammaPrint results between the two array types, both of which contain the 70 diagnostic MammaPrint probes.

Equivalence of MammaPrint indices from fresh versus formalin-fixed paraffin-embedded tissues

The MammaPrint test, originally developed on fresh tissues, was translated to FFPE to facilitate diagnostics and was subsequently cleared by the FDA in 2015 [6, 13]. Tissue preservation techniques are known to influence array expression profiles in general [30]. Here, we extended our comparison to examine the agreement between MammaPrint indices from fresh and FFPE samples in a dataset of 552 sample pairs (Fig. 2). The Pearson correlation of the MammaPrint indices calculated for fresh and FFPE tissues was 0.93 (95 % CI 0.92–0.94). This is an excellent correlation, given that the data are naturally influenced by heterogeneity within the tumor tissue and by differences in tissue preservation technique. This was confirmed by Passing–Bablok regression analysis (y = −0.076 + 1.05x) and by Bland–Altman analysis, which showed no relevant bias in MammaPrint indices between fresh and FFPE tissue types (Online Resource 1). These results confirm the very high correlation between fresh and FFPE, in line with the MammaPrint FFPE FDA clearance [15].

Fig. 2

Equivalence of MammaPrint between matched fresh and FFPE tissue samples. Equivalence was assessed on tumors for which both FFPE and fresh samples were analyzed. The data are naturally influenced by intrinsic tissue heterogeneity; a difference of over 5 % in MammaPrint results can be attributed to heterogeneity within the tumor [6, 31]. Each dot represents a single female breast cancer sample for which RNA derived from fresh and from FFPE tissue was hybridized separately to either a customized mini-array or a customized whole-genome microarray (n = 552 hybridization pairs)

In summary, comparison of 552 fresh versus FFPE sample pairs showed very high comparability of MammaPrint indices between the tissue preservation techniques.

Impact of tissue preservation technique on MammaPrint patient survival prediction

Highly correlated continuous values may still show lower concordance once they are converted into binary categories. In particular, for samples with a result close to the classification threshold, even a very small difference between two measurements can switch the binary outcome. Therefore, diagnostic samples with an index within a predefined zone around the MammaPrint classification threshold are tested multiple times to increase measurement precision. We assessed and compared the clinical performance of MammaPrint in a retrospective setting to investigate the overall clinical effect for samples with MammaPrint results generated from both fresh and FFPE tissues.
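The threshold effect described above can be illustrated with a toy example. The cut-off of 0 (an index above it read as Low Risk), the borderline margin of 0.05, and the index pairs are all hypothetical values chosen for illustration, not the MammaPrint specification.

```python
def classify(index, threshold=0.0):
    # Hypothetical convention: an index above the threshold -> Low Risk.
    return "Low Risk" if index > threshold else "High Risk"


def binary_concordance(pairs, threshold=0.0):
    """Fraction of paired measurements that receive the same risk class."""
    same = sum(classify(a, threshold) == classify(b, threshold) for a, b in pairs)
    return same / len(pairs)


def needs_retest(index, threshold=0.0, margin=0.05):
    """Hypothetical borderline zone that would trigger repeat testing."""
    return abs(index - threshold) <= margin


# Every pair differs by only 0.02, yet the last pair straddles the cut-off.
pairs = [(0.60, 0.62), (-0.40, -0.38), (0.01, -0.01)]
print(binary_concordance(pairs))            # 0.666...
print([needs_retest(a) for a, _ in pairs])  # [False, False, True]
```

Only the borderline pair changes class, which is why repeat testing is concentrated in a zone around the threshold rather than applied to every sample.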

Survival analysis for DMF and DRFI was performed for the MammaPrint High Risk and Low Risk groups in a dataset of 345 early-stage breast cancer patients for whom matching fresh and FFPE tissues were available (Fig. 3). The Kaplan–Meier curves for DMF and DRFI are similar for fresh and FFPE results. Moreover, 5-year survival rates for both the Low Risk and High Risk patient groups were comparable between fresh and FFPE, as shown in Table 1.
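Whether two Kaplan–Meier curves differ is conventionally tested with the log-rank (Mantel–Cox) test. A minimal two-group version is sketched below; the function name is hypothetical, it uses the usual hypergeometric variance with tied event times, and it converts the 1-degree-of-freedom chi-square statistic to a p value via erfc(sqrt(χ²/2)). It is quadratic in the number of patients and meant for illustration, not as a reproduction of the SPSS procedure used in the study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test; returns (chi-square, p)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)  # at risk in A
        n = sum(1 for tt, _, _ in data if tt >= t)               # at risk overall
        d_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical follow-up (years) and event flags for two groups.
print(logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5))
```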

Fig. 3

Equivalence in survival data for DMF and DRFI in 345 patients with matched fresh and FFPE samples. Kaplan–Meier curves were plotted for DRFI and DMF to assess the clinical equivalence of MammaPrint Low Risk (green lines) and High Risk (red lines) between matched fresh and FFPE samples in a series of 345 early-stage breast cancer patients. The majority of patients classified as High Risk received adjuvant treatment; the majority of patients classified as Low Risk were untreated. Log-rank (Mantel–Cox) p values are indicated per analysis

Table 1 Equivalence in survival percentages for DMF and DRFI in 345 patients with matched fresh and FFPE samples

In summary, we showed clinical equivalence of MammaPrint results from fresh and FFPE tissues.

MammaPrint stability over time

To assess the stability of the MammaPrint test over time, we investigated the reproducibility of MammaPrint results in a large pool of control samples spanning a 10-year period, and investigated the precision and repeatability of MammaPrint High Risk and Low Risk samples.

Control samples are routinely used as technical and experimental controls within each batch run of samples to monitor quality. A minimum of two control samples is processed with each batch run, covering both binary MammaPrint results. Stability of MammaPrint indices was assessed over a 10-year period using the control samples of Agendia’s MammaPrint quality control monitoring system (n = 11,333). Three different fresh Low Risk controls, five different fresh High Risk controls, one FFPE Low Risk control, and two different FFPE High Risk controls were used during this 10-year period. When a control sample is near depletion, a new control sample is created to replace it, and both controls are then run in parallel for at least 20 measurements to assure the quality of the new control sample. Because each control sample has a different expected MammaPrint index, indices were centered on their individual sample mean to enable stability assessment over time independently of the control sample. Low Risk and High Risk control sample measurements were available for both fresh and FFPE specimen types: n = 2494 fresh Low Risk and n = 4072 fresh High Risk controls (Fig. 4a), and n = 1639 FFPE Low Risk and n = 3128 FFPE High Risk controls (Fig. 4b). MammaPrint indices of these control samples were plotted over time (10 years) in a run-sequence plot (Fig. 4), demonstrating high stability and reproducibility: the same result is generated regardless of scanner, operator, lot, or day. The mean absolute difference of the MammaPrint indices from the sample mean was 0.0295 (95 % CI 0.0289–0.0301) for fresh and 0.0477 (95 % CI 0.0466–0.0488) for FFPE controls. The standard deviation was 0.0415 for fresh and 0.0619 for FFPE.
Reproducibility, given as 100 minus the relative standard deviation, was 97.9 % for fresh and 96.9 % for FFPE, resulting in an overall reproducibility of 97.4 %, which is very close to the originally reported value [6, 31].
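The reported percentages follow from the stated definition if the standard deviation is expressed relative to a total index range of R = 2 (an index spanning −1 to +1) and if the overall value is the unweighted mean of the fresh and FFPE values. Both are assumptions that happen to reproduce the reported numbers, not statements from the Methods.

```latex
\begin{aligned}
\mathrm{RS} &= 100 - 100\,\frac{\mathrm{SD}}{R}, \qquad R = 2 \ \text{(assumed)} \\
\mathrm{RS}_{\mathrm{fresh}} &= 100 - 100 \times \tfrac{0.0415}{2} = 100 - 2.075 \approx 97.9\ \% \\
\mathrm{RS}_{\mathrm{FFPE}} &= 100 - 100 \times \tfrac{0.0619}{2} = 100 - 3.095 \approx 96.9\ \% \\
\mathrm{RS}_{\mathrm{overall}} &= \tfrac{1}{2}\,(97.9 + 96.9) = 97.4\ \%
\end{aligned}
```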

Fig. 4

Stability of MammaPrint indices of diagnostic control samples over time. Stability of MammaPrint indices of MammaPrint High Risk and MammaPrint Low Risk control samples measured over a period of 10 years (2005–2015) for fresh (a) and 5 years (2011–2015) for FFPE (b) separately. To enable comparison of the different control samples for Low Risk and High Risk, MammaPrint indices were centered on their individual sample means (see “Methods” section)

While the reproducibility of the MammaPrint test was assessed in control samples consisting of pooled RNA from breast cancer samples, its analytical performance was determined using individual (non-pooled) samples. The analytical performance of MammaPrint fresh and FFPE in this study was assessed by determining precision and repeatability in seven samples representing MammaPrint High Risk and Low Risk results, each measured twice per day for 20 days (280 measurements in total). Overall precision was 98.2 % (99.0 % for fresh and 97.3 % for FFPE), and overall repeatability was 98.3 % (99.0 % for fresh and 97.6 % for FFPE).

In summary, data on diagnostic control samples from Agendia’s quality control monitoring system, covering 10 years, and repeated measurements of individual samples show that the MammaPrint test is stable over time for both fresh and FFPE tissues.

Discussion

The study results show that MammaPrint is a robust and stable test, regardless of whether the array type used is a customized mini-array or a customized whole-genome array, or whether the specimen type is fresh or FFPE. The excellent reproducibility of MammaPrint indices of control samples over the last 10 years and the excellent precision and repeatability of MammaPrint High Risk and Low Risk samples are further testament to the quality and robustness of the MammaPrint test.

A major strength of the current study is its large sample size. Almost 2000 sample pairs demonstrated near perfect agreement in MammaPrint index values between mini and whole-genome array types. Additionally, 552 sample pairs (with clinical data available for 345 of them) contributed to the comparison between fresh and FFPE MammaPrint results, which showed excellent agreement. Assessment of >11,000 control samples illustrates the high stability of MammaPrint results over time. Finally, robustness was shown in the 280 measurements of the precision and repeatability series.

Other IVD multigene assays are currently available for patients with early breast cancer, including the Prosigna Breast Cancer Prognostic Gene Signature Assay (NanoString Technologies, Seattle, WA, USA [14]), Oncotype DX Breast Cancer Assay (Genomic Health, Redwood City, CA, USA [33]), Breast Cancer Index (BioTheranostics, San Diego, CA, USA [34]), and EndoPredict test (Sividon Diagnostics GmbH, Köln, Germany [35]). These assays are dedicated to FFPE breast cancer tissues and are based on either microarray or (RT)-(q)PCR technology. IVD tests such as MammaPrint need to be accurate, reliable, and clinically meaningful. Regulation under the Clinical Laboratory Improvement Amendments (CLIA) is the first crucial step for quality control of laboratories and personnel. To ensure full patient safety and quality control of IVD tests, including laboratory-developed tests (LDTs), adherence to FDA regulation is essential [36]. For diagnostic tests performed in the EU, the equivalent regulatory oversight consists of the EU IVD CE mark and laboratory ISO certification, both of which MammaPrint has also obtained. MammaPrint was the first of only two of the multigene signatures described above (MammaPrint and the Prosigna Breast Cancer Prognostic Gene Signature Assay) to obtain the IVD CE mark and FDA clearance, which required demonstration of both analytical and clinical validity [36]. Moreover, MammaPrint is the first multigene classifier to publicly demonstrate such robust stability over time. Of note, MammaPrint is independent of clinical factors, whereas the Prosigna assay combines gene expression assessment (PAM50) with clinical factors (tumor size and proliferation score).
MammaPrint diagnostic testing, including the array types, is performed under FDA IVD 510(k) clearance or an FDA Investigational Device Exemption (IDE) for tests in the US, and under the EU IVD CE mark, whichever is required by the regulatory oversight in the respective country.

Large clinical studies featuring microarray-based IVDs often use whole-genome arrays to enable whole-genome analyses in relation to clinical endpoints of interest alongside the IVD results. This is a major advantage of microarray-based IVDs over RT-PCR-based assays. Because routine diagnostics are generally performed on dedicated microarrays, it is critically important to demonstrate equivalence between the different array types. For MammaPrint, the near perfect agreement (r = 0.99) between array types shows that study results based on whole-genome arrays apply equally to the current diagnostic setting.

Previously, physicians had to make special arrangements to obtain a fresh tissue specimen during surgery for MammaPrint. To broaden its utility, we translated MammaPrint to FFPE, thereby enabling the use of FFPE tissues and allowing the decision to order a MammaPrint test to be made even after surgery of the primary tumor. A comparison between fresh and FFPE tissues from the same tumor is naturally influenced by intrinsic intratumoral heterogeneity and by differences in tissue preservation techniques [30], which together account for the observed deviations. A difference of 5 % in MammaPrint binary results can be attributed to these phenomena [6, 31]. Despite this influence, the comparison between fresh and FFPE demonstrated very high agreement in both MammaPrint index values (r = 0.93) [6] and visualized clinical outcome prognosis, confirming previously described results [13]. Taken together, the results of large clinical studies featuring the 70-gene MammaPrint profile, in particular the MINDACT trial, whose level 1a evidence will be available in 2016, can be interpreted and transposed to the current FFPE-based diagnostic setting without any reservations.

In summary, these results confirm that MammaPrint indices generated from customized mini-arrays show near perfect agreement with those obtained from customized whole-genome arrays. Additionally, MammaPrint testing on fresh and FFPE tissues is robust, equivalent, and stable (also at the clinical level), despite the influence of tissue heterogeneity and tissue preservation techniques. Finally, stability assessment confirmed very high reproducibility, precision, and repeatability of the MammaPrint test over time. This combination of high equivalence between array types and between tissue types, together with high stability over time, demonstrates that results from clinical trials such as MINDACT and I-SPY 2 apply to current MammaPrint FFPE and fresh diagnostics and can be used interchangeably.