Introduction

A phyllodes tumor (PT) is the least common lesion in the category of fibroepithelial tumors of the breast, of which fibroadenoma is the most common representative (Tan et al. 2016; Mishra et al. 2013). PTs are rare, accounting for 0.3 to 0.5% of all female breast tumors, and are composed of variable proportions of epithelial cells (the non-neoplastic component of the lesion) and stromal cells (the neoplastic component of the tumor) (Tan et al. 2016; Lakhani 2012).

In addition to the rarity of the lesion, the anatomopathological diagnosis is complicated by its complex microscopic morphology, by the differential diagnosis with morphologically similar lesions, and by a biological behavior that depends on grade (Lakhani 2012; Lawton et al. 2014). PT of the breast covers a broad spectrum of biological behavior, ranging from benign to frankly malignant (Lakhani 2012; Chang et al. 2018; Tan et al. 2005a). Pathological diagnosis is the gold standard for defining these entities with different biological behaviors (benign, borderline and malignant), correlates highly with recurrence-free survival, and is used in counseling and clinical management (Chang et al. 2018; Chng et al. 2018; Rakha et al. 2017). Studies have provided evidence that only two types of PT can be distinguished on a genetic basis: benign and malignant/borderline (Pareja et al. 2017; Lae et al. 2007). Complete surgical resection is the established treatment for breast PT, since residual PT at the excision margins is a strong predictor of local recurrence; however, there is a consensus that benign PTs benefit from more conservative treatment (Tan et al. 2016; Lakhani 2012; Tan et al. 2012; Yonemori et al. 2006; Tremblay-LeMay et al. 2017; Shaaban and Barthelmes 2017). The grading of PT is based on a constellation of microscopic parameters, which include (i) the degree of cellular atypia of stromal cells; (ii) the degree of stromal cellularity; (iii) the mitotic index evaluated in 10 microscopic high-power fields (HPF); (iv) stromal overgrowth; and (v) the characteristics of surgical margins (Tan et al. 2016; Lakhani 2012; Tan and Tan 2018). Since each of the three main morphological parameters (stromal atypia, stromal cellularity and mitotic index) has three levels of stratification, achieving diagnostic accuracy and reproducibility is a significant challenge (Rakha et al. 2017; Khazai et al. 2015).

Beyond the dispute regarding the histopathology of PTs, investigators have studied the role of the biological markers and their relationship with clinical and pathological characteristics, with p53 and Ki-67 perhaps being the most widely evaluated (Yonemori et al. 2006; Tan et al. 2005b; Pornchai et al. 2018; Vilela et al. 2014; Kucuk et al. 2013; Yemelyanova et al. 2011; Noronha et al. 2011; Tse et al. 2002; Umekita and Yoshida 1999; Kim and Kim 1993).

P53 is a tumor suppressor gene located on the short arm of chromosome 17 (17p), and P53 mutations are among the most common identifiable genetic abnormalities in human cancer. The wild-type p53 gene product has a short half-life, and this unstable protein is thus rarely detectable by immunohistochemistry. In contrast, immunohistochemical positivity is believed to highlight the expression of mutant p53 protein, which is more stable and has a longer half-life. Not all mutations result in immunopositivity, and positive staining may reflect events other than mutation, such as stabilization of wild-type p53 by cytoplasmic factors. Nevertheless, extensive immunohistochemical positivity for p53 increases the likelihood that there is an underlying mutation. Immunohistochemistry is therefore commonly used as a surrogate method for detecting mutation of this tumor suppressor gene, since sequencing of the P53 gene by various authors has mostly found mutations (Feakins et al. 1999; Giacomazzi et al. 2013; Rivlin et al. 2011; Hanahan and Weinberg 2000; Munawer et al. 2012; Gatalica et al. 2001; Murnyak and Hortobagyi 2016).

Cellular proliferation is one of the fundamental biological processes, and its assessment provides useful predictive information, with a growing body of evidence for the utility of Ki-67 in predicting prognosis in many different tumor entities (Vilela et al. 2014; Kucuk et al. 2013; Noronha et al. 2011; Umekita and Yoshida 1999; Munawer et al. 2012; Chan et al. 2004). Ki-67 is expressed in proliferating cells from the mid-G1 phase, increases in level through S and G2, and peaks in the M phase of the cell cycle. It is rapidly catabolized at the end of the M phase, and is undetectable in resting (G0 and early G1) cells (Tan et al. 2005c).

Although reports on PT are quite controversial when correlating expression of p53 and Ki-67 with clinical variables, such as overall survival and recurrence-free survival, most authors point to a strong relationship between the expression of these markers and histopathological grade, with a potential impact on grading accuracy, even though this was not the original objective of the studies (Lae et al. 2007; Yonemori et al. 2006; Tan et al. 2005b; Vilela et al. 2014; Kucuk et al. 2013; Yemelyanova et al. 2011; Noronha et al. 2011; Tse et al. 2002; Umekita and Yoshida 1999; Kim and Kim 1993; Feakins et al. 1999; Rivlin et al. 2011; Munawer et al. 2012; Gatalica et al. 2001; Murnyak and Hortobagyi 2016; Chan et al. 2004; Tan et al. 2005c; Kim et al. 2018; Wang et al. 2017; Mastellaro et al. 2017; Piscuoglio et al. 2016; Vidal et al. 2015; Lin et al. 2014; Korcheva et al. 2011; Appel et al. 2008; Zlobec et al. 2006; Niezabitowski et al. 2001; Birch et al. 2001; Millar et al. 1999; Kocova et al. 1998; Kawai et al. 1994).

The objective of this study is to establish a positivity score for p53 and Ki-67 that increases the grading accuracy of PTs through the appropriate use of these ancillary methods. There is no consensus in the literature regarding (i) the percentage of Ki-67 expression that has discriminant power in relation to the grades of PT; and (ii) the intensity of staining and the percentage of expression of p53 that can be considered a positive result.

Methods

Study design

This retrospective cohort study included 146 consecutive PTs surgically removed between January 2000 and December 2015. Retrospective studies in which the index test was not performed for the purpose of being evaluated are prone to bias, because the analysis is non-standardized and not always blinded in relation to the reference standard (Bossuyt et al. 2015). To reduce these biases, all cases were reviewed according to the current histopathological classification, and all immunohistochemical markers were performed at one time. Both were interpreted blinded to the diagnosis and the clinical status.

Participants

The archives of the Department of Pathology of Hospital de Clínicas de Porto Alegre, and of two additional pathology laboratories in the city, were searched for PTs surgically removed between January 2000 and December 2015. The diagnostic slides and formalin-fixed, paraffin-embedded tissue blocks of 146 consecutive benign, borderline, and malignant PTs were retrieved. Cases with insufficient material for all analyses were excluded. All samples were anonymized prior to pathological analysis, and ethical approval was received from the institution.

Test methods

All 146 cases, including all tumor sections, were reviewed according to the latest WHO criteria (Lakhani 2012) by a pathologist with no knowledge of the anatomopathological report or any clinical data, and the results were compared with the original diagnosis. The reviewed diagnosis was taken as the gold standard for analysis.

A representative paraffin block for each case was chosen for immunohistochemical analysis. Two formalin-fixed, paraffin-embedded sections, 4 μm in thickness, were prepared and mounted on electrostatically charged slides. After deparaffinization in xylene and rehydration in graded alcohols, endogenous peroxidase was blocked with hydrogen peroxide. Antigen retrieval, like all staining steps, was performed on the Ventana BenchMark automatic staining system (Ventana, Tucson, AZ). Mouse monoclonal antibodies directed against Ki-67 (clone 30–9, prediluted, Ventana) and human p53 (clone DO-7, prediluted, Ventana) were used. Antigen-antibody reactivity was detected using the multimer Ventana detection kit with 3,3′-diaminobenzidine tetrahydrochloride as chromogen. Positive controls were included on all slides.

The sections stained for Ki-67 were considered positive only if unequivocal nuclear staining was present, regardless of intensity, according to the recommendations of the International Ki67 in Breast Cancer Group (Dowsett et al. 2011). The most active areas (hot spots), with the maximal number of stained nuclei, were chosen for counting. The Ki-67 index was defined as the percentage of cells that showed positive staining in 10 microscopic HPF using a 40× objective with a 10× eyepiece (0.28 mm² area), as previously described (Kocova et al. 1998; Jacobs et al. 2005). Thereafter, a receiver operating characteristic (ROC) curve was used to determine the best cutoff point for the test.
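The counting rule described above can be illustrated with a short Python sketch. It is not the procedure used in the study: the function name `ki67_index` and the per-field counts are hypothetical, and pooling the counts over the 10 fields (rather than averaging per-field percentages) is an assumption.

```python
from typing import Sequence, Tuple


def ki67_index(fields: Sequence[Tuple[int, int]]) -> float:
    """Return the Ki-67 labeling index (%) pooled over high-power fields.

    Each item is (positive_nuclei, total_nuclei) for one field. Any
    unequivocal nuclear staining counts as positive, regardless of intensity.
    ASSUMPTION: counts are pooled across fields before taking the percentage.
    """
    if not fields:
        raise ValueError("at least one field is required")
    for positive, total in fields:
        if positive < 0 or total < 0 or positive > total:
            raise ValueError("invalid counts for a field")
    positive_sum = sum(p for p, _ in fields)
    total_sum = sum(t for _, t in fields)
    if total_sum == 0:
        raise ValueError("no nuclei were counted")
    return 100.0 * positive_sum / total_sum


# Hypothetical hot-spot counts for 10 fields (illustrative values only).
example_fields = [(12, 100), (15, 110), (9, 95), (14, 105), (11, 100),
                  (13, 98), (10, 102), (16, 108), (12, 97), (8, 85)]
print(f"Ki-67 index: {ki67_index(example_fields):.1f}%")
```

With these illustrative counts (120 positive nuclei out of 1000), the pooled index is 12.0%.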

The intensity of the immunohistochemical neoplastic stromal nuclear staining for p53 was scored as negative (0), weak staining (1+), moderate staining (2+), and strong staining (3+). The percentage of positive cells was assessed in 10 microscopic HPF. The proportion of nuclear positive cells was categorized as sporadic (positive cells < 10%); focal (positive cells > 11% and < 50%); and diffuse (positive cells ≥ 50%). Immunohistochemical scores of 2+ and 3+ with focal to diffuse distribution were considered to represent positive expression of p53, as described previously (Yonemori et al. 2006). Index test analysis was performed independently by two pathologists, without knowledge of the diagnosis or clinical status; for discordant cases, a consensus diagnosis was reached on a multi-head microscope.
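The p53 positivity rule described above can be written as a small decision function. The following Python sketch is illustrative only; the function names are hypothetical. The stated categories leave values between 10% and 11% unassigned, so the sketch assumes that everything from 10% up to, but not including, 50% is focal.

```python
def p53_distribution(percent_positive: float) -> str:
    """Categorize the proportion of p53-positive stromal nuclei."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive < 10:
        return "sporadic"
    # ASSUMPTION: the 10-11% range, not covered by the stated categories,
    # is treated as focal here.
    if percent_positive < 50:
        return "focal"
    return "diffuse"


def p53_is_positive(intensity: int, percent_positive: float) -> bool:
    """Positive when intensity is 2+ or 3+ and distribution is focal or diffuse."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    distribution = p53_distribution(percent_positive)
    return intensity >= 2 and distribution in ("focal", "diffuse")


# Illustrative cases: strong diffuse staining is positive,
# weak diffuse or strong sporadic staining is not.
print(p53_is_positive(3, 60), p53_is_positive(1, 80), p53_is_positive(3, 5))
```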

Analysis

Statistical analysis was carried out using SPSS for Windows, version 21.0. Quantitative variables were described by mean and standard deviation or by median and interquartile range. Categorical variables were described by absolute and relative frequencies. To compare means between grades of tumor malignancy, analysis of variance (ANOVA) followed by the Tukey test was applied. In case of asymmetry, the Kruskal–Wallis test with the Dunn method was used. To compare proportions, Pearson’s chi-squared test together with an adjusted residuals analysis was applied. To determine the best Ki-67 cutoff point for borderline or malignant PT, a receiver operating characteristic (ROC) curve was used. Diagnostic properties, such as sensitivity, specificity, positive and negative predictive values and accuracy, in addition to the kappa concordance coefficient, were calculated to aid in deciding the best combination of p53 and Ki-67 markers. A p-value below 0.05 was considered significant.
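As an illustration of how a cutoff can be read from a ROC analysis, the sketch below scans candidate thresholds and keeps the one that maximizes Youden's J (sensitivity + specificity − 1). The choice of Youden's J is an assumption: the text does not state which criterion was applied in SPSS. The function name and the data are hypothetical.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float],
                  is_positive: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case tests positive when its value is strictly greater than the cutoff.
    ASSUMPTION: Youden's J is used as the selection criterion.
    """
    if len(values) != len(is_positive):
        raise ValueError("values and labels must have the same length")
    n_pos = sum(1 for y in is_positive if y)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, is_positive) if y and v > cutoff)
        tn = sum(1 for v, y in zip(values, is_positive) if not y and v <= cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    _, cutoff, sensitivity, specificity = best
    return cutoff, sensitivity, specificity


# Hypothetical Ki-67 indices (%): 7 benign cases, then 5 borderline/malignant.
ki67_values = [2, 3, 5, 6, 8, 10, 15, 8, 12, 18, 25, 40]
labels = [False] * 7 + [True] * 5
cutoff, sens, spec = youden_cutoff(ki67_values, labels)
print(f"cutoff > {cutoff}%: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy data set the selected cutoff is 10%, which is a property of the invented values and not a reproduction of the study result.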

Results

Participants

The clinicopathologic characteristics are summarized in Table 1. The overall diagnostic agreement was high: all cases were considered to have been correctly diagnosed as PT originally, and agreement on grade was achieved in 92.5% of cases. Briefly, the median age of patients with PT was 45 years (range: 16–74 years), and the association between age and the benign, borderline, and malignant subgroups (see Table 2) was statistically significant (p < 0.001); the degree of malignancy of the tumor increased with patient age. The median size of PT was 4.0 cm (range: 1–20 cm), and tumor size was positively associated with histological grade (p = 0.005). Clinical outcome data were available for 68 cases (40 benign, 12 borderline and 16 malignant). All patients with benign PT were alive and disease-free at their last follow-up visit. In the borderline group, there were 1 PT-related death and 6 local recurrences. In the malignant group, 6 patients died of disease progression, and 5 of the 10 surviving patients had a recurrence.

Table 1 Clinical and pathological characteristics
Table 2 Anatomopathological results

Test results

Ki-67 expression differed significantly among benign, borderline, and malignant PTs (p < 0.001): it was significantly lower in benign tumors than in borderline and malignant tumors, with no significant difference between the latter two. The 10% cutoff point for Ki-67 best balanced sensitivity with specificity, with an area under the curve of 0.98 (95% confidence interval [CI]: 0.95–1), resulting in high specificity (96.4%), sensitivity (88.9%), and accuracy (94.5%) for borderline or malignant PTs, with a very good concordance coefficient kappa of 0.85 (p < 0.001). For the analysis, a case was considered positive if more than 10% of the total neoplastic cell nuclei stained. Of the benign tumors, 106 of 110 (96.4%) showed Ki-67 positivity in ≤ 10% of the neoplastic cells, and all benign tumors showed Ki-67 expression lower than 20%. On the other hand, all 20 malignant PTs showed Ki-67 positivity > 20% in the stromal neoplastic cells. In the borderline group, the Ki-67 index was ≥ 20% in 12 cases, and the remaining 4 cases fell in the range ≥ 10% and < 20%.
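The diagnostic properties reported for the Ki-67 cutoff can be recomputed from a 2×2 table. In the sketch below, the benign counts (106 of 110 at or below the cutoff) are taken from the text, whereas the split of the 36 borderline or malignant cases into 32 test-positive and 4 test-negative is back-calculated from the reported sensitivity of 88.9% and is therefore an assumption. The class name `TwoByTwo` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test versus reference standard (borderline/malignant = positive)."""
    tp: int
    fn: int
    fp: int
    tn: int

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def kappa(self) -> float:
        """Cohen's kappa: observed agreement corrected for chance agreement."""
        observed = self.accuracy()
        expected = ((self.tp + self.fp) * (self.tp + self.fn)
                    + (self.tn + self.fn) * (self.tn + self.fp)) / self.n ** 2
        return (observed - expected) / (1 - expected)


# ASSUMPTION: tp=32 and fn=4 are back-calculated from the reported sensitivity.
ki67 = TwoByTwo(tp=32, fn=4, fp=4, tn=106)
print(f"sensitivity {ki67.sensitivity():.1%}, specificity {ki67.specificity():.1%}, "
      f"accuracy {ki67.accuracy():.1%}, kappa {ki67.kappa():.2f}")
```

Under these counts the sketch returns a sensitivity of 88.9%, a specificity of 96.4%, an accuracy of 94.5% and a kappa of 0.85, matching the values reported above.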

The study also showed a greater expression of p53 in malignant and borderline tumors when compared to benign PTs, and the p53 expression was significantly associated with grade (p < 0.001). The expression of p53 was considered to be positive in 28 (19.2%) of all 146 PTs. Only 5 of 110 (4.5%) benign PTs expressed p53 positivity; meanwhile, p53 positivity was achieved in 15 of 20 (75%) malignant PTs. The specificity of the p53 positivity to diagnose a borderline or malignant PT was 95.5%; the sensitivity was 63.9% and accuracy was 87.7%, with a good concordance coefficient kappa of 0.64 (p < 0.001).

When either positive test was taken as diagnostic of a borderline or malignant PT (see Fig. 1 and Table 3), we achieved a sensitivity of 100%, a specificity of 91.8%, a positive predictive value of 80%, a negative predictive value of 100%, and an accuracy of 93.8%, with a very good concordance coefficient kappa of 0.85 (p < 0.001). When both tests are required to be positive, that is, p53 positivity and Ki-67 positivity (> 10%), a diagnosis of malignant or borderline PT can be made with 100% specificity. However, the sensitivity is low (52.8%); the positive predictive value is 100%, the negative predictive value is 86.6%, and the accuracy is 88.4%, with a good concordance kappa of 0.63 (p < 0.001).
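The two ways of combining the markers can be sketched as simple Boolean rules, together with the predictive values they imply. The function names are hypothetical, and the 2×2 counts are back-calculated from the percentages reported above (36 borderline or malignant and 110 benign cases), so they are assumptions rather than data taken from Table 3.

```python
from typing import Tuple

KI67_CUTOFF = 10.0  # percent; a case is positive when strictly above this value


def either_rule(ki67_percent: float, p53_positive: bool) -> bool:
    """Positive when at least one marker is positive (favors sensitivity)."""
    return ki67_percent > KI67_CUTOFF or p53_positive


def both_rule(ki67_percent: float, p53_positive: bool) -> bool:
    """Positive only when both markers are positive (favors specificity)."""
    return ki67_percent > KI67_CUTOFF and p53_positive


def predictive_values(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("predictive values are undefined for an empty row")
    return tp / (tp + fp), tn / (tn + fn)


# ASSUMPTION: counts back-calculated from the reported percentages.
ppv_either, npv_either = predictive_values(tp=36, fn=0, fp=9, tn=101)
ppv_both, npv_both = predictive_values(tp=19, fn=17, fp=0, tn=110)
print(f"either: PPV {ppv_either:.1%}, NPV {npv_either:.1%}")
print(f"both:   PPV {ppv_both:.1%}, NPV {npv_both:.1%}")
```

The "either" rule trades a few false positives for complete sensitivity, while the "both" rule removes false positives at the cost of missing nearly half of the borderline or malignant cases.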

Fig. 1 Diagram reporting the flow of participants through the study. Ki-67: the test is considered positive for malignant/borderline PT if more than 10% of the total neoplastic nuclei are stained. p53: immunohistochemical scores of 2+ and 3+ with focal to diffuse distribution are considered to represent positive expression of p53 and a positive test for malignant/borderline PT.

Table 3 Agreement between P53 and Ki-67 tests with anatomopathological results of borderline/malignant PT

Discussion

An increasing trend in age, tumor size, and positivity scores for p53 and Ki-67 was clearly detected across benign, borderline, and malignant PTs in this series (see Tables 2 and 3).

According to the WHO classification (Lakhani 2012), the proportions of our cases classified as benign (75.3%), borderline (11%), and malignant (13.7%) are in accordance with the ranges published for PT in the relevant series of other authors (Tan et al. 2016; Chang et al. 2018; Tan et al. 2005a; Efared et al. 2018; Jia et al. 2017; Spitaleri et al. 2013). An interesting study published in 2018 by Chang et al. analyzed the impact of the new WHO classification by retrospectively assessing 305 fibroepithelial lesions from the 2007–2017 period, and observed an increased diagnosis of benign PT, with a relative reduction in fibroadenoma diagnoses. The authors attribute this to the more methodical application of the morphological criteria described in the new classification (Chang et al. 2018).

The mean lesion size observed in this analysis, as well as the median and range, is also comparable to those reported in the literature (Tan et al. 2016; Tan et al. 2005a; Efared et al. 2018; Spitaleri et al. 2013). Although this series demonstrates a significant increase in lesion size with increasing grade, a fact also observed in previous studies, lesion size alone should not guide therapeutic decisions, according to the study by Neville et al. (Neville et al. 2018). The mean, median, and range of patient age observed in this analysis are similar to those reported previously, as is the significant association between age and grade of PT (Lakhani 2012; Tan et al. 2005a; Jia et al. 2017; Spitaleri et al. 2013).

The differences in the clinical and pathological characteristics of PT reported by other authors are probably due to the inherent limitation of smaller series, as well as methodological aspects related to the recruitment of cases, or biases related to retrospective studies (Noronha et al. 2011; Kim et al. 2018; Piscuoglio et al. 2016; Sin et al. 2016).

PTs are biphasic neoplasms, but the stromal element is regarded as the neoplastic component and, consequently, as the determinant of clinical behavior (Tan et al. 2016; Feakins et al. 1999). Nevertheless, many authors describe immunohistochemical patterns of epithelial positivity as being indicative of p53 positivity (Tan et al. 2005b; Kim and Kim 1993; Munawer et al. 2012). Moreover, the scores used to evaluate p53 positivity vary widely in how they quantify the expression and intensity of the reaction, and in some studies the positivity criterion is not even detailed (Mishra et al. 2013; Wang et al. 2017). The strong relation between immunohistochemical expression of p53 and malignant PT was observed by Kim and Kim in 1993, in the first report in the literature on this subject (Kim and Kim 1993). Since then, few studies have evaluated p53 expression in PT, and most of them included only a limited number of cases. The largest published series on PT originated from a single institution, where Tan et al. analyzed the prognostic role of morphological parameters in 355 Asian women diagnosed with PT (Tan et al. 2005b). The authors considered epithelial and stromal positivity, at any percentage of expression or intensity of reaction, as a positive result for p53, without determining a cutoff for the index test. Although relying on a series of only 15 PTs (6 malignant, 9 benign) and 20 fibroadenomas, Millar et al. suggest that not only positivity itself but also the expression pattern of p53 and the intensity of the reaction are of diagnostic value for PT (Millar et al. 1999). The same impression was shared by Niezabitowski et al. two years later, when they stated that p53 expression in tumor cells could also be useful as a predictive indicator when the number of cells and the intensity of expression are considered (Niezabitowski et al. 2001).
Similarly, when analyzing 143 cases of PT (87 benign, 37 borderline, and 19 malignant), Tse et al. stated that the strong and diffuse pattern of p53 has high specificity (99%) but low sensitivity (47%) for the diagnosis of malignant PT (Tse et al. 2002). However, this pattern was observed in only one (3%) borderline PT in that series, which limits its practical use, since borderline tumors have a biological behavior and consequent surgical management similar to those of malignant ones. Our specificity (95.5%) and sensitivity (63.9%) values are similar. In our series, however, the association was observed in the group that combines malignant and borderline PTs, which makes the finding more significant, with implications for the practice and intended use of the index test. We believe that the difference is due to the nature of the score used to establish the positivity of the index test. When studying 63 PTs (50 benign and 13 malignant), Chan et al. found similar results (Chan et al. 2004). An interesting aspect of their study is that, in the absence of a generally accepted standard percentage to define high expression, and with different authors applying cutoff levels from 5 to 34% (Kim and Kim 1993; Feakins et al. 1999; Gatalica et al. 2001; Niezabitowski et al. 2001; Millar et al. 1999; Kocova et al. 1998), Chan et al. proposed a new score (0–10%, 11–30%, 31–50%, and 51–100%). However, their study did not take the intensity of the reaction into account, nor did it define what should be considered a positive result for the index test based on the analysis of the scores (Chan et al. 2004). One of the few studies that correlated p53 expression with systemic recurrence and survival in PT adopted a score based on the intensity of the immunohistochemical staining and the proportion of positive cells, categorized as sporadic, focal, and diffuse (Yonemori et al. 2006). This score was used in our study and is described in detail above.

Although there is a growing body of evidence for the utility of Ki-67 in predicting clinical prognosis in many different tumor entities, reports have failed to demonstrate such an association in PT (Kocova et al. 1998). The likely reasons are the same as for p53: small series and short follow-up. Umekita and Yoshida were the first to correlate Ki-67 with the histological grade of malignancy in PT, using a histopathological grading system similar to the current WHO classification. Applying a 10% cutoff for positivity, but considering both stromal and epithelial cells in the analysis, they correlated Ki-67 with the histological grade of malignancy (Umekita and Yoshida 1999). When analyzing 52 benign PTs, 24 borderline PTs, and 42 malignant PTs, Niezabitowski et al. found a correlation between the expression of Ki-67 and prognostic factors among patients with malignant PT, and with histological grading (Niezabitowski et al. 2001). However, the cutoff used in the analyses, defined as 11.2% and derived from a previous study on cutaneous melanomas, is difficult to apply in practice. The same cutoff was used later and also showed a correlation with prognostic factors (Yonemori et al. 2006). Although almost all authors have correlated Ki-67 expression with morphologic grading, with values increasing from benign through borderline to malignant PTs, labelling indices have been reported to range from 1.3 to 50%. Also, there is no generally accepted standard percentage that defines a positive result, with different authors applying different cutoff levels from 5 to 20% (Yonemori et al. 2006; Umekita and Yoshida 1999; Chan et al. 2004; Niezabitowski et al. 2001; Jara-Lazaro et al. 2010). Furthermore, many reports have not used any cutoff to define a positive index test (Yonemori et al. 2006; Tan and Tan 2018; Umekita and Yoshida 1999; Chan et al. 2004; Niezabitowski et al. 2001; Jara-Lazaro et al. 2010).

Our study has some limitations. We performed all analyses on surgical specimens, so we cannot extend our findings to core biopsy samples. A similar study design focused on biopsy specimens could address this limitation. Alternatively, a tissue microarray (TMA) could be constructed with random samples and the results then correlated. The TMA technique can be effectively applied in PTs to study immunohistochemical markers, according to Tan et al. and Munawer et al. (Tan et al. 2005b; Munawer et al. 2012).

Conclusions

Our study provides a practical methodology for highly accurate grading of PT as benign versus borderline/malignant, compared with the gold standard, based on clearly defined and easy-to-apply cutoffs for a simple immunohistochemical panel of Ki-67 and p53. When considering either of the positive tests for the diagnosis of a borderline or malignant PT, we achieved a sensitivity of 100% with a specificity of 91.8%. In conclusion, a PT positive for either index test should be graded as borderline or malignant. We hope this new approach will provide a basis for standardizing the use of p53 and Ki-67 in grading PTs.