Introduction

Allogeneic hematopoietic stem cell transplantation (allo-HSCT) is a widely used therapy for a range of malignant and non-malignant hematologic diseases. In allo-HSCT, the host’s immune and hematopoietic systems are replaced by those of the donor. The donor immune system recognizes residual tumor cells as foreign and eradicates them via the graft-versus-leukemia (GVL) effect. Unfortunately, donor immune cells may also attack normal host tissues, particularly the skin, liver, and gastrointestinal (GI) tract, resulting in graft-versus-host disease (GVHD). GVHD remains one of the major barriers to a more widespread and successful application of allo-HSCT. GVHD has classically been divided into two forms: acute GVHD (aGVHD), which arises before day 100 post-HSCT, and chronic GVHD (cGVHD), which occurs after day 100 [1, 2]. These two classical forms are distinct pathophysiological entities. However, in 2005, the new National Institutes of Health (NIH) classification recognized additional categories, such as late-onset acute GVHD (occurring after day 100) and an overlap syndrome with features of both acute and chronic GVHD [3, 4]. The emergence of these new forms may be explained by the wider use of HSCT in older recipients who have undergone reduced-intensity conditioning regimens before transplantation.

A major barrier to aGVHD research and treatment is that diagnosis and prognosis rely almost entirely on clinical symptoms. Currently, no validated laboratory tests exist to predict the risk of developing aGVHD, responsiveness to treatment, or patient survival. This absence of validated aGVHD biomarkers is partly due to the complex pathology of GVHD. Biomarkers are crucial in the HSCT setting for identifying patients at high risk for aGVHD early in their transplant course. Biomarkers may also lead to altered management, including more stringent monitoring and/or preventive care. Identifying patients who will not respond to standard treatment and who are at particularly high risk of subsequent morbidity and mortality could enable personalized treatment plans, such as the early introduction of additional immunosuppressive therapy. Equally important is identifying patients who will respond well to treatment, which could allow rapid tapering of steroids and thereby reduce long-term toxicity in low-risk patients.

The current review will provide an update on Omics tools, the discovery and validation of the most clinically relevant biomarkers of aGVHD, and specific recommendations for their use in clinical trials.

Types of aGVHD biomarkers

The classical definition of a biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention [5]. With regard to aGVHD, the biomarkers sought are those that indicate the risk of GVHD after allo-HSCT, the prognosis once GVHD has developed, and responsiveness to treatment. The types of biomarkers being investigated as potential aGVHD markers are described below.

Histocompatibility antigen disparities

Antigen disparity can occur at the level of the major histocompatibility complex (MHC) or at the level of minor histocompatibility antigens (mHAs). The severity of aGVHD is directly related to the degree of MHC mismatch [6]. In transplantations that are MHC matched but mHA disparate, donor T cells still recognize peptides that are derived from the products of recipient polymorphic genes (the mHAs) and presented by MHC molecules [7, 8]. The tissue expression of mHAs is broad and variable. Thus, different mHAs might dictate variable phenotypes, target organ involvement, kinetics of GVHD development, or antitumor responses after allogeneic HSCT [9]. Some mHAs, such as HA-1, HA-2, HB-1, and BCL2A1, are found primarily on hematopoietic cells, whereas others, such as the H-Y antigens, HA-3, HA-8, and UGT2B17, are ubiquitous [10–12]. These disparities are well-known pretransplant risk factors, and clinicians always aim for the best possible MHC match, although a full match is not always possible and mHA matching is not yet common clinical practice. However, histocompatibility disparities are not ideal biomarkers because they are static indicators, incapable of providing the dynamic information necessary for monitoring aGVHD after transplantation.

Non-HLA polymorphisms

Increasing evidence indicates that non-HLA polymorphisms influence the risk of aGVHD and cGVHD. Most genetic variation among humans consists of single nucleotide polymorphisms (SNPs), some of which result in functional differences in gene products. Several SNPs have been identified as risk factors for GVHD (e.g., in the genes encoding tumor necrosis factor [TNF]-α, interleukin [IL]-6, interferon [IFN]-γ, IL-10, and UDP-glucuronosyltransferase 2B17) [13, 14]. However, clinicians face the same issue as with mHA disparities: donor selection according to SNP genotyping is not yet performed clinically, although it may become available in the near future, as recently reported by Petersdorf et al. [15]. An IL-23 receptor polymorphism has recently been found to decrease CD4 and CD8 T cell function [16] and to be associated with a lower incidence of aGVHD [17, 18]. The protective role of the IL-23/IL-22 pathway, in which IL-23 mediates the production of IL-22 when tissue is damaged, has also been demonstrated in an experimental model of intestinal GVHD, indicating that this pathway protects intestinal stem cells during inflammation and disease [19].

MicroRNAs or miRNAs

MicroRNAs (miRNAs) are small, noncoding RNAs, approximately 20–25 nucleotides in length, that are processed from longer precursor transcripts. They regulate gene expression post-transcriptionally by binding to complementary sequences in target mRNAs, which typically represses translation or promotes mRNA degradation [20]. The more recent identification of stable miRNAs in circulating body fluids has allowed their use as biomarkers of certain diseases [21], including aGVHD [22]. Although proteins are the prototypical biomarkers and are generally more informative than miRNAs, miRNAs offer several advantages of their own: besides their stability in body fluids, they are easily measured by quantitative reverse-transcription PCR. Unfortunately, the application of miRNAs as biomarkers still lacks standardization, which often introduces bias into data analysis. Despite this, miRNAs have shown promise as novel biomarkers for predicting aGVHD. A recent study showed that miR-155 expression is up-regulated in an experimental GVHD model and that its blockade in donor T cells decreased the incidence of aGVHD and improved survival [22]. These findings suggest that miR-155 could be not only a useful biomarker but also a potential therapeutic target in GVHD. Another recent study identified miR-100, which may be involved in the development of intestinal neovascularization in aGVHD. In that study, miR-100 expression was down-regulated throughout the progression of aGVHD in mice, suggesting that miR-100 negatively regulates aGVHD. Furthermore, inactivation of miR-100 worsened the severity of aGVHD, leading the authors to conclude that miR-100 may play a protective role in aGVHD, specifically through inhibition of inflammatory neovascularization [23].

Cellular biomarkers

Several immune cell populations have altered functions in aGVHD and cGVHD. The potential to manipulate specific immune cell populations ex vivo and in vivo may allow the development of new aGVHD therapies. In addition, some of these immune cell subsets are promising biomarkers of aGVHD and cGVHD.

The role of regulatory T cells (Tregs), traditionally characterized as CD4+CD25+ forkhead box protein 3 (FOXP3)+ cells, in maintaining immune tolerance after allogeneic HSCT has been confirmed in patients. Daily administration of low-dose interleukin-2 (IL-2) induces selective expansion of functional Tregs and clinical improvement of cGVHD [24, 25]. Ex vivo expanded Tregs infused after HLA-haploidentical transplantation have also been shown to prevent aGVHD and promote immune reconstitution [26]. Tregs are also promising cellular biomarkers. Two studies demonstrated that higher numbers of Tregs in the donor graft correlated with a lower incidence of aGVHD and improved survival in HSCT recipients [27, 28]. Another study showed that Treg frequency was inversely correlated with aGVHD severity, a relationship that could be useful for diagnostic and prognostic purposes [29].

CD30, a member of the tumor necrosis factor (TNF) receptor superfamily, is expressed on some activated memory T cells and is also released in a soluble form. Both membrane-bound and soluble CD30 have recently been proposed as potential markers of aGVHD [30, 31].

Another cell population identified as a potential cellular biomarker of aGVHD is invariant natural killer T (iNKT) cells. One recent study found that a high dose of iNKT cells in the graft was the only parameter that correlated with a decreased risk of aGVHD in a multivariate analysis of 57 HSCT recipients [32]. Another group analyzed the iNKT/T-cell ratio at day 15, before the onset of GVHD, and found that a low ratio predicted aGVHD and increased mortality [33]. Of note, both studies included only patients who received in vivo T cell–depleted transplants.

Dendritic cell subsets have also been explored as potential markers [34, 35]. In cGVHD, B cells and their modulators, such as B-cell activating factor, are possible future biomarkers [36–41].

Proteomic biomarkers

Proteomic biomarkers are detailed below.

Omics tools for the discovery and validation of proteomic biomarkers

Advances in engineering have greatly increased data throughput, enabling the study of complete sets of molecules (“Omics”) with ever-greater speed, accuracy, and cost-effectiveness. Thus, the analysis of the entire spectrum of molecular and cellular organization is now possible, enabling researchers to gain insight into disease mechanisms with fewer a priori assumptions. However, complexity increases at each level: from the genome (~20,000 genes) to the transcriptome (~100,000 RNA transcripts) and the proteome (~1,000,000 proteins) (Fig. 1).

Fig. 1

The complexity of Omics as represented by the transformation of a caterpillar into a butterfly. A caterpillar and a butterfly have the same genome but different transcriptomes and proteomes. This increasing complexity from genome to proteome has sparked research into more efficient proteome discovery techniques.

Here, we focus on the use of proteomics for the molecular diagnosis of aGVHD after HSCT, because proteins are more proximal than other cellular components to the ongoing pathophysiology of a disease [42]. The term “proteome” combines PROTEin and genOME, denoting the proteins expressed by a genome, and proteomics is the systematic analysis of a sample’s protein profile. Unlike the genome, the proteome varies with time and is defined as “the proteins present in one sample at a certain point of time”. The various proteomic techniques available for the discovery and validation of biomarkers are detailed below, and the basic workflow is shown in Fig. 2.

Fig. 2

Workflow for the discovery, validation, and implementation of new biomarkers. Samples are obtained from patients diagnosed with GVHD in the clinic. The proteins in the sample are separated and purified, then identified by mass spectrometry. Candidate proteins are then quantified in patient samples by immunoassay (usually sandwich ELISA) against standards of known concentration. Once a biomarker is validated, it is carried into clinical trials to assess its performance as a diagnostic and prognostic tool. The end goal is more personalized treatment and improved patient outcomes.

Which biofluid should be used for clinical tests?

Ideal clinical tests are based on noninvasive sample collection, which allows repeated sampling from the same patient over a short period. GVHD biomarkers may be produced during disease development by several sources, such as donor cells, the local or systemic cytokine milieu, or recipient target tissues. These biomarkers may then be released into a variety of body fluids. For noninvasive diagnostic tests, biofluids such as plasma, serum, or urine are the preferred samples. Considerable effort has gone into developing standardized methods for clinical sample collection [43, 44]. Plasma and serum are the most frequently analyzed biofluids. The levels of individual blood proteins represent a summation of multiple, disparate events that occur in every organ system: plasma and serum contain proteins shed by the affected tissue as well as proteins that reflect secondary systemic changes. However, plasma and serum are highly complex mixtures containing many different proteins with a wide dynamic range, spanning twelve orders of magnitude from albumin to the lowest-abundance proteins. Often, the most clinically relevant proteins, such as cytokines and their receptors, are those of lowest abundance [45, 46]. To detect these low-abundance proteins, depletion of the most abundant proteins and subsequent fractionation of the proteome are required [47].

Urine represents an alternative to plasma/serum for biomarker discovery. Urine has four main advantages over plasma/serum: (i) it can be obtained in large quantities; (ii) its protein mixture is far less complex; (iii) the variation in protein abundance is low; and (iv) it is more stable than plasma. A limitation, however, is that urine mainly yields information about diseases of the organs directly involved in its production and excretion, such as the kidneys. The proteins in urine derive mainly from the kidney itself (~70 %) and from glomerular filtration of plasma proteins (~30 %); thus, urine is less informative for systemic diseases.

Feces are another useful biological sample in the context of gastrointestinal diseases. For instance, fecal markers of leukocyte influx into the mucosa are promising indicators of intestinal inflammation. Some neutrophil-derived proteins may be linked to the pathogenesis of intestinal inflammation through their function as damage-associated molecular pattern molecules (DAMPs). Phagocyte-specific DAMPs of the S100 family are released from neutrophils or monocytes and then activate pattern recognition receptors, triggering pro-inflammatory responses. The S100A8/S100A9 complex, termed “calprotectin”, has been used as a fecal marker of inflammatory bowel disease for 10 years [48]. Calprotectin has recently been shown to be an important marker of intestinal aGVHD and a high-risk feature for mortality [49, 50]. The role of the intestinal microbiota in GVHD has also been emphasized [51, 52], and proteomic studies of feces have been performed [53].

Aliquots of peripheral blood mononuclear cells (PBMCs) have also been used by immunologists to study cellular biomarkers. Unfortunately, most biorepositories contain plasma and serum rather than PBMCs, partly because of the cost of processing PBMCs.

In sum, for proteomic studies, the availability of biofluids in biorepositories will dictate the type of analysis to be performed. Analyzing markers in the damaged tissue itself could be more informative than measuring systemic markers. However, this approach has three limitations: (i) the limited material available from biopsies; (ii) tissue biopsies are invasive, whereas an ideal biomarker should be traceable without serial invasive biopsies; and (iii) tissue proteomics is technically difficult.

Which proteomic tools can be used for biomarker discovery and rapid validation?

Antibody profiling for discovery

Antibody-based approaches rely on immunoassays, which use antibody–antigen interactions to identify proteins within a sample. The usefulness of antibodies derives from three important properties: (i) their ability to bind an extremely wide range of natural and man-made chemicals, biomolecules, and cells, owing to the huge number of potential amino acid sequences at the paratope; (ii) their exceptional binding specificity, which enables the measurement of proteins at picomolar (10⁻¹² M) concentrations in blood samples; and (iii) the strength of binding between an antibody and its target, which makes immunoassays accurate and precise even at low concentrations. To screen for aGVHD biomarkers, antibody microarrays spotted with hundreds of antibodies have been employed, allowing hundreds of proteins in complex biological matrices to be captured and measured [54].

Mass spectrometry for discovery

Most non-antibody proteomic strategies are based on mass spectrometry (MS), which is a powerful tool for characterizing and assessing qualitative and quantitative changes in complex protein mixtures [55]. Two types of MS techniques have been used in clinical proteomics: (i) pattern profiles which identify peptide sets and (ii) detailed protein identification and characterization.

Pattern profiling compares polypeptide spectra obtained by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) or surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) MS to distinguish patients with a particular disease from those without it [56]. These methods do not require in-depth analysis because they do not identify the individual components of the profile. These two techniques have been used to screen for aGVHD biomarker candidates in both serum [57] and saliva [58].
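To make the idea of pattern profiling concrete, the following Python sketch bins hypothetical MALDI/SELDI peak lists into fixed-width m/z intervals and assigns a new spectrum to the nearer of two group-average profiles. The bin width, the m/z range, the nearest-centroid rule, and the function names (`bin_spectrum`, `centroid`, `classify`) are illustrative assumptions; published pattern-profiling studies used their own preprocessing and classification algorithms. Note that no peak is ever identified as a specific protein, which is the defining property of this approach.

```python
from statistics import fmean


def bin_spectrum(peaks, mz_min=2000.0, mz_max=20000.0, bin_width=100.0):
    """Sum peak intensities into fixed-width m/z bins, then normalize to total signal.

    The m/z range and bin width are hypothetical defaults.
    """
    n_bins = int((mz_max - mz_min) / bin_width)
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            index = min(int((mz - mz_min) / bin_width), n_bins - 1)
            vector[index] += intensity
    total = sum(vector)
    # Normalizing makes spectra with different overall signal comparable.
    return [v / total for v in vector] if total > 0 else vector


def centroid(vectors):
    """Average binned profile of a group of spectra."""
    return [fmean(column) for column in zip(*vectors)]


def classify(vector, case_profile, control_profile):
    """Nearest-centroid rule based on squared Euclidean distance."""
    d_case = sum((a - b) ** 2 for a, b in zip(vector, case_profile))
    d_control = sum((a - b) ** 2 for a, b in zip(vector, control_profile))
    return "case" if d_case < d_control else "control"


if __name__ == "__main__":
    # Hypothetical peak lists given as (m/z, intensity) pairs.
    cases = [bin_spectrum([(5010.0, 80.0), (9000.0, 20.0)]),
             bin_spectrum([(5050.0, 70.0), (9020.0, 30.0)])]
    controls = [bin_spectrum([(8010.0, 75.0), (9000.0, 25.0)]),
                bin_spectrum([(8060.0, 60.0), (9010.0, 40.0)])]
    new_spectrum = bin_spectrum([(5030.0, 65.0), (9005.0, 35.0)])
    print(classify(new_spectrum, centroid(cases), centroid(controls)))
```

A nearest-centroid rule was chosen only because it is the simplest classifier that shows the principle; any supervised method could take its place.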

Techniques that identify and characterize individual proteins, detailed below, always rely on protein separation within samples. One approach to protein separation is gel-based, for example two-dimensional polyacrylamide gel electrophoresis [59] and two-dimensional differential gel electrophoresis (2-D DIGE) [60]. Three-dimensional separation of proteins differentially labeled with fluorescent dyes, according to their charge, hydrophobicity, and molecular mass, has been used to diagnose aGVHD [61] and heart ischemia [62].

Despite the utility of gel-based techniques, gel-free separation methods such as liquid chromatography (LC) [47, 63–65] and capillary electrophoresis [66] provide better separation because they overcome several limitations of gels, such as lengthy analysis time, poor separation of proteins with low molecular weight or extreme isoelectric points, and difficult quantification of mixed spots. Coupled to MS, these separations reliably identify proteins and determine their isoforms and post-translational modifications. MS also allows quantification, particularly when tandem MS (MS/MS) is used [67], and has most recently been used for quantification with label-free methods or isotopically labeled tags [47, 63, 68]. In addition, new instrumentation such as the ultra-high-resolution linear ion trap–Orbitrap mass spectrometer (Orbitrap Elite) facilitates top-down LC–MS/MS and offers versatile peptide fragmentation modes [69]. The mass spectra are then matched against a protein sequence database to identify proteins [70]. These approaches are not suitable for validation because of the time required for analysis, but they remain the most efficient methods for biomarker discovery in clinical research.
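The database-matching step can be illustrated with a deliberately simplified Python sketch: it digests hypothetical protein sequences in silico with the classic trypsin rule (cleave after K or R unless the next residue is P), computes monoisotopic peptide masses from standard residue masses, and reports peptides whose theoretical precursor m/z lies within a ppm tolerance of an observed value. Real search engines additionally score the fragment (MS/MS) spectra, allow missed cleavages and modifications, and estimate false discovery rates. The function names, the 10 ppm tolerance, and the example sequences are assumptions for illustration only.

```python
# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (termini)
PROTON = 1.007276   # mass of a proton, added per charge


def tryptic_peptides(protein):
    """In-silico trypsin digest: cut after K/R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER


def precursor_mz(sequence, charge=1):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def match_precursor(observed_mz, charge, database, tol_ppm=10.0):
    """Return (protein_id, peptide, ppm_error) for peptides within tol_ppm."""
    hits = []
    for protein_id, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            theoretical = precursor_mz(peptide, charge)
            ppm = (observed_mz - theoretical) / theoretical * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((protein_id, peptide, round(ppm, 2)))
    return hits


if __name__ == "__main__":
    database = {"protein_A": "GASPEPTIDEKLLR", "protein_B": "MWQEHKAVR"}  # hypothetical
    print(match_precursor(401.2875, 1, database))  # hypothetical observed m/z
```

Matching on precursor mass alone is ambiguous in real proteomes, which is why fragment-ion evidence is required in practice.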

Data mining

A single experiment generates massive amounts of complex and heterogeneous proteomic data, and there is no ready-made solution for proteomics data analysis. The challenges include a number of measured variables that far exceeds the number of samples in the experiment, as well as noise and false discoveries. To address many of these challenges, the National Cancer Institute launched the Clinical Proteomic Technologies for Cancer Initiative to develop standard operating procedures, data analysis standards, and an open-access proteomics database [71–74]. Other proteomics workflows that have led to biomarker discovery [64, 65] proceed as follows. First, the acquired spectra are automatically processed by the Computational Proteomics Analysis System to identify proteins at a false discovery rate (FDR) <5 % [75]. The FDR is now recognized as the appropriate measure of significance in Omics studies, with standard p-values reserved for testing individual hypotheses in classic experiments [76, 77]. To sequentially refine the list of candidate proteins, proteins can be selected according to their tissue of origin using the Human Protein Atlas website, which aims to annotate human proteins by using antibodies to systematically analyze the cellular distribution of proteins in normal and pathologic tissues [78]. Further refinement can be obtained with pathway analysis tools such as the Database for Annotation, Visualization and Integrated Discovery (DAVID), Onto-tools’ PathwayExpress, GeneGo Metacore, Ingenuity Pathway Analysis, and BIOBASE ExPlain Analysis. Data-mining strategies can also be tailored to specific needs; for instance, we have developed a strategy that has enabled us to discover biologically meaningful proteins from proteomic data sets [64, 65, 79].
These candidate proteins then require further validation by other techniques, ideally ones that allow high-throughput testing, as detailed in the next section.
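Controlling the false discovery rate across many simultaneous tests is commonly done with the Benjamini–Hochberg step-up procedure. The Python sketch below converts a list of p-values (for example, one per candidate protein) into BH-adjusted values and flags those at or below a chosen FDR. It illustrates the general FDR concept only; the pipeline cited above may estimate identification FDR by a different method, and the function names and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest; the running minimum
    # keeps adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.05):
    """Flag tests whose adjusted p-value is at or below the target FDR."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-protein p-values
    print(benjamini_hochberg(p))
    print(significant_at_fdr(p, fdr=0.05))
```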

High-throughput immunoassays for validation

Of critical importance for any immunoassay is the ability to produce a large number of results in a short period of time. Immunoassays that can analyze many samples at the same time are considered high-throughput. The ability to multiplex, i.e., to measure many different proteins in the same sample simultaneously, is another desirable quality. Thus, the ideal immunoassay should be both high-throughput and multiplexed [80].

The sandwich enzyme-linked immunosorbent assay (ELISA) is an immunoassay that relies on antibody–antigen binding to measure the concentration of a known protein within a sample. In singleplex ELISAs, the protein of interest can be measured with relatively little cross-talk between antibodies and proteins. In multiplex ELISAs, however, antibody cross-talk is a major problem: cross-reactivity increases with the degree of multiplexing, increasing the number of false positives. Clearly, the ELISA needs improvement, especially in simplicity and multiplexing capability. A step in this direction was our sequential ELISA workflow, which allows the measurement of multiple proteins per plasma sample by reusing the same aliquot consecutively in individual ELISA plates [80]. On-chip immunoassays may also save time and labor with an inexpensive setup [81, 82]. A two-phase system that allows the detection of multiple proteins in a sample without cross-talk has also been proposed. In this technique, the antibodies are introduced into a solution immiscible with the buffer, and antigens introduced into the buffer diffuse into the antibody phases for sandwiching and detection. This technique is still in its early stages but has shown promise for increasing specificity and sensitivity. It can be used for multiplexed ELISAs or for multiplexed amplified luminescent proximity homogeneous assays (AlphaLISA), whose main advantage is that they are no-wash assays.
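Quantifying a protein by sandwich ELISA requires converting each well’s signal into a concentration through a standard curve run on the same plate. A four-parameter logistic (4PL) model is one common choice for such curves; the Python sketch below evaluates a 4PL curve and inverts it to back-calculate a concentration, then corrects for sample dilution. The parameter values and dilution factor are hypothetical and would in practice come from fitting the plate’s own standards, a step not shown here; the text does not state which curve model the sequential ELISA workflow uses.

```python
def four_pl(conc, a, b, c, d):
    """4PL response.

    a: signal at zero concentration, d: signal at saturation,
    c: inflection point (same units as conc), b: slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def back_calculate(signal, a, b, c, d, dilution_factor=1.0):
    """Invert the 4PL curve and correct for dilution.

    Only signals strictly between a and d can be converted to a concentration.
    """
    low, high = min(a, d), max(a, d)
    if not low < signal < high:
        raise ValueError("signal outside the quantifiable range of the standard curve")
    conc_in_well = c * ((a - d) / (signal - d) - 1.0) ** (1.0 / b)
    return conc_in_well * dilution_factor


if __name__ == "__main__":
    params = dict(a=0.05, b=1.2, c=150.0, d=3.0)  # hypothetical fitted values (pg/mL scale)
    od = four_pl(200.0, **params)                 # signal expected for a 200 pg/mL well
    print(round(od, 4), round(back_calculate(od, **params, dilution_factor=2.0), 2))
```

Signals near the plateaus map to very uncertain concentrations, which is why the function refuses values outside the curve’s range instead of extrapolating.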

In summary, the advantages of immunoassays are (i) their suitability for characterizing complex protein mixtures such as human plasma, (ii) their quantitative output, (iii) their high sensitivity for low-abundance proteins such as cytokines, and (iv) their high throughput. The disadvantages are the limited number of available antibody pairs and the cross-reactivity of antibodies with one another and with non-target proteins. Multiplexed immunoassays without cross-talk have recently been developed and, if validated on a large scale, may prove valuable in the years to come.

Mass spectrometry for validation

MS can also be used for biomarker validation. Recently, selected reaction monitoring mass spectrometry (SRM-MS), also called multiple reaction monitoring (MRM), has emerged as a potentially useful technique for clinical diagnostics [83–86]. SRM-MS is used for targeted, multiplexed quantitative proteomics to screen and quantify proteins in patients’ plasma samples with high sensitivity, high specificity, and high throughput. SRM-MS is primarily performed on triple quadrupole mass spectrometers. In SRM, the researcher first chooses the proteins to be monitored and predefines, for each, precursor ions (typically peptides unique to that protein) and their fragment ions; the first quadrupole selects the precursor, the second fragments it, and the third passes only the predefined fragments to the detector. In addition, SRM can be used to construct a calibration curve, typically with spiked-in stable isotope-labeled peptide standards, that provides absolute quantification of the native peptide and, by extension, its parent protein. This rapid technique thus enables targeted monitoring and quantification of candidate molecules in complex samples. While immunoassays are the current standard for biomarker validation, SRM may become a complementary technique within the next decade. SRM is attractive for its high reproducibility and multiplexing capability; however, it needs to become more standardized before use in the clinic.
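The calibration step can be sketched in Python under a common isotope-dilution design, which is an assumption here rather than a protocol from the text: each calibrator contains a known amount of native (light) peptide plus a fixed spike of stable isotope-labeled (heavy) peptide, the light/heavy peak-area ratio is regressed on the known amount by ordinary least squares, and an unknown sample’s ratio is converted back to an amount. All numbers and the names `fit_calibration` and `quantify` are hypothetical; production SRM analyses typically also apply weighting, limits of quantification, and quality-control checks.

```python
def fit_calibration(amounts, light_areas, heavy_areas):
    """Ordinary least squares: light/heavy area ratio = slope * amount + intercept."""
    ratios = [light / heavy for light, heavy in zip(light_areas, heavy_areas)]
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(light_area, heavy_area, slope, intercept):
    """Convert a sample's light/heavy area ratio into an amount of native peptide."""
    return (light_area / heavy_area - intercept) / slope


if __name__ == "__main__":
    amounts = [1.0, 5.0, 10.0, 50.0]                   # hypothetical calibrators (fmol)
    heavy = [1e5] * len(amounts)                       # fixed heavy spike (peak area)
    light = [(0.02 * x + 0.01) * 1e5 for x in amounts] # hypothetical light peak areas
    slope, intercept = fit_calibration(amounts, light, heavy)
    print(round(quantify(5.0e4, 1.0e5, slope, intercept), 2))
```

Using the ratio to a co-eluting heavy standard, rather than the raw light area, cancels much of the run-to-run variation in ionization efficiency.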

In sum, what are the characteristics of an ideal biomarker?

The ideal characteristics are defined by usefulness in the clinic. An ideal biomarker should be easy to test noninvasively (e.g., without biopsies). Its measurement should be reproducible and accurate, using cost-effective, standardized techniques. Furthermore, the ideal biomarker should be diagnostic, with high sensitivity and specificity, distinguishing patients with GVHD from those with other conditions (e.g., GI GVHD vs. infectious colitis, skin GVHD vs. drug rash). If a biomarker correlates with treatment response, it could be used to guide treatment intensity in high-risk patients and immunosuppression withdrawal in low-risk patients. Probably the most useful biomarker will be able to risk-stratify patients before clinical signs of GVHD appear, which will allow its use in preemptive trials. Finally, if a biomarker reflects the pathophysiology of GVHD, it could itself be targeted and represent a novel class of treatment.

Statistical considerations

Sample sizes

The number of specimens to be tested depends on the objective of the study and the extent of biomarker variability. When the objective is to select a subset of biomarkers from a list of candidates, the following factors contribute to variability: the disease subtypes (e.g., skin GVHD, GI GVHD); the capacity of the biomarkers to discriminate among the disease subtypes; the number of biomarkers being studied; the number of case and control subjects; and the statistical algorithm used to select promising biomarkers. Thus, as suggested by Pepe et al. [87], there are no simple methods for recommending sample sizes. In particular, traditional sample size calculations based on statistical hypothesis testing are not relevant. Instead, Pepe et al. propose that investigators use computer simulations to guide their choice of sample sizes: by varying the numbers of cases and controls, one can assess at what sample size a reasonable proportion of promising biomarkers is likely to be selected for further study [87].
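A minimal Python version of this simulation approach might look as follows. It is a sketch under simplifying assumptions that are ours, not Pepe et al.’s: marker values are independent and normally distributed with unit variance, only a few markers are truly shifted in cases, and markers are ranked by a standardized difference in means. The function name `simulate_selection` and every default value are hypothetical; a realistic study would plug in the expected number of candidates, effect sizes, correlation structure, and the actual selection algorithm.

```python
import random
import statistics


def simulate_selection(n_per_group, n_markers=30, n_informative=3, effect=1.0,
                       top_k=5, reps=200, seed=1):
    """Estimate the fraction of truly informative markers that land in the top_k.

    Hypothetical model: all markers ~ N(0, 1) in controls; the first
    n_informative markers are shifted by `effect` in cases.
    """
    rng = random.Random(seed)
    se = (2.0 / n_per_group) ** 0.5  # SE of a difference in means (known unit variance)
    hits = 0
    for _ in range(reps):
        scores = []
        for j in range(n_markers):
            shift = effect if j < n_informative else 0.0
            cases = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
            controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            diff = statistics.fmean(cases) - statistics.fmean(controls)
            scores.append((diff / se, j))
        selected = {j for _, j in sorted(scores, reverse=True)[:top_k]}
        hits += sum(1 for j in range(n_informative) if j in selected)
    return hits / (reps * n_informative)


if __name__ == "__main__":
    # Vary the number of cases (= controls) and watch the selection rate change.
    for n in (5, 10, 20, 40):
        print(n, round(simulate_selection(n), 2))
```

The sample size at which the printed selection rate becomes acceptable is the quantity this kind of simulation is meant to reveal.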

Single versus multiple biomarkers of GVHD

In most cases, no single biomarker is sufficiently sensitive or specific on its own for a diagnostic or predictive test. The simultaneous use of several markers may therefore increase specificity, predictive value, or diagnostic performance. To build a comprehensive GVHD biomarker panel, proportional odds logistic regression models can be used to derive a composite panel. Presumably, for GVHD diagnosis, a combination of tissue-specific and systemic biomarkers will be more informative than individual biomarkers. However, if a biomarker is not highly correlated with other biomarkers or clinical predictors, one or two biomarkers could be sufficient for a diagnostic or predictive test.
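To show how a proportional odds model turns a biomarker panel into graded risk, the Python sketch below computes the probability of each ordered outcome category from a linear combination of marker values. It uses the standard cumulative-logit parameterization, P(Y ≤ k) = logistic(θ_k − η) with η = Σ β_i x_i. The coefficients, cutpoints, outcome categories, and marker values are hypothetical, and fitting them to patient data (for example with a statistics package) is not shown.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(markers, coefs, cutpoints):
    """Return P(Y = k) for ordered categories k = 0 .. len(cutpoints).

    Cumulative model: P(Y <= k) = logistic(cutpoints[k] - eta),
    with eta = sum(coef * marker). A larger eta shifts probability
    toward higher (more severe) categories.
    """
    if any(later <= earlier for earlier, later in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    if len(markers) != len(coefs):
        raise ValueError("one coefficient per marker is required")
    eta = sum(b * x for b, x in zip(coefs, markers))
    cumulative = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]


if __name__ == "__main__":
    # Hypothetical: two log-transformed marker concentrations, four ordered grades.
    coefs = [0.8, 0.5]
    cutpoints = [2.0, 3.0, 4.0]
    for markers in ([1.0, 1.0], [3.0, 4.0]):
        probs = proportional_odds_probs(markers, coefs, cutpoints)
        print(markers, [round(p, 3) for p in probs])
```

The “proportional odds” assumption is visible in the code: the same η shifts every cumulative cutpoint, so each marker has a single coefficient regardless of grade.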

Receiver operating characteristics (ROC) curves

Several statistical methods can be used to evaluate the diagnostic accuracy of a continuous biomarker; the ROC curve is the most widely used [88]. It plots the true-positive rate (sensitivity = 1 − false-negative rate) against the false-positive rate (1 − specificity) for the rules that classify an individual as “positive” if the marker value exceeds a threshold c, across all possible thresholds. Because combinations of multiple markers are often required, it is important to note that the risk score, i.e., the probability of disease given the marker values, is the optimal combination, because its ROC curve is maximized at every cutpoint [89].
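The empirical ROC curve and its area can be computed directly from marker values, as in the Python sketch below. It treats a value at or above threshold c as positive, sweeps c over every observed value, and computes the AUC two ways: by the trapezoidal rule, and as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen control (ties counting one half). For the empirical curve the two agree. The function names and example values are hypothetical.

```python
def roc_points(cases, controls):
    """(FPR, TPR) pairs for the rule 'positive if marker >= c', over all observed c."""
    points = [(0.0, 0.0)]
    for c in sorted(set(cases) | set(controls), reverse=True):
        tpr = sum(x >= c for x in cases) / len(cases)
        fpr = sum(x >= c for x in controls) / len(controls)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_mann_whitney(cases, controls):
    """P(case > control) + 0.5 * P(tie), the scaled Wilcoxon rank-sum statistic."""
    wins = 0.0
    for x in cases:
        for y in controls:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    cases, controls = [3.0, 4.0, 5.0], [1.0, 2.0, 3.5]  # hypothetical marker values
    print(auc_trapezoid(roc_points(cases, controls)), auc_mann_whitney(cases, controls))
```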

Training, validation, independent sets

One statistical approach to biomarker validation is to randomly divide patients into training and validation sets. The statistical model is developed in the training set and subsequently tested in the validation set, which provides a blinded measure of biomarker performance [54]. Because of potential center effects, biomarkers must also be tested in independent sets [64]. Finally, biomarkers must be validated in multicenter prospective studies [90, 91].
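A random training/validation split can be sketched in Python as below. Stratifying by outcome, so that both sets keep roughly the same proportion of aGVHD cases, and the two-thirds training fraction are our assumptions rather than details from the cited study; the patient identifiers are hypothetical. A fixed seed makes the split reproducible, which matters when the validation set must stay untouched until the model is frozen.

```python
import random


def stratified_split(outcomes, train_fraction=2 / 3, seed=2024):
    """Split patient IDs into training and validation sets, stratified by outcome.

    `outcomes` maps a (hypothetical) patient ID to its outcome label.
    """
    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(set(outcomes.values())):
        # Sort first so the result depends only on the seed, not on dict order.
        group = sorted(pid for pid, y in outcomes.items() if y == label)
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


if __name__ == "__main__":
    outcomes = {f"pt{i:02d}": (i < 9) for i in range(30)}  # 9 hypothetical aGVHD cases
    train, validation = stratified_split(outcomes)
    print(len(train), len(validation), sum(outcomes[p] for p in train))
```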

Risk stratification

Predictive models can compare different risk groups using several metrics [92]. The most frequently used are (1) hazard ratios from Cox proportional hazards analysis and their statistical significance; (2) the AUC, the area under the ROC curve, also called the C-statistic, which is a scaled version of the Wilcoxon rank-sum statistic; (3) model calibration, usually assessed with the Hosmer–Lemeshow test, which requires categorization of fitted probabilities (e.g., into deciles); (4) the net reclassification index described by Pencina et al. [93], which requires risk categories that might be arbitrary and not well defined; and (5) the continuous form of the net reclassification index [94], which does not require categories.
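The continuous (category-free) net reclassification index can be computed in a few lines. Among patients who have the event, it takes the proportion whose predicted risk the new model raises minus the proportion it lowers; among those without the event, the proportion lowered minus the proportion raised. The index is the sum of the two, so it ranges from −2 to 2. The Python sketch below assumes predicted risks from an old and a new model are already available for each patient; the function name and example data are hypothetical.

```python
def continuous_nri(old_risk, new_risk, event):
    """Category-free NRI.

    Net proportion of events moved up plus net proportion of non-events
    moved down by the new model. Unchanged risks count in neither direction.
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, had_event in zip(old_risk, new_risk, event):
        if had_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("both events and non-events are required")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


if __name__ == "__main__":
    old = [0.2, 0.5, 0.7, 0.1, 0.3, 0.4, 0.6]   # hypothetical risks, old model
    new = [0.3, 0.4, 0.9, 0.05, 0.35, 0.2, 0.5]  # hypothetical risks, new model
    event = [1, 1, 1, 0, 0, 0, 0]
    print(round(continuous_nri(old, new, event), 4))
```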

aGVHD biomarkers

GVHD is not only a systemic immunological disorder but can also affect specific organ systems, including the skin, GI tract, and liver. Due to the long-recognized “cytokine storm” that occurs early after donor graft infusion, cytokines and their receptors have been tested as potential aGVHD biomarkers [95].

Individual biomarkers identified using aGVHD’s pathology

Table 1 summarizes several studies that identify candidate biomarkers based on aGVHD’s pathology.

Table 1 Candidate aGVHD biomarkers based on aGVHD’s pathology

Soluble IL-2 receptor α chain (sIL-2Rα) concentrations were found to be increased in aGVHD patients in many studies [96–100]. However, some studies found that sIL-2Rα concentrations were also increased in patients with other transplantation-related complications, such as veno-occlusive disease and sepsis [100]. IL-18 concentrations are closely correlated with sIL-2Rα concentrations [97, 101]. Recently, August et al. reported that elevated sIL-2Rα, sTNFR1, and sCD8 had high predictive value for aGVHD occurrence; using these three markers, the authors demonstrated the feasibility of detecting severe aGVHD before the appearance of clinical symptoms [102]. Similarly, TNF-α and its receptors, particularly TNFR1, have been implicated in the pathology of aGVHD, and their concentrations were found to be elevated in aGVHD patients [102–106]. As with sIL-2Rα, elevations of these markers must be interpreted with caution, because they may also reflect other post-transplant complications. The roles of TNF-α/TNFR1 and IL-2/IL-2Rα in aGVHD pathogenesis are supported by evidence that antibodies directed against TNF-α, TNFR1, or IL-2/IL-2Rα are effective therapies for steroid-refractory aGVHD [107]. Acute phase reactants such as C-reactive protein (CRP) and IL-6 were found to be increased in patients with aGVHD [103, 108–110]. IL-8 concentrations clearly correlated with aGVHD in one study by Uguccioni et al. [111]. However, Schots et al. [103] showed that IL-8 was released in all types of post-transplant complications rather than specifically in aGVHD. Similarly, increases in IL-8 and other cytokines (e.g., IL-6 and IL-18) as aGVHD diagnostic markers were not confirmed in a study that included only patients receiving reduced-intensity conditioning [112]. Unexpectedly, anti-inflammatory cytokines such as IL-10, which inhibits the function of Th1 cells while promoting regulatory T cells, were found to be increased in some studies [96].
IL-7 is the key cytokine implicated in the homeostatic proliferation of lymphocytes after the lymphopenia induced by the preparative regimen [113]. Increased IL-7 concentrations at days 7 and 14 post-HSCT have been correlated with the development of aGVHD in recipients of both myeloablative and reduced-intensity conditioning [114–116]. Chemokines and chemokine receptors that mediate lymphocyte trafficking to target organs and lymph nodes were also found to be elevated in patients with aGVHD [117–119].

Hepatocyte growth factor (HGF) is a multifunctional cytokine that is secreted by mesenchymal cells and acts primarily on cells of epithelial origin. Okamoto et al. observed increased serum HGF concentrations in patients who developed aGVHD and found that these concentrations correlated significantly with aGVHD severity [120]. HGF appears to belong to a different category of biomarkers, representing a physiologic response to aGVHD damage. In this respect, HGF resembles cytokeratin-18 fragments (KRT18), markers of epithelial apoptosis that have been associated with intestinal and hepatic aGVHD damage [121]. However, HGF also has anti-apoptotic properties and acts as a mitogen for hepatocytes, enhancing liver repair and regeneration, and HGF administration has been shown to prevent aGVHD in a murine model [122]. HGF may therefore not only indicate the extent of target organ damage from aGVHD but also reflect the physiologic response intended to limit further damage. KRT18 and markers of endothelial dysfunction (e.g., angiopoietin-2, VEGF, and thrombomodulin) were found to be elevated in steroid-refractory aGVHD patients [123]. Rezvani et al. [124] found that a decrease of 0.5 g/dL in serum albumin from the pre-transplantation baseline to the onset of aGVHD treatment predicted the subsequent development of severe aGVHD and survival in a cohort of 401 patients. Recently, fecal calprotectin concentrations measured at the diagnosis of intestinal GVHD have been reported in two independent studies to have excellent prognostic value [49, 50].
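The albumin criterion of Rezvani et al. is a simple change-from-baseline rule. The Python sketch below illustrates it, assuming the 0.5 g/dL cutoff is applied as a decrease of at least 0.5 g/dL between the pre-transplantation baseline and the start of aGVHD treatment. The function and constant names are hypothetical, and the sketch is an illustration of the arithmetic, not a validated clinical tool.

```python
# Hypothetical sketch of a change-from-baseline albumin rule.
# Assumption: a decrease of at least 0.5 g/dL counts as a positive flag.
ALBUMIN_DROP_THRESHOLD_G_DL = 0.5


def albumin_drop(baseline_g_dl: float, at_treatment_g_dl: float) -> float:
    """Decrease in serum albumin (g/dL) from the pre-transplant baseline.

    Rounded to 2 decimals so that floating-point error in the subtraction
    cannot push a borderline value to the wrong side of the threshold.
    """
    return round(baseline_g_dl - at_treatment_g_dl, 2)


def albumin_flag(baseline_g_dl: float, at_treatment_g_dl: float,
                 threshold: float = ALBUMIN_DROP_THRESHOLD_G_DL) -> bool:
    """True if the albumin decrease meets or exceeds the threshold."""
    return albumin_drop(baseline_g_dl, at_treatment_g_dl) >= threshold


print(albumin_flag(3.9, 3.4))  # decrease of 0.5 g/dL -> True
print(albumin_flag(4.0, 3.8))  # decrease of 0.2 g/dL -> False
```

Whether the published rule treats exactly 0.5 g/dL as positive would need to be checked against the original report before using a rule like this.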

Biomarkers identified using proteomics

Table 2 summarizes several studies that identified candidate biomarkers through proteomic discovery.

Table 2 Candidate aGVHD biomarkers identified by proteomics

Four studies have identified the proteomic pattern of aGVHD using MS-based approaches [57, 58, 66, 118]. The peptide set from the study by Kaiser et al. was used to screen 63 samples collected from 33 patients after allo-HSCT [125]. A subsequent blinded evaluation of 599 samples from 141 patients predicted aGVHD before the onset of clinical symptoms with a sensitivity of 83% and a specificity of 75%.
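Sensitivity and specificity describe the classifier itself, but the positive and negative predictive values that matter at the bedside also depend on how common aGVHD is in the population tested. The Python sketch below shows both calculations. The confusion-matrix counts and the 30% prevalence are hypothetical illustrations, not data from the cited study; only 0.83 and 0.75 are the sensitivity and specificity reported above. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing classifier calls with the clinical aGVHD outcome."""
    tp: int  # aGVHD samples called positive
    fn: int  # aGVHD samples called negative (missed)
    tn: int  # non-aGVHD samples called negative
    fp: int  # non-aGVHD samples called positive (false alarms)


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule at a given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule at a given prevalence."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(tp=40, fn=10, tn=90, fp=30)
print(sensitivity(counts), specificity(counts))  # 0.8 0.75

# Reported sensitivity/specificity with an assumed 30% prevalence.
print(round(ppv(0.83, 0.75, 0.30), 3), round(npv(0.83, 0.75, 0.30), 3))
```

With the same test characteristics, a lower prevalence lowers the PPV, which is one reason a biomarker pattern validated in one cohort needs re-evaluation in populations with a different aGVHD incidence.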

Clinical symptoms of aGVHD in the skin (e.g., maculopapular rash) and GI tract (e.g., nausea, diarrhea) can be difficult to distinguish from those of other causes (e.g., infections or drug reactions). Biomarkers specific to a target organ may therefore improve the diagnosis of aGVHD. In one such study, plasma pooled from ten patients with skin-specific aGVHD was compared by proteomics with plasma pooled from ten controls. Elafin emerged as the lead candidate biomarker for detecting skin aGVHD at the time of clinical diagnosis. In samples from 492 patients, plasma elafin concentrations had significant diagnostic and prognostic value for skin aGVHD, including for long-term survival [65]. Using the same proteomic strategy in patients with GI aGVHD, regenerating islet-derived 3-alpha (REG3α) was discovered and validated as a biomarker of lower GI aGVHD [64]. In a follow-up study comparing REG3α with KRT18 and HGF, REG3α showed better diagnostic precision for lower GI aGVHD than either of the other two markers [126].

Prediction of the response to aGVHD therapy is also limited. Recently, Luft et al. showed that KRT18 and markers of endothelial dysfunction are elevated in patients with steroid-refractory aGVHD [123]. In another study, six previously validated diagnostic biomarkers of aGVHD were measured in samples obtained prospectively at the initiation of treatment, day 14, and day 28 in a multicenter, randomized, four-arm phase II clinical trial for newly diagnosed aGVHD. At each of the three time points (aGVHD onset, 2 weeks into treatment, and 4 weeks into treatment), the six-biomarker panel predicted non-response at day 28 after the start of therapy and mortality at day 180 from onset [127]. However, none of these studies were designed to find biomarkers of glucocorticoid resistance. A plasma proteomic approach comparing responders with non-responders was therefore used and identified suppression of tumorigenicity 2 (ST2) as the most significant of 12 markers of non-response to GVHD therapy and of subsequent nonrelapse mortality. Patients with high ST2 at therapy initiation were 2.3 times more likely not to respond to treatment and 3.7 times more likely to die within 6 months after therapy. Patients with low ST2 values experienced less nonrelapse mortality than patients with high ST2, regardless of GVHD grade. In addition, when measured at day 14 post-transplant, before aGVHD diagnosis, plasma ST2 values were associated with nonrelapse mortality at 6 months after transplant. ST2 levels measured at the initiation of GVHD therapy and early in the transplant course therefore improve risk stratification for GVHD and nonrelapse mortality after transplantation, and ST2 is considered one of the most promising of all currently known aGVHD biomarkers [91].
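Statements such as "2.3 times more likely not to respond" compare outcome rates between patients above and below a biomarker cutpoint. The Python sketch below shows the simplest form of that comparison, an unadjusted ratio of proportions. The cutpoint, biomarker values, and outcomes are hypothetical, the function names are illustrative, and the cited ST2 analysis may have used a different cutpoint or statistical model.

```python
from typing import Sequence


def proportion(flags: Sequence[bool]) -> float:
    """Fraction of patients in a group with the outcome."""
    return sum(flags) / len(flags)


def risk_ratio_at_cutpoint(values: Sequence[float],
                           outcomes: Sequence[bool],
                           cutpoint: float) -> float:
    """Unadjusted ratio of outcome proportions, high group vs. low group.

    Patients with a biomarker value >= cutpoint form the high group.
    """
    high = [o for v, o in zip(values, outcomes) if v >= cutpoint]
    low = [o for v, o in zip(values, outcomes) if v < cutpoint]
    if not high or not low:
        raise ValueError("cutpoint leaves one group empty")
    low_rate = proportion(low)
    if low_rate == 0:
        raise ValueError("no outcomes in the low group; ratio undefined")
    return proportion(high) / low_rate


# Hypothetical biomarker values and non-response flags for 8 patients.
values = [10, 20, 30, 40, 50, 60, 70, 80]
non_response = [False, False, True, False, True, True, False, True]
print(risk_ratio_at_cutpoint(values, non_response, cutpoint=45))  # 3.0
```

Here 3 of 4 high-group patients and 1 of 4 low-group patients are non-responders, giving a ratio of 0.75 / 0.25 = 3.0.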

GVHD biomarkers and personalized medicine

Given the progress being made in GVHD biomarker identification and validation, clinical trial designs can be expected to begin incorporating biomarkers. Target-specific diagnostic biomarkers that differentiate skin GVHD from other rashes, and GI GVHD from other forms of enteritis, could allow invasive biopsies to be replaced. As a first step, a simple observational trial should be performed in which samples and biopsies are collected at the onset of GVHD, while physicians treat according to symptoms and perform biopsies as usual. A retrospective analysis of the samples at different biomarker thresholds would then determine whether the biomarkers can replace invasive biopsies. If this study concludes that the biomarker performs as well as biopsy, the next trial would be a randomized interventional trial in which one-half of the patients are treated according to biopsy results and one-half according to biomarker results. The development of GVHD and other outcomes would then be compared between the two arms.
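The retrospective analysis described above amounts to comparing biomarker calls at several candidate thresholds against biopsy as the reference standard. The Python sketch below assumes a single biomarker where higher values indicate disease and picks a threshold with Youden's index (sensitivity + specificity - 1). Youden's index is one common criterion used here for illustration; the text does not specify how thresholds would be chosen. The data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Result = Tuple[float, float, float]  # (threshold, sensitivity, specificity)


def sweep_thresholds(values: Sequence[float],
                     biopsy_positive: Sequence[bool],
                     thresholds: Sequence[float]) -> List[Result]:
    """Compare biomarker calls (value >= threshold) with biopsy results."""
    n_pos = sum(biopsy_positive)
    n_neg = len(biopsy_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both biopsy-positive and biopsy-negative cases")
    results = []
    for t in thresholds:
        tp = sum(1 for v, b in zip(values, biopsy_positive) if b and v >= t)
        tn = sum(1 for v, b in zip(values, biopsy_positive) if not b and v < t)
        results.append((t, tp / n_pos, tn / n_neg))
    return results


def best_by_youden(results: Sequence[Result]) -> Result:
    """Threshold maximizing sensitivity + specificity - 1 (first one on ties)."""
    return max(results, key=lambda r: r[1] + r[2] - 1)


# Hypothetical biomarker values with biopsy-confirmed GVHD status.
values = [1, 2, 3, 4, 5, 6, 7, 8]
biopsy = [False, False, False, True, False, True, True, True]
table = sweep_thresholds(values, biopsy, thresholds=[2, 4, 7])
print(table)                  # [(2, 1.0, 0.25), (4, 1.0, 0.75), (7, 0.5, 1.0)]
print(best_by_youden(table))  # (4, 1.0, 0.75)
```

A threshold optimized on one dataset tends to perform better there than elsewhere, so the chosen cutpoint would still need confirmation in an independent cohort before an interventional trial.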

The most important role of biomarkers is risk stratification for risk-adapted clinical trials. At present, in the absence of further risk stratification, the standard of care for all patients with GVHD is prompt initiation of systemic steroid treatment, with second-line agents reserved for patients who fail initial therapy. Unfortunately, most patients who require second-line therapy die, highlighting the need to refine risk assessment beyond what the current grading system provides. Current biomarkers have sufficient sensitivity and specificity to allow risk stratification of patients with newly diagnosed GVHD. Early identification of patients at high risk of steroid unresponsiveness may in turn permit alternative testing or additional therapies before refractory disease develops. Equally important is the identification of low-risk patients who will respond well to treatment. These patients may tolerate more rapid tapering of steroids, which could prevent long-term toxicity, infection, and loss of the GVL effect. A schema for treating newly diagnosed GVHD using biomarkers is shown in Fig. 4.

Fig. 3

Risk stratification management of a patient after allo-HSCT. Risk analysis will begin shortly after HSCT. Biomarkers will help predict the risk of GVHD and guide the therapy of patients at increased risk of severe GVHD. The biomarkers will indicate the need for treatment intensification and the response to treatment. Patients transplanted for high-risk leukemia or with significant minimal residual disease (MRD) will be considered for additional consolidative chemotherapy. Scheduled MRD analysis will assess the relapse risk and could lead to the use of more specific immunotherapies, such as donor lymphocyte infusions (DLI) or chimeric antigen receptor (CAR) T cells with a suicide gene. The end goal is to create a more personalized, risk-adapted approach.

The ability to identify patients at high risk for GVHD early in their transplant and treatment course also has important therapeutic consequences, including the possibility of preemptive interventions. Successful preemption must reduce not only the incidence of GVHD but also infectious complications and relapse. Ultimately, a randomized trial will be needed to assess the effectiveness of GVHD preemption. An example of a proposed preemptive clinical trial based on biomarker risk stratification is shown in Fig. 5.

Fig. 4

Proposed clinical study for newly diagnosed GVHD. Biomarker cutpoints will determine if a patient is at high or low risk of treatment unresponsiveness at the diagnosis of GVHD. Low risk patients will receive the standard GVHD treatment; high risk patients will be randomized to receive either the standard GVHD treatment or an intensified GVHD treatment. Comparison of outcomes from the randomized high risk groups will show if intensified treatment at onset of GVHD improves response rates and lowers mortality in high risk patients identified by biomarkers
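The allocation logic of this proposed study, in which low-risk patients receive standard treatment and high-risk patients are randomized between standard and intensified treatment, can be sketched in Python as follows. The biomarker cutpoint, the 1:1 ratio, and the use of permuted blocks (which keep the two high-risk arms balanced as enrollment proceeds) are assumptions for illustration; the function name and patient identifiers are hypothetical, and a real trial would define its randomization procedure in the protocol.

```python
import random
from typing import Dict, List, Optional


def allocate(biomarker_by_patient: Dict[str, float],
             cutpoint: float,
             block_size: int = 4,
             seed: Optional[int] = None) -> Dict[str, str]:
    """Assign each patient, in enrollment order, to a treatment arm.

    Low risk (value < cutpoint): standard treatment.
    High risk (value >= cutpoint): 1:1 permuted-block randomization
    between standard and intensified treatment.
    """
    if block_size < 2 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    block: List[str] = []
    arms: Dict[str, str] = {}
    for patient_id, value in biomarker_by_patient.items():
        if value < cutpoint:
            arms[patient_id] = "standard"
            continue
        if not block:  # start a new block with equal numbers of each arm
            block = ["standard", "intensified"] * (block_size // 2)
            rng.shuffle(block)
        arms[patient_id] = block.pop()
    return arms


# Hypothetical enrollment: 2 low-risk and 8 high-risk patients.
enrolled = {"p01": 0.2, "p02": 0.3}
enrolled.update({f"h{i}": 1.5 for i in range(8)})
arms = allocate(enrolled, cutpoint=1.0, seed=7)
high_arms = [arms[f"h{i}"] for i in range(8)]
print(arms["p01"], high_arms.count("intensified"))  # standard 4
```

The same skeleton would fit a preemptive design by treating "no intervention" as the comparator arm for high-risk patients and giving low-risk patients no intervention.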

Challenges and pitfalls of biomarkers development

The daunting process of biomarker development and the huge logistical challenges of integrating biomarkers into clinical trials have so far limited the wide use of GVHD biomarkers. In addition, the type of conditioning [myeloablative (including total body irradiation-based) or reduced intensity] appears to affect biomarker levels [91, 128]. Another limitation is that GVHD biomarker studies have mostly been performed in T cell-replete HSCT, and none of these biomarkers has been studied in large cohorts of umbilical cord blood transplant recipients. Given the increasing number of double cord blood transplants and the high rate of grade II–IV GVHD observed in these cohorts [129–132], studying biomarkers in these patients has become even more crucial. Differences in GVHD prophylaxis may also contribute, because some regimens, such as the combination of sirolimus, tacrolimus, and low-dose methotrexate, are known to result in less GVHD [133]. All of these parameters will need to be evaluated in large-scale multicenter studies.

Another challenge is to determine which time point, or which combination of time points, provides the most useful measurements. This is best achieved by characterizing the kinetics of biomarker levels, particularly during the first month post-transplant, as in some published studies [91, 99, 102, 134].
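One simple way to turn serial measurements into candidate predictors is to summarize each patient's trajectory, for example by the change from the first sample and the least-squares slope over the first month. The Python sketch below assumes measurements sorted by day post-transplant. These summary features and the function names are illustrative choices, not the method of the cited kinetic studies, and the values are hypothetical.

```python
from typing import Dict, Sequence


def ols_slope(days: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of biomarker level per day."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired measurements")
    mean_d = sum(days) / n
    mean_v = sum(values) / n
    sxx = sum((d - mean_d) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("all measurements taken on the same day")
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(days, values))
    return sxy / sxx


def kinetic_features(days: Sequence[float],
                     values: Sequence[float]) -> Dict[str, float]:
    """Summarize one patient's serial measurements (sorted by day)."""
    return {
        "first_value": values[0],
        "change_from_first": values[-1] - values[0],
        "slope_per_day": ols_slope(days, values),
    }


# Hypothetical weekly measurements during the first month post-transplant.
print(kinetic_features([0, 7, 14, 21, 28], [10.0, 17.0, 24.0, 31.0, 38.0]))
# {'first_value': 10.0, 'change_from_first': 28.0, 'slope_per_day': 1.0}
```

Features like these could then be compared, alone or in combination with single time-point values, for their ability to predict GVHD outcomes.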

Ideally, the validated biomarkers should be tested in a prospective multicenter clinical study, as they have not yet been evaluated in such a broad population. This independent validation is important because a risk algorithm should account for both variability between centers (the center effect) and individual risk. Subsequent trials, ideally conducted through an organization such as the Blood and Marrow Transplant Clinical Trials Network (BMT CTN), should establish a unique resource for bone marrow transplantation (BMT) clinicians and provide a further national resource for BMT investigators. However, this endeavor will be expensive and may be subject to variation in the collection and interpretation of clinical data.

Future directions

As mentioned above, the first future direction is a blinded evaluation of these biomarkers in samples collected in a multicenter prospective study. A multicenter cohort reduces center effects and facilitates the successful design of subsequent trials. Because some biomarkers may also represent promising therapeutic targets, another direction is the development of new therapeutics. Such drugs would target the relevant effector T cells, thus increasing efficacy and decreasing toxicity. This approach represents the first step in a continuum of research that is expected to lead to pharmacologic strategies for specifically treating aGVHD.

So far, biomarker development after HSCT has focused on aGVHD. The clinical diagnosis of cGVHD and the current consensus criteria are labor-intensive and have yet to be widely validated. Future biomarker discovery and validation efforts for cGVHD could therefore be particularly valuable.

Recurrent malignancy remains a major cause of mortality after HSCT. Risk stratification for relapse is therefore necessary and should be conducted in parallel with risk stratification for GVHD, which is tightly linked to the GVL effect. Future approaches to minimizing the risk of relapse will consider factors related to both the patient and the underlying malignancy. Current standards could be improved by early identification of very high-risk patients and close monitoring for disease relapse, particularly through assessment of minimal residual disease. Possible interventions for implementing personalized medicine include (i) combining targeted leukemia therapies with stem cell transplantation, (ii) adapted GVHD prophylaxis that maintains GVL, (iii) improved disease-specific conditioning, (iv) effective post-transplant maintenance therapy with new drugs, and (v) promising new cellular therapies such as specific antitumor T cells or chimeric antigen receptor (CAR) T cells. An example of personalized medicine based on risk stratification of GVHD and relapse after transplantation is shown in Fig. 3.

Fig. 5

Proposed GVHD preemptive clinical study. Biomarker cutpoints will stratify patients as being at low or high risk of developing GVHD before clinical signs occur. Low-risk patients will receive no intervention; high-risk patients will be randomized to receive either a standard GVHD intervention or none. Comparison of outcomes between the randomized high-risk groups will show whether the preemptive intervention lowers aGVHD incidence in high-risk patients identified by biomarkers. The expectation is that treating subclinical aGVHD would defuse a full-blown graft-versus-host reaction.

Conclusions

Proteomics is a revolutionary field that can detect the proteins most proximal to the real-time pathophysiology of aGVHD. In a short time, proteomics has led to the identification of novel aGVHD biomarkers that would have been unlikely to emerge from traditional hypothesis-driven research. A promising proteomic approach is to use protein biomarkers for risk stratification in order to make better use of current treatment modalities. Furthermore, the biomarker findings presented above offer the potential for exploring targeted therapeutics. Unlike genes, protein levels may be influenced by post-transcriptional modifications and other factors, such as the cytokine milieu. The principal barrier to be overcome is the validation of biomarker concentrations in different allo-HSCT settings [e.g., conditioning intensity and donor source (particularly cord blood and T cell-depleted grafts)]. Achieving this aim will require a much larger validation study, ideally a multicenter prospective trial. Once an algorithm has been established for each setting, personalized medicine will be possible.