Background

Within fundamental and clinical research and in clinical practice, biomarkers play an important role to facilitate disease diagnosis, prognosis, and selection of targeted therapies in patients. As such, biomarkers are critical for personalized medicine to improve disease stratification: the identification of groups of patients with shared (biological) characteristics, such as a favorable response to a particular drug [1, 2]. Biomarkers need to fulfill a number of requirements, the most important of which is to show high predictive value. From a practical perspective, the detection method for a biomarker must be accurate, relatively easy to carry out, and show high reproducibility [3]. Over the last decade, there has been an increasing interest in biomarkers at the hand of rapid developments within high-throughput molecular biology technologies, capable of identifying “molecular biomarkers” [4, 5]. Molecular biomarkers possess a critical advantage over more traditional biomarkers during the exploratory phase of biomarker discovery, as many candidate molecular biomarkers can be assayed in parallel. This particularly involves screening of (epi)genomic features at a genome-wide scale, often making use of powerful next-generation sequencing (NGS)-based technologies. These screens can assess very large numbers of loci for the presence or absence of a certain (epi)genomic feature. Subsequently, these loci can be evaluated as a potential biomarker by determining their correlation between samples with different characteristics, for example, by comparing healthy versus diseased tissue.

To be suitable for biomarker discovery, (epi)genomic profiling assays need to fulfill a number of important requirements. To accommodate sample collection for batch processing, clinical samples are often preserved by freezing or by formaldehyde crosslinking. Therefore, an important requirement for (epi)genomic biomarker screening technologies is that these are compatible with processed samples. Additionally, this allows inclusion of clinical samples that have been processed for biobanking, or to use such samples for replication or validation. Biobanks collect large numbers of samples such as tissues or DNA (deoxyribonucleic acid) and the associated patient information, which is highly valuable for retrospective biomarker studies [69]. Exploratory screens for candidate biomarkers mainly rely on the use of patient specimens, which are obtained in small quantities, while also biobanks often contain limited quantities of patient material. Therefore, a second requirement is that assays used for biomarker discovery are compatible with miniaturization to allow processing of low-input samples. Furthermore, robust biomarker discovery is dependent on the screening of large numbers of samples due to the inherent clinical and biological variability between patient samples [10]. Assays used for biomarker discovery therefore benefit from automation and digitalization, facilitating upscaling while reducing the chance of errors due to human handling.

Genomic features that are utilized for molecular biomarker discovery can be separated in two categories: (i) changes in the DNA sequence itself, such as mutations and rearrangements, and (ii) changes in the epigenome, represented by molecules and structures associated with the DNA such as DNA methylation and post-translational histone modifications. This review will focus on the latter category, as recent developments in epigenetic profiling technologies have not only greatly increased our knowledge on epigenetic regulation, but also allow for large-scale discovery of molecular epigenetic biomarkers. The first section of this review provides an overview of epigenetic features and how these can be assayed. We discuss how misregulation of epigenetic processes may lead to disease, providing mechanistic rationale for the use of epigenetic features as biomarkers. The feasibility of applying epigenetic biomarkers in the clinic is demonstrated by examples of DNA methylation biomarkers that have reached clinical stages. In the second part of this review, we will focus on current genome-wide epigenomic profiling technologies, and whether these are already or will likely become compatible with biomarker discovery in the near future. We will evaluate these approaches with three criteria in mind: (i) the possibility to use frozen or chemically fixed material in these assays, (ii) compatibility with miniaturization and single-cell profiling, and (iii) the current level of automation.

Main text

The epigenome

Within a eukaryotic cell, the DNA is packaged to fit into the small volume of the nucleus in a highly organized fashion. The basic unit of chromatin involves the DNA wrapped around nucleosomes consisting of two copies of each of the core histones H2A, H2B, H3, and H4: the so-called beads-on-string structure [11]. Subsequent compaction leads to higher order structures including the formation of very dense arrays of nucleosomes observed in heterochromatin [12, 13]. Despite being tightly packed, the chromatin appears to be highly plastic to allow processes such as transcription, DNA damage repair, DNA remodeling, and DNA replication. This plasticity is facilitated by several factors that influence both local and global chromatin architectures. The most prominent features affecting chromatin structure are reversible covalent modifications of the DNA, e.g., cytosine methylation and hydroxymethylation mainly occurring within the genomic CG context (CpGs), and reversible post-translational modifications of histones, e.g., lysine acetylation, lysine and arginine methylation, serine and threonine phosphorylation, and lysine ubiquitination and sumoylation. These modifications are set by specific classes of enzymes: DNA methyltransferases (DNMTs) in case of cytosine methylation [14] or histone-modifying enzymes [15]. Besides facilitating chromatin compaction, modifications of the DNA and histones are read by adaptor molecules, chromatin-modifying enzymes, and transcription factors (TFs) that contribute to the regulation of transcription and other chromatin-related processes [15, 16]. Next to modifications of DNA and histones, the three-dimensional (3D) conformation of the DNA within the nucleus imposes an additional regulatory layer of gene expression [17].

The chromatin state of a cell, including the genomic localization of modifications of DNA and histones, TF binding sites, and 3D DNA structure, is generally referred to as the epigenome. The epigenome is an important layer that regulates which parts of the genome are accessible and thereby active and which parts are condensed and hence inactive. As such, epigenetic changes are a major driver of development and important to gain and maintain cellular identity. Each of the approximately 200 distinct cell types in the human body has essentially the same genome but has a unique epigenome that serves to instruct specific gene expression programs present within the cells. To gain insight in this variation, the epigenetic features of these cell types (Fig. 1) are comprehensively studied at a genome-wide scale using high-resolution technologies as summarized in Table 1. Most of the approaches are based on NGS, which generally yield higher sensitivity and resolution as compared to alternative readouts such as microarrays, and provide additional information such as allele specificity [18, 19]. The International Human Epigenome Consortium (IHEC; http://www.ihec-epigenomes.org) and associated consortia such as BLUEPRINT and National Institutes of Health (NIH) Roadmap use these technologies to generate human reference data sets for a range of epigenetic features [2023]. The aim of IHEC is to generate approximately 1000 reference epigenomes which are made publicly available. This data contains a wealth of information on the epigenetic mechanisms acting in healthy cells and serves as a valuable reference for comparisons with malignant cells and tissues [24, 25].

Fig. 1
figure 1

Main epigenetic features (indicated by orange arrows) that can be assayed on a genome-wide scale using sequencing-based technologies

Table 1 Summary of the main epigenetic features and the principles, caveats, and requirements of the main technologies used for their profiling

Comparative analyses of epigenomes are complicated by the epigenetic variability that is present between individuals within a population. Genetic variation such as SNPs (single-nucleotide polymorphisms) or indels in regulatory sequences or mutations in epigenetic enzymes will have a direct effect on the epigenome [2629]. Furthermore, environmental factors such as lifestyle, stress, and nutrition influence epigenetic patterns [3033]. Also, epigenetic patterns change during aging. In fact, DNA methylation markers in saliva and blood can be used for accurate estimation of age [3437]. Thus, epigenetic patterns are plastic and change during development and over time. The variability between individuals has to be accounted for in epigenetic studies including biomarker discovery and hence large cohorts need to be studied to overcome the intra-individual variation. In this respect, it is important to note that the extent of the intra-individual variation is much less as compared to the variation observed between tissues within individuals, at least for DNA methylation [3840].

It has become increasingly clear that misregulation or mutations of epigenetic enzymes are at the basis of a broad range of syndromes and diseases [41]. Mutations in epigenetic enzymes are frequently observed in cancer [42], intellectual disability [43], neurological disorders such as Alzheimer’s, Parkinson’s, and Huntington’s disease [44], and autoimmune diseases such as rheumatoid arthritis [4547] and type 1 diabetes [48]. Most studies have been performed in cancer: ~30% of all driver genes characterized in cancer are related to chromatin structure and function [42]. Well-known examples of genes in which mutations can promote or drive tumorigenesis include DNMT3A and TET2, involved in DNA methylation and DNA demethylation, respectively, and EZH2, which is part of the polycomb repressive complex 2 (PRC2) complex that trimethylates lysine 27 on histone 3 (H3K27me3) [4951]. Apart from mutations in epigenetic enzymes, mistargeting of epigenetic enzymes, such as the silencing of CDKN2A and MLH1 by aberrant promoter DNA methylation, is considered to drive tumor formation [52]. Given their prominent roles in cancer and various other diseases, epigenetics enzymes represent promising targets for therapeutic intervention. For example, small molecules targeting enzymes involved in the post-translational modifications of histones, such as SAHA (suberanilohydroxamic acid; Vorinostat) inhibiting histone deacetylases (HDACs), are effective as therapeutic drugs for a range of tumor types including T cell lymphomas in case of SAHA [5355]. See Rodriguez and Miller [56], Qureshi and Mehler [57], and various papers within this special issue for excellent recent reviews on the use of small molecules to target epigenetic enzymes and their current status in clinical applications.

Epigenetic biomarkers

Molecular diagnosis and prognosis is traditionally often based on (immuno)histochemistry or immunoassays, for example by assaying prostate-specific antigen (PSA) in case of testing for prostate cancer [58]. Also , changes in RNA (ribonucleic acid) expression, genetic alterations, and chromosomal abnormalities represent powerful biomarkers in various diseases including cancer [59]. Notable examples are mutations in the BRCA1 and BRCA2 genes in breast and ovarian cancer or the presence of the Philadelphia chromosome in leukemia [6062]. With the growing understanding that changes in the epigenome and chromatin are related with or causative in disease [41], it became clear that epigenetic alterations represent promising features to be used as biomarkers. An important characteristic for their use as biomarker is that epigenetic marks, in particular DNA methylation, are known to survive sample storage conditions reasonably well [63, 64]. Another convenient characteristic is that almost every biological tissue sample or body fluid such as blood or saliva can be used for analysis of DNA methylation and other epigenetic marks [22, 65, 66]. This robustness makes the application of epigenetic biomarkers in a clinical environment attractive.

Over the recent years, it has become clear that epigenetic features contain a high predictive value during various stages of disease. These analyses thus far mainly focused on DNA methylation. DNA methylation has been shown to be informative for disease diagnosis, prognosis, and stratification. Some of the DNA methylation-based epigenetic biomarkers, such as the methylation status of VIM and SEPT9 for colorectal cancer, SHOX2 for lung cancer, and GSTP1 for prostate cancer, are in clinical use and diagnostic kits are commercially available [6771]. In case of one of the best characterized biomarkers, GSTP1, a meta study (mainly using prostatectomy tissue or prostate sextant biopsies) showed that hypermethylation of the promoter allows to diagnose prostate cancer with a sensitivity of 82% and a specificity of 95% [72]. Importantly, the use of multiple DNA methylation biomarkers (combining hypermethylation of GSTP1, APC, RASSF1, PTGS2, and MDR1) resulted in a sensitivity and specificity of up to 100% [73]. See Heyn and Esteller [74] for a recent comprehensive overview of DNA methylation biomarkers and its potential use in the clinic. In addition to its diagnostic potential, it has been well established that DNA methylation is informative for patient prognosis in terms of tumor recurrence and overall survival. For example, the hypermethylation of four genes, CDKN2A, cadherin 13 (CDH13), RASSF1, and APC, can be used to predict tumor progression of stage 1 non-small cell lung cancer (NSCLC) [75]. In addition to disease prognosis, DNA methylation has been shown to be valuable for patient stratification to predict response to chemotherapeutic treatment. A well-known example is hypermethylation of MGMT in glioblastoma, which render the tumors sensitive to alkylating agents [76, 77] such as carmustine and temozolomide.

Together, these examples show the power and feasibility of using epigenetic features, and in particular DNA methylation, as biomarkers. Epigenetic biomarkers are complementary to genetic biomarkers. Whereas genetic mutations can (among others) disrupt protein function due to amino acid changes, epigenetic alterations can de-regulate mechanisms such as transcriptional control, leading to the inappropriate silencing or activation of genes. Notably, epigenetic changes occur early and at high frequencies in a wide range of diseases including cancer [78]. It has been suggested that epigenetic alterations occur at higher percentages of tumors than genetic variations, resulting in a higher sensitivity in the detection of tumors [79].

Genome-wide epigenetic profiling for DNA methylation biomarkers

Thus far, the discovery of the epigenetic biomarkers mostly relied on targeted approaches using individual gene loci known or suspected to be involved in the etiology or progression of the disease or other phenotype under study. Despite the challenges in the identification of biomarkers using such approaches, this yielded a number of important epigenetic biomarkers. However, these approaches require a priori knowledge for the selection of candidate biomarkers.

In order to perform unbiased screens in the exploratory phase of biomarker discovery, genome-wide profiling technologies have spurred molecular biomarker discovery (detailed information on epigenomic profiling assays is presented in Table 1). Using these technologies, the entire (epi)genome can be interrogated for potential biomarkers by comparing healthy versus diseased cells/tissue, malignant versus non-malignant tumors, or drug-sensitive versus drug-resistant tumors. This enables selection of candidate biomarkers that are most informative for disease detection, prognosis, or stratification. The use of genome-wide screens furthermore enables to detect and evaluate combinations of (many) candidate loci, which often results in increased sensitivity and specificity of the biomarker. Importantly, the identification of individual genomic loci or genes as biomarkers from large datasets requires robust statistical testing such as multiple-testing correction (although traditional tests like the Bonferroni correction are over-conservative since there is often correlation between loci, i.e., they are not independent) or stringent false discovery rate (FDR) control (for example, by the Benjamini–Hochberg procedure) [8082]. To define sets of biomarkers from large dataset, alternative statistical methods (such as sparse principle component analysis (PCA) or sparse canonical correlation analysis (CCA) [83, 84]) are available as well. In light of (i) challenges with the experimental setup when using patient material, (ii) costs, and (iii) the extensive computational analysis associated with the exploratory phase of biomarker discovery, genome-wide screens are often performed on relatively small cohorts. Independent of the (statistical) methods used, it is essential to validate (sets of) candidate biomarkers in follow-up studies on large cohorts using targeted epigenetic approaches before potential application in the clinic [85].

Recent years have seen an increasing number of studies using genome-wide epigenetic profiling to predict disease outcome. For a range of tumors, including childhood acute lymphoblastic leukemia [86], kidney cancer [87], NSCLC [88], rectal cancer [89], cervical cancer [90, 91], breast cancer [92, 93], and glioblastoma [94], DNA methylome analysis has been shown to be of prognostic value. Most of these studies define changes in DNA methylation at single sites or at small subsets of sites that represent potential disease signatures. Although these studies are often restricted to a subset of CpGs within the genome and mostly rely on relatively small sample sizes, they show the power of performing genome-wide biomarker screens.

Currently, the most popular platform used in the exploratory phase of DNA methylation biomarker discovery represents the Infinium HumanMethylation450 BeadChip array (further referred to as “450K array”; see a short explanation of the 450K array within Table 1). The probes on the 450K array mainly represent functional CpG islands and functional elements such as promoters, enhancers, and TF binding sites. Main advantages of the 450K array for the detection of DNA methylation as compared to other DNA methylation platforms include (i) its high reproducibility, (ii) the straightforward analysis methods, (iii) the large number of samples that have been profiled using the 450K array thus far (which can be used for comparative purposes), and (iv) the relatively low costs. A disadvantage, like with all bisulfite-based methods (unless combined with additional chemical procedures), is that the 450K array is unable to distinguish between DNA methylation and DNA hydroxymethylation. Hydroxymethylated cytosines represent an intermediate step during demethylation of methylated cytosines but is relatively stable and is therefore likely to have specific biological functions as well [95]. It should be noted that levels of DNA hydroxymethylation are generally much lower as compared to levels of DNA methylation (for example, DNA hydroxymethylation levels are >95% lower in case of peripheral blood mononuclear cell (PBMC) [96]). A further disadvantage of the 450K array is that  genetic differences between samples might result in false positives, in particular since a subset of probes on the 450K array target polymorphic CpGs that overlap SNPs [97, 98]. For association studies using large cohorts, computational methods (based on principle components) have been developed to account for population stratification resulting from differences in allele frequencies [98100].

To enable robust screening for a (set of) potential biomarker(s), most current studies apply the 450K array on up to several hundred samples. To narrow down and validate candidate biomarkers, more targeted DNA methylation assays are used on the same or a very similar-sized cohort [101]. Subsequently, the remaining candidate biomarkers are further validated on larger cohorts using targeted DNA methylation assays that are compatible with routine clinical use, for example, by amplicon bisulfite sequencing [85]. Using this powerful workflow, tumors for which prognostic biomarkers have been identified include rectal cancer [102], breast cancer [103], hepatocellular carcinoma [104], and chronic lymphocytic leukemia (CLL) [105, 106]. Interestingly, using a similar workflow, sets of DNA methylation biomarkers have recently been identified that are prognostic for the aggressiveness of tumors in prostate cancer [107, 108]. Such studies are very important for improving treatment of prostate cancer by avoiding (radical) prostatectomy in cases where careful monitoring of the tumor over time is preferred.

Biomarkers other than DNA methylation

The majority of epigenetic biomarkers identified thus far involve changes in DNA methylation. However, in light of the various types of epigenetic misregulation associated with diseases, changes in epigenetic features other than DNA methylation are likely to become powerful molecular biomarkers as well. ChIP-Seq profiling has revealed prominent differences in binding sites of post-translational histone modifications and other proteins between healthy and cancer tissue, both in leukemia as well as in solid tumors. For example, localized changes in H3 acetylation have been reported in leukemia (see, for example, Martens et al. [109] and Saeed et al. [110]). For solid tumors, differential estrogen receptor (ER) binding and H3K27me3 as determined by ChIP-Seq has been shown to be associated with clinical outcome in breast cancer [111, 112]. Also, androgen receptor (AR) profiling predicts prostate cancer outcome [113]. A recent study identified tumor-specific enhancer profiles in colorectal, breast, and bladder carcinomas using H3K4me2 ChIP-Seq [114]. Next to ChIP-Seq, DNAseI hypersensitivity assays have identified tumor-specific open chromatin sites for several types of cancer (see, for example, Jin et al. [115]). In terms of chromatin conformation, it has recently been shown that disruption of the 3D conformation of the genome can result in inappropriate enhancer activity causing mis-expression of genes including proto-oncogenes [116, 117]. These examples show that, besides DNA methylation, changes in (i) protein binding sites (including post-translational histone modifications), (ii) accessible (open) chromatin, and (iii) the 3D conformation of the genome represent epigenetic features that are potential effective biomarkers (Fig. 1). The near absence of biomarkers based on these epigenetic features is mainly due to practical reasons. ChIP-Seq as well as other comprehensive epigenetic profiling technologies traditionally require (much) more input material, up to 1 × 106 cells or more, to obtain robust results as compared to DNA methylation profiling (Table 1). This is particularly challenging for (banked) patient samples, which are often available in small quantities that might not be compatible with epigenetic profiling other than DNA methylation profiling. Also, profiling of such epigenetic features often require elaborate and delicate workflows (Table 1). Hence, quantitation and reproducibility of ChIP-Seq and other epigenetic profiling assays besides DNA methylation profiling are challenging. Furthermore, DNA methylation profiling is better compatible with (archived) frozen or fixed samples.

However, the last 2 years have seen a spectacular progress in miniaturization of epigenetic profiling assays. In various instances, this included automation of (part of) the workflow, improving the robustness of the assays and its output. Also, improved workflows for epigenetic profiling of frozen or fixed samples have been reported. Although this involved proof-of-concept studies in basic research settings, these efforts are likely to have significant impetus on genome-wide epigenetic screens for candidate biomarkers. The remainder of this review will provide an overview of the current status of genome-wide epigenetic profiling and the technological advances that facilitate miniaturization, automation, and compatibility with preserved samples.

New developments in epigenetic profiling: compatibility with preservation methods

Most epigenetic profiling assays have been developed using fresh material in order to preserve the native chromatin architecture. However, epigenetic biomarker screens require the use of patient-derived clinical samples that are generally processed to preserve the samples as well as to allow convenient sample handling, for example, for sectioning of biopsies. Also, samples present in biobanks are fixed to allow long-time storage. In particular for retrospective studies, epigenetic profiling technologies that are applied for biomarker screens should therefore be compatible with methodologies that are routinely used for sample preservation: freezing and chemical fixation (in particular FFPE fixation) [118].

Freezing

Freezing of tissue specimens is typically performed by snap-freezing with subsequent storage at −80 °C or in liquid nitrogen [119]. Freezing seems to maintain nuclear integrity and chromatin structure very well (Fig. 2). WGBS [120], ChIP-Seq [121123], ATAC-Seq [124, 125], and DNAseI-Seq [126, 127] all have been shown to be compatible with frozen cells or tissues. 

Chemical Fixation (FFPE)

Chemical fixation generally includes overnight crosslinking with formaldehyde at high concentrations (up to 10%), followed by dehydration and paraffin embedding (so-called “FFPE”: formalin-fixed, paraffin-embedded) [128]. Although procedures for FFPE fixation are time-consuming, FFPE fixation has the advantage that samples can be stored at room temperature and that samples can be evaluated by morphology or immunohistochemistry (prior to possible further processing such as epigenetic profiling).

Fig. 2
figure 2

Compatibility of commonly used sample preservation methods with current epigenome profiling assays. A dotted line indicates that these assays would benefit from further optimization

FFPE conditions do not affect DNA methylation, and also formaldehyde and paraffin do not interfere with the WGBS profiling procedure [129]. However, epigenetic assays other than bisulfite-based DNA methylation profiling are cumbersome with FFPE samples (Fig. 2). In case of ChIP-Seq, crosslinking generally occurs in much milder conditions (1% formaldehyde for 10 min) as compared to the harsh conditions used for FFPE fixation [120], which can complicate shearing and epitope accessibility. Pathology tissue (PAT)-ChIP has been reported to prepare FFPE samples for ChIP-Seq by the use of deparaffinization, rehydration, and MNase treatment followed by sonication at high power [130, 131]. However, PAT-ChIP comes with various limitations including the long running time of the protocol (up to 4 days) and the fact that it is not compatible with all ChIP-grade antibodies. Interestingly, some of these issues have been resolved in the very recently developed fixed-tissue (FiT)-Seq procedure, which might open up new avenues for ChIP-Seq profiling of FFPE samples [114]. DNaseI-Seq on FFPE samples has been reported at the expense of a drop in signal-to-noise ratios of around 50% as compared to the use of fresh material [115].

Despite new developments for ChIP-Seq and DNaseI-Seq, this overview shows that DNA methylation is still the most robust of all epigenetic marks for profiling of samples that are processed by freezing or chemical fixation. Although most other epigenetic profiling assays are compatible with frozen samples (at the expense of signal-to-noise ratios for some of the assays), they are generally not or poorly compatible with FFPE specimens (Fig. 2). This also implies that for these assays, it is much more challenging to make use of laser microdissection to select specific regions of interest from specimens for epigenetic analysis, for example, to separate tumor cells from stromal cells [132, 133]. An additional advantage of using DNA methylation for biomarker screening is that, in contrast to the other epigenetic profiling assays discussed, the profiling can be performed on isolated genomic DNA. This enables the use of genomic DNA from clinical DNA banks to be included in DNA methylation biomarker screens.

It should be noted that in contrast to retrospective studies, it might be feasible to use fresh or fresh-frozen patient material for screening in prospective biomarker studies. However, the use of fresh(-frozen) material in these studies could interfere with further development of potential biomarkers if it turns out that these biomarkers are incompatible with (FFPE-)fixed patient material present in the clinic. In all cases, when collecting patient samples for profiling of epigenetic marks, it is important to keep the time between surgical removal and fixation or freezing as short as possible to avoid epitope destruction and/or breakdown of the chromatin. It would therefore be helpful if the procedure time up to fixation would be documented for banked samples, so as to evaluate whether such banked samples are suitable for the epigenetic profiling technology of choice.

New developments in epigenetic profiling: miniaturization and automation

The recent years saw great progress in low-input epigenetic profiling without significantly affecting signal-to-noise ratios (Fig. 3). Also, all main genome-wide epigenetic profiling assays are now compatible with single-cell readouts. An overview of the main technological advances that allowed miniaturization and single-cell readout is described in Table 2. Besides miniaturization, various epigenetic profiling assays, in particular ChIP-Seq, have been (partly) automated to improve reproducibility and to allow higher throughput. In this section, we briefly evaluate these new technological developments with respect to biomarker discovery.

Fig. 3
figure 3

Level of comprehensiveness of epigenetic data from global epigenetic profiling assays using an increasing number of cells as input

Table 2 Overview of the main technological advances that allowed miniaturization and single-cell readout of genome-wide epigenetic profiling assays

Miniaturization of epigenetic profiling

As summarized in Table 2, Fig. 3, and Table 3, the amount of cells required for three of the main epigenetic profiling assays is currently well compatible with amounts present in patient-derived specimens or amounts present in banked patient samples. For bisulfite-based DNA methylation profiling, a starting amount of 7.5 × 104 cells for the 450K array or 3 × 103 cells for WGBS/RRBS is sufficient to obtain high-quality genome-wide profiles. For ChIP-Seq, the minimum amount of starting material is highly dependent on the protein to be profiled and the antibody that is used [134]. Although both histone modification and TF binding sites (such as ER [111, 112]) are potentially powerful as biomarker, the minimum number of cells required for histone modification profiling (~1–5 × 104 cells) is much more compatible with patient samples than the number of cells required for TF profiling (generally 1 × 105 cells or more; Tables 1 and 2). ATAC-Seq and DNAseI-Seq are compatible with as low as 200 cells and 1 × 103 cells, respectively (Table 2) [115, 135]. Together, this shows that the input requirements for bisulfite-based DNA methylation profiling, ChIP-Seq (in particular for histone modifications), and ATAC-Seq/DNAseI-Seq are well compatible with most clinical samples. The minimum number of cells currently required for 4C-Seq and HiC-Seq, at least 1 × 107 cells, is currently too high for clinical use.

Table 3 Overview of the number of cells required for the various epigenetic profiling assays

Interestingly, all main epigenetic profiling assays can now provide single-cell readouts (Table 2, Table 3). The possibility to assay individual cells within populations allows interrogation of heterogeneity which in “bulk” profiling would be averaged. This is very informative for clinical samples which can be highly heterogeneous [136]. Single-cell profiling has been shown to be powerful in obtaining molecular signatures of heterogeneous populations that shift in cell type composition [137]. As such, an important clinical application of single-cell profiling is to screen for resistant versus non-resistant cells after drug treatment [138] or to monitor disease progression [139]. In terms of biomarker discovery, the use of single-cell assays will allow to screen for cell types that are most informative for disease stratification. Also, the level of heterogeneity as measured by single-cell studies might possibly by itself be informative for disease stratification. From a practical perspective, epigenetic profiling of single cells is challenging. Since one cell only contains two copies for each genomic locus to be assayed, any loss of material during washing or enrichments steps such as immunoprecipitations will significantly impact the outcome of the assay. Similarly, background signals are hard to distinguish from true signal. One of the main strategies to account for false negative signals as well as for aspecific background is to include a large number of cells in single-cell epigenetic assays to enable proper statistics. However, this results in (very) large datasets, for which computational and statistical analysis are generally challenging. For single-cell epigenetic profiling of clinical samples, there are two additional issues to consider: (i) generation of single-cell suspensions from patient samples might be challenging, and (ii) the number of cells required as input for single-cell epigenetic profiling is generally higher than for miniaturized epigenetic profiling in order to enable capturing of single cells (Fig. 4), which might affect compatibility with patient samples. Since single-cell technologies emerged very recently, further developments in technology (to increase sensitivity and specificity) and in computational analysis (for more robust statistical testing and model development) are to be expected. Once single-cell epigenetic profiling has fully matured, it will be very powerful for biomarker discovery in heterogeneous cell populations such as human blood samples and biopsies.

Fig. 4
figure 4

State-of-the-art microfluidic systems capable of performing single-cell epigenomic profiling. Simplified representation of a Fluidigm C1 integrated fluidic circuit design capable of capturing 96 single cell for ATAC-Seq [151] (a). Droplet microfluidic workflow applying barcoding of single-cell chromatin to enable pooling for subsequent ChIP experiments [152] (b). Alternatively, single cells can be captured by FACS (not shown)

Automation of epigenetic profiling

The use of genome-wide epigenetic profiling for biomarker discovery strongly benefits from automated procedures that are compatible with upscaling to facilitate large-scale screens. Main advantages of automation include (i) a reduction in variability and batch effects, both of which are frequently observed in epigenetic profiling, (ii) increased throughput, (iii) reduced procedure and/or hands-on time, and (iv) lower error rates. In light of the limited number of cells within clinical samples, a combination of automation and miniaturization is likely to be beneficial in most cases. This comes with the additional advantage of reduced reagent cost, which can be substantial considering the high costs associated with epigenetic profiling. It should be noted that epigenetic profiling thus far is mainly being performed within basic research settings on relatively small sample sizes, which are well compatible with manual handling. Therefore, most automated platforms have been developed recently to cope with the increasing sample sizes and the profiling of more challenging (clinical) samples. In this section, we focus on automation of bulk and miniaturized epigenetic profiling; information on automation of single-cell technologies is included in Table 2.

Efforts to design automated workflows for epigenetic profiling have mainly been focused on ChIP-Seq and to a lesser extent on DNA methylation profiling. This can be explained by the fact that DNA methylation profiling, and chromatin profiling (ATAC-Seq/DNAseI-Seq) as well, is relatively straightforward and therefore well compatible with manual handling. Considering 4C-Seq and HiC-Seq, these are relatively new technologies for which automated workflows have not been reported yet. For DNA methylation profiling, (parts of) the workflow for MBD-Seq, MethylCap-Seq, and MeDIP-Seq have been designed on custom-programmed robotic liquid handling systems [140142]. For ChIP-Seq, immunoprecipitations and subsequent sample preparation for sequencing have been designed on the same or similar robotic systems [143146]. However, these robotic workflows require large amounts of starting material in the range of 1 × 106 cells or more. Clearly, with such input requirements, these platforms are not readily compatible with biomarker discovery.

More recently, miniaturized automated platforms have been described for ChIP-Seq using PDMS (polydimethylsiloxane)-based microfluidic devices that have been designed to perform automated immunoprecipitations. These platforms allow to perform ChIP-Seq using as low as 1 × 103 cells [147] or 100 cells [148] due to very small reaction volumes, providing proof-of-principle that automated low-input ChIP-Seq profiling is feasible. However, to facilitate high-throughput profiling, it would be important to increase the number of parallel samples to be profiled, as currently these platforms contain a maximum of assaying four samples in parallel [147, 148]. Furthermore, integration with the labor-intensive DNA library preparation procedure would be desirable; stand-alone library preparation platforms on microfluidic devices have been reported [149, 150]. For DNA methylation profiling, various commercial low-input bisulfite conversion kits have been shown to be compatible with automation. However, a fully automated miniaturized DNA methylation profiling platform has not been reported yet.

Conclusions

Biomarkers are highly valuable and desirable in a wide range of clinical settings, ranging from pharmacodynamics to monitoring treatment. Here, we have provided an overview of recent developments within genome-wide profiling technologies that may enable future large-scale screens for candidate epigenetic biomarkers. When comparing compatibility with miniaturization, automation and tissue preservation methods, bisulfite-based DNA methylation profiling is currently by far superior to other epigenetic profiling technologies for large-scale biomarker discovery. DNA methylation assays are technically less challenging than most other profiling assays, as it is not dependent on delicate enzymatic reactions or on immunoprecipitation, but on chemical conversion. A critical advantage of DNA methylation profiling over other assays is that is not affected by freezing or chemical fixation, and therefore very well compatible with (archived) clinical samples. DNA methylation profiling has the additional advantage that it requires a relatively low number of cells as input. In line with these advantages, most of the epigenetic biomarkers that have been identified thus far involve changes in DNA methylation.

Despite the advantages of DNA methylation, various other epigenetic marks are promising biomarkers. Histone-modifying enzymes are frequently mutated in a range of diseases, often directly affecting epigenetic patterns of post-translational histone modifications. The main methodology to profile these post-translational histone modifications is ChIP-Seq. ChIP-Seq is challenging on samples containing low numbers of cells as well as on archived samples, often resulting in variability in signal-to-noise ratios. However, in view of the continuous improvements in ChIP-Seq procedures for (ultra-)low input samples and for fixed samples, large scale ChIP-Seq-based screens for candidate biomarkers is likely to become feasible in the near future. These screens might benefit from the automated ChIP(-Seq) platforms that are currently being developed. The development of such automated platforms will also facilitate robust integration of ChIP assays as a diagnostic tool in clinical practice.

Of the remaining technologies discussed in this paper, ATAC-Seq and DNAseI-Seq seem most compatible with profiling of clinical samples, requiring as low as several hundred cells as input. Both ATAC-Seq and DNAseI-Seq are compatible with frozen patient samples [125128], while DNAseI-Seq was recently successfully applied on FFPE samples [115]. However, as compared to DNAseI-Seq, the workflow of ATAC-Seq is much more straightforward as the adaptors for sequencing are inserted as part of the transposition. Also, at least for single-cell ATAC-Seq, a fully automated platform has been developed [151]. For biomarker discovery, compatibility of ATAC-Seq with FFPE samples would be highly desirable, as this would enable to include clinical samples from biobanks in large-scale ATAC-Seq profiling studies. This might be achieved by incorporating critical steps from the FFPE-compatible DNAseI-Seq. Although the use of open chromatin as an epigenetic biomarker has been rare thus far, the flexibility and ease of the recently developed ATAC-Seq (and possibly DNAseI-Seq) will undoubtedly boost the use of open chromatin in clinical research and clinical practice.

Together, this review shows that genome-wide epigenetic profiling technologies have very rapidly matured over the past decade. While originally these technologies were only compatible with large numbers of (in vitro cultured) cells, most of these can now be applied on samples containing very low numbers of primary cells down to single cells. Combined with an increasing number of sophisticated workflows and (automated) platforms, this will pave the way for large-scale epigenetic screens on clinical patient material. Such screens are essential to fill the need for new biomarkers for disease diagnosis, prognosis, and selection of targeted therapies, necessary for personalized medicine.