Single-cell ATAC-seq: strength in numbers
Single-cell ATAC-seq detects open chromatin in individual cells. Currently data are sparse, but combining information from many single cells can identify determinants of cell-to-cell chromatin variation.
Keywords: K562 Cell; Replication Timing; Open Chromatin; Chromatin Accessibility; GM12878 Cell
Abbreviations: ATAC, assay for transposase-accessible chromatin; DHS, DNase hypersensitive site
From populations to single cells, ATAC-seq detects open chromatin
ATAC-seq (assay for transposase-accessible chromatin) identifies regions of open chromatin using a hyperactive prokaryotic Tn5 transposase, which preferentially inserts into accessible chromatin and tags the sites with sequencing adaptors. The protocol is straightforward and robust and has become widely popular. Up to this point, ATAC-seq and other methods for the identification of open chromatin have required large pools of cells [1, 2], meaning that the data collected reflect cumulative accessibility across all cells in the pool. Now, independent studies from the Shendure and Greenleaf laboratories have modified the ATAC-seq protocol for application to single cells (scATAC-seq) [3, 4]. These studies provide a first look at cell-to-cell variability in chromatin organization by gathering data on hundreds or thousands of single cells in parallel.
How were the single-cell chromatin measurements obtained?
Buenrostro et al. used a programmable microfluidic device (C1, Fluidigm) to isolate single cells and perform ATAC-seq on them in nanoliter reaction chambers (Fig. 1a, right panel). Each nanochamber was analyzed under a microscope to ensure that a single viable cell had been captured. This approach is simple and has the significant advantage of a carefully monitored reaction environment for each individual cell, although the throughput was limited to processing 96 cells in parallel. Buenrostro et al. sampled 1632 cells from eight different cell lines, including GM12878, K562, and H1 cells, and obtained an average of 73,000 reads per cell, about 20 times the number of reads per cell obtained using the combinatorial indexing (barcoding) strategy of Cusanovich et al.
Does scATAC-seq capture validated open chromatin signal from single cells?
It is important to assess (1) whether the methods generate interpretable open chromatin information, and (2) whether the data are actually from single cells. Regarding (1), both studies showed that the distribution of fragment sizes was characteristic of nucleosome-based inhibition of transposase insertion. In addition, both studies showed good overall correlation with chromatin accessibility from traditional bulk datasets, particularly from the lymphoblastoid cell line GM12878 and myeloid leukemia K562 cells [3, 4]. Further, aggregated data from 254 individual GM12878 cells yielded an accessibility pattern highly similar to the pattern produced by population-based ATAC-seq and DNase-seq approaches. Thus, scATAC-seq data capture characteristic features of open chromatin.
Whether the data are actually from single cells is simple to assess in the case of the microfluidic approach because the number of cells captured in each chamber is verified visually. In contrast, combinatorial cellular indexing relies on the presumed low probability of two cells carrying the identical barcode. To test this presumption, the researchers mixed human and mouse cells and found that reads associated with a single barcode map almost exclusively to either the human or mouse genome (the “collision” rate was 11 %). Therefore, there is strong evidence that both methods generate interpretable chromatin data from single cells.
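The logic of the species-mixing test can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the 0.9 purity cutoff, and the toy read counts are all assumptions made for the example.

```python
def classify_barcode(human_reads, mouse_reads, purity=0.9):
    """Label a barcode by the fraction of its reads mapping to each genome.

    The purity cutoff (0.9) is an assumed value for illustration only.
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"
    human_fraction = human_reads / total
    if human_fraction >= purity:
        return "human"
    if human_fraction <= 1 - purity:
        return "mouse"
    return "collision"


def collision_rate(counts):
    """Fraction of non-empty barcodes whose reads come from both species."""
    labels = [classify_barcode(h, m) for h, m in counts.values()]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return called.count("collision") / len(called)


# Toy data (invented): barcode -> (reads mapped to human, reads mapped to mouse).
toy_counts = {
    "bc01": (980, 20),
    "bc02": (15, 1200),
    "bc03": (500, 450),
    "bc04": (0, 0),
    "bc05": (2100, 40),
}
observed_rate = collision_rate(toy_counts)
```

A test of this kind can only detect collisions between a human cell and a mouse cell. Two cells of the same species sharing a barcode remain invisible, so the observed rate is a lower bound on the true collision rate.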
Single-cell chromatin data require a new analytic framework to account for fundamental differences from population-based data
Open chromatin data derived from populations of cells exhibit a wide range of signal intensity across the genome. But at the level of single cells the signal is binary, comprising 0 or 1 independent reads in a region (counts of 2, 3, or more, corresponding to multiple insertions in a single region or to other alleles of a locus, are theoretically possible but would be rare). Due to the sparse nature of the data it is therefore impossible to tell if a region that went unobserved in a single cell but was observed in bulk samples is in fact inaccessible in that cell, or was simply missed by the transposase, or was lost in the amplification process. This limitation can be overcome for some purposes by sampling many cells in parallel or by analyzing sets of insertion sites with shared features. This type of aggregation allows one to summarize the binary observations in single cells as frequencies observed on the level of many cells or many sites, respectively. Both studies used this approach, and developed analytical frameworks that relied on chromatin accessibility information from pooled cells to interpret their scATAC-seq data (Fig. 1b).
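The two directions of aggregation described above, across cells and across sites, can be sketched on a toy binary matrix. The matrix values, the function names, and the choice of regions in the shared-feature set are invented for illustration and are not taken from either study.

```python
# Toy binary accessibility matrix: rows are cells, columns are regions.
# 1 means at least one read was observed in that region in that cell.
accessibility = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]


def frequency_per_region(matrix):
    """Aggregate across cells: fraction of cells with a read in each region."""
    n_cells = len(matrix)
    return [sum(column) / n_cells for column in zip(*matrix)]


def frequency_per_cell(matrix, region_set):
    """Aggregate across sites: fraction of a region set observed in each cell."""
    return [sum(row[i] for i in region_set) / len(region_set) for row in matrix]


region_frequencies = frequency_per_region(accessibility)

# Hypothetical set of regions sharing a feature (for example, a common motif).
shared_feature_regions = [0, 3]
cell_frequencies = frequency_per_cell(accessibility, shared_feature_regions)
```

In both cases a column or row of zeros and ones becomes a frequency, which can be compared between regions or between cells even though any single observation is uninformative about whether a site was truly closed or simply missed.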
Cusanovich et al. compared the reads from each cell to DNase hypersensitive sites (DHSs) from ENCODE to produce a binary map of chromatin accessibility, annotating each DHS region as “used” or “unused” based on the overlap. They compared these binary maps among all pairwise combinations of cells to determine similarities and differences among them. This information was sufficient to deconvolute mixtures of two cell lines into their cell types of origin. Further analysis focused on clusters of regions with coordinated chromatin accessibility within a cell type, identifying subpopulations of GM12878 cells.
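A minimal sketch of the binary-map idea follows. The text above does not name the similarity measure used for the pairwise comparison, so Jaccard similarity is an assumption here, as are the function names, the single-chromosome coordinates, and the toy reads.

```python
from itertools import combinations


def binary_map(read_positions, dhs_intervals):
    """Mark each DHS as used (1) if any read falls inside it, else unused (0).

    Intervals are half-open (start, end) on a single chromosome for simplicity.
    """
    return [
        int(any(start <= pos < end for pos in read_positions))
        for start, end in dhs_intervals
    ]


def jaccard(map_a, map_b):
    """Jaccard similarity: sites used in both cells / sites used in either."""
    shared = sum(a and b for a, b in zip(map_a, map_b))
    either = sum(a or b for a, b in zip(map_a, map_b))
    return shared / either if either else 0.0


# Hypothetical DHS coordinates and per-cell read positions (toy values).
dhs = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
reads_by_cell = {
    "cell_A": [150, 550, 2000],
    "cell_B": [120, 580],
    "cell_C": [950, 1550],
}

maps = {cell: binary_map(reads, dhs) for cell, reads in reads_by_cell.items()}
similarity = {
    (a, b): jaccard(maps[a], maps[b]) for a, b in combinations(sorted(maps), 2)
}
```

In this toy example cell_A and cell_B use the same two sites and cell_C uses the other two, so a pairwise similarity matrix of this kind would separate the cells into two groups.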
The analysis by Buenrostro et al. focused on identifying factors associated with cell-to-cell variability of chromatin accessibility. They reasoned that trans-factors might influence variability in chromatin accessibility — for example, by binding to accessible chromatin. They first obtained regions of open chromatin using aggregate single-cell data and then grouped these regions into ensembles of sites that shared genomic features based on ChIP-seq data, DNA sequence motifs, or domains with similar replication timing. Using the signal across all cells, they then calculated a “variability score” for each factor to measure the associations of hundreds of trans-factors with cell-to-cell variability of chromatin accessibility.
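The idea of scoring a set of sites by how much its signal fluctuates from cell to cell can be sketched as below. This is a simplified stand-in and not the published statistic, whose exact form is not given in this text. The expectation model, the use of a population standard deviation, and all names and toy values are assumptions for illustration.

```python
from statistics import pstdev


def deviations(counts, site_set):
    """Per-cell deviation of reads in a site set from the aggregate expectation.

    counts: per-cell rows of read counts (cells x regions).
    Expected reads = cell total * fraction of all reads falling in the set.
    Assumes every cell and the site set itself have at least one read.
    """
    totals_per_region = [sum(column) for column in zip(*counts)]
    grand_total = sum(totals_per_region)
    set_fraction = sum(totals_per_region[i] for i in site_set) / grand_total
    result = []
    for row in counts:
        observed = sum(row[i] for i in site_set)
        expected = sum(row) * set_fraction
        result.append((observed - expected) / expected)
    return result


def variability_score(counts, site_set):
    """Spread of the per-cell deviations; larger means more cell-to-cell variation."""
    return pstdev(deviations(counts, site_set))


# Toy binary counts (invented): 4 cells x 6 regions.
toy_counts = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
# Hypothetical site sets, e.g. regions bound by two different trans-factors.
variable_sites = [0, 1]  # open together in some cells, closed in others
stable_sites = [2, 3]    # exactly one of the two observed in every cell

variable_score = variability_score(toy_counts, variable_sites)
stable_score = variability_score(toy_counts, stable_sites)
```

Both site sets receive the same number of reads in aggregate, so a bulk measurement could not distinguish them. Only the per-cell deviations reveal that the first set switches between cells while the second does not.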
What do data from single cells tell us that population-based approaches do not?
The data from these studies reliably separated cells based on their cell types, uncovered sources of cell-to-cell variability, and demonstrated a link between chromatin organization and cell-to-cell variation, all things that population-based approaches could not have told us. Specifically, Buenrostro et al. found that high cell-to-cell variability in chromatin accessibility was associated with binding of specific transcription factors and with replication timing. In K562 cells, GATA1 and GATA2, two central regulators of the erythroid lineage, were both strongly associated with high cell-to-cell variation. Some trans-factors acted synergistically to amplify variation, while others, including CTCF, seemed to suppress variability. Trans-factors associated with high cell-to-cell variability tended to be cell type-specific. For example, Buenrostro et al. found that NFκB binding was associated with cell-to-cell variability in GM12878 cells, but not in K562 and embryonic stem cells. Similarly, Cusanovich et al. found that NFκB binding regions were highly associated with accessible regions that drove the separation of 4118 GM12878 cells into four clusters. Further, the studies demonstrated that cell-to-cell variability is a dynamic phenomenon that can be tuned through extracellular signaling. This was shown by pharmacological perturbation; for example, treatment with tumor necrosis factor-α led to a marked increase in variability of NFκB-associated regions in GM12878 cells, and cell cycle inhibition in K562 cells led to a reduction in chromatin variability of regions associated with specific replication timing. Finally, a connection between chromatin accessibility in cis and chromosome organization was suggested by the finding that groups of adjacent peaks whose deviation correlates with other groups of adjacent peaks across cells mapped to interaction domains previously identified using Hi-C.
The promise and limitations of probing chromatin in single cells
These studies are important technical advances that demonstrate the promise of scATAC-seq. As one example, the method could be used to characterize cell-to-cell heterogeneity in tumor samples, and may even provide a way to map chromatin accessibility in all individual cells of an organism, for example during development. One major limitation to current scATAC-seq approaches is that they capture only a tiny subset of the open chromatin sites in single cells, and it seems unlikely that comprehensive coverage can be achieved in the near term. However, higher per-cell coverage would allow new questions to be answered. For example, it is not clear how many open chromatin regions exist in a single cell, or how chromatin accessibility differs between the two alleles in an individual cell. A more comprehensive map would also provide a better understanding of the interplay and co-regulation of multiple regulatory elements associated with a single gene. Recently, single-cell RNA-seq studies were dramatically parallelized by processing thousands of individual cells in minuscule droplets. If a similar approach can be applied to scATAC-seq, one may be able to combine the advantages of the combinatorial indexing used by Cusanovich et al. with the higher per-cell coverage achieved by the microfluidic approach of Buenrostro et al.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.