The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions
Here, we introduce the 3D Genome Browser, http://3dgenome.org, which allows users to conveniently explore both their own and over 300 publicly available chromatin interaction datasets of different types. We design a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods for linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics datasets to gain a comprehensive view of both the regulatory landscape and 3D genome structure.
The three-dimensional (3D) organization of mammalian genomes plays an essential role in gene regulation [1, 2, 3, 4]. At the DNA level, distal regulatory elements such as enhancers have been shown to be in spatial proximity to their target genes. At a larger scale, topologically associating domains (TADs) have been suggested to be the basic unit of mammalian genome organization [5, 6]. Several recent high-throughput technologies based on chromatin conformation capture (3C) have emerged (such as Hi-C, ChIA-PET, Capture-C, Capture Hi-C, PLAC-Seq, and HiChIP) and have provided an unprecedented opportunity to study the genome spatial organization in a genome-wide fashion.
As the volume of chromatin interaction data keeps increasing, efficient visualization and navigation of these data have become a major bottleneck for their biological interpretation. Due to the size and complexity of these interactome data, it is challenging for an individual lab to store and explore them efficiently. To tackle this challenge, several visualization tools have been developed, each with its own features and limitations. The Hi-C Data Browser was the first web-based query tool to visualize Hi-C data as heatmaps. Currently, it does not support zoom functions and hosts only a limited number of datasets. The WashU Epigenome Browser [14, 15] can display both Hi-C and ChIA-PET data, and it also provides access to thousands of epigenomic datasets from the ENCODE and Roadmap Epigenomics projects. Because Hi-C matrix files can reach hundreds of gigabytes, its speed for uploading and exploring Hi-C data is still not optimal. Furthermore, it does not offer an option to display inter-chromosomal interaction data as heatmaps. Users can also explore Hi-C data in Juicebox and HiGlass with great speed, but currently neither of them supports other types of chromatin interaction data, such as Capture Hi-C or ChIA-PET. Delta browser is another visualization tool with many features; it can display both a physical view of 3D genome modeling and Hi-C data. However, all the aforementioned tools except the WashU Epigenome Browser display Hi-C data only as a heatmap, which is convenient for visualizing large domain structures such as TADs but may not be the most informative way to visualize enhancer-promoter interactions.
Here, we present the 3D Genome Browser (www.3dgenome.org), a fast web-based browser that allows users to smoothly explore both published and their own chromatin interaction data. Our 3D Genome Browser features six distinct modes that allow users to explore interactome data tailored toward their own needs, from exploring the organization of higher-order chromatin structures at the domain level to investigating high-resolution enhancer-promoter interactions. Our browser provides convenient zoom and traverse functions in real time and supports queries by gene name, genomic locus, or SNP rsid. In addition, users can easily incorporate their UCSC Genome Browser and WashU Epigenome Browser sessions and can therefore simultaneously query and supplement chromatin interaction data with thousands of genetic, epigenetic, and phenotypic datasets, including ChIP-Seq and RNA-Seq data from the ENCODE and Roadmap Epigenomics projects. So far, it has been visited by more than 60,000 unique users from 120 countries, with over 600,000 page views. In summary, the 3D Genome Browser represents an invaluable resource and ecosystem for the study of chromosomal organization and gene regulation in mammalian genomes.
Results and Discussion
Overall design and implementation of the system
Table: Summary of the number of datasets available on the 3D Genome Browser (recoverable entries: column “Samples and conditions”; row “Virtual 4C, derived from Hi-C”).
To accommodate different research interests, our 3D Genome Browser features six distinct modes for exploring interactome data: (1) intra-chromosomal Hi-C contact matrices as heatmaps, coupled with TADs and available genome annotation in the same cell type; (2) inter-chromosomal Hi-C heatmaps, a mode that is particularly helpful for visualizing inter-chromosomal interactions and translocations; (3) compare Hi-C matrices, which stacks Hi-C heatmaps from different tissues or even different species; (4) virtual 4C, in which Hi-C data are plotted as an arc for a queried gene or locus (bait), with the bait region at the center, a mode that is particularly helpful for revealing chromatin interactions between two individual loci; (5) ChIA-PET or other ChIP-based chromatin interaction data such as PLAC-Seq and HiChIP; and (6) Capture Hi-C or other capture-based chromatin interaction data. Below, we use several examples to demonstrate these options and to illustrate how the 3D Genome Browser can be used to make novel biological discoveries.
Exploring chromatin interactions using Hi-C data
First, we demonstrate an example of exploring Hi-C data with the 3D Genome Browser for a large genomic region in Fig. 2a. It takes only ~5 s to show a 10-Mb region of the GM12878 Hi-C interaction map on chr12 (~15–25 Mb) at 25-kb resolution. The alternating yellow and blue bars are TADs predicted using the same in-house pipeline as in Dixon et al. The dark red vertical bars are DNase I hypersensitive sites (DHS) in the same cell type. Users can also adjust the color scale to reduce the background signals and make the TAD structure more visible.
Identifying cell/tissue-specific chromatin interactions is important, as it has been shown that chromatin structure plays an important role in determining cellular identity [22, 23]. In Fig. 2b, we notice a chromatin interaction in the 5-kb resolution Hi-C contact map in the K562 cell line (marked by the black arrow). To interpret the biological meaning of this chromatin interaction, we integrated the WashU Epigenome Browser with gene annotation; the histone modifications H3K4me1, H3K4me3, and H3K27ac; and chromHMM in K562 cells. We found that the two interacting loci are the promoter of SLC25A37 and a putative enhancer predicted by histone modification patterns and chromHMM (Fig. 2b, vertical gray bar). This putative enhancer has been confirmed to exhibit enhancer activity that regulates SLC25A37 expression during late-phase erythropoiesis. Further, we checked the expression patterns profiled by the ENCODE consortium for SLC25A37 on our browser and found that its expression is highly specific to K562 cells (Additional file 1: Figure S1).
Discovering high-resolution promoter-enhancer interactions using Capture Hi-C and DHS-linkage
To further examine the predicted promoter-enhancer linkages, we also explored the linkage data from DNase I hypersensitive sites (DHS) in this region (blue curved line, second track in Fig. 3a), which represents another method of linking distal regulatory elements with their target genes. It works by computing Pearson correlation coefficients (PCC) between gene-proximal and distal DHS pairs across more than 100 ENCODE cell types; only the pairs with PCC > 0.7 and within 500 kb are kept as linked pairs. In the example shown in Fig. 3a, we observed several interactions involving the promoter of the PAX5 gene and a potential enhancer (marked by both H3K4me1 and H3K27ac signals) downstream of the ZCCHC7 gene in the naïve B cell Capture Hi-C dataset. One region marked by enhancer-associated histone modifications has indeed been previously determined to be an enhancer for PAX5, and its disruption leads to leukemogenesis. By integrating multiple lines of evidence, our browser provides a valuable resource for investigators to generate hypotheses connecting distal non-coding regulatory elements and their target genes.
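The DHS-linkage filter described above can be illustrated with a minimal sketch. It assumes each DHS is represented by a midpoint coordinate and one accessibility value per cell type; the dictionary keys, function names, and toy signals below are hypothetical and are not taken from the browser's code or from the original DHS-linkage study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def link_dhs(proximal, distal, min_pcc=0.7, max_dist=500_000):
    """Return (proximal_id, distal_id, pcc) for pairs passing both filters.

    Each DHS is a dict with hypothetical keys 'id', 'chrom', 'pos'
    (midpoint in bp) and 'signal' (one accessibility value per cell type).
    """
    links = []
    for p in proximal:
        for d in distal:
            if p["chrom"] != d["chrom"]:
                continue  # linkage is only defined on the same chromosome
            if abs(p["pos"] - d["pos"]) > max_dist:
                continue  # keep only pairs within 500 kb
            pcc = pearson(p["signal"], d["signal"])
            if pcc > min_pcc:
                links.append((p["id"], d["id"], pcc))
    return links

# Toy data: five "cell types" instead of the >100 used in practice.
promoter = {"id": "prom1", "chrom": "chr9", "pos": 1_000_000, "signal": [1, 5, 2, 8, 3]}
dhs_a = {"id": "dhsA", "chrom": "chr9", "pos": 1_200_000, "signal": [2, 9, 3, 15, 5]}
dhs_b = {"id": "dhsB", "chrom": "chr9", "pos": 1_300_000, "signal": [8, 1, 7, 2, 6]}
dhs_c = {"id": "dhsC", "chrom": "chr9", "pos": 2_000_000, "signal": [1, 5, 2, 8, 3]}
links = link_dhs([promoter], [dhs_a, dhs_b, dhs_c])
```

In this toy input, only `dhsA` is linked to the promoter: `dhsB` is anti-correlated and `dhsC` lies beyond the 500-kb window even though its signal is perfectly correlated.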
Investigating potential target genes for non-coding genetic variants
Resolution at the level of individual loci is also important for discovering the functions of non-coding genetic variants, such as single nucleotide polymorphisms (SNPs), which may disrupt transcription factor (TF) binding sites in cis-regulatory elements. In this section, we first demonstrate how to use the virtual 4C mode for such analyses. The 4C (circular chromosome conformation capture [30, 31]) experiment is a chromatin ligation-based method that measures one-versus-many interactions in the genome, that is, the interaction frequencies between a “bait” locus and any other loci. Its data are plotted as a line histogram, where the center is the “bait” region and peaks in distal regions indicate the frequency of chromatin interaction events. In our browser, we use the queried region (gene name or SNP) as the bait and extract Hi-C data centered on the bait region, hence the name virtual 4C. To bolster the power of the virtual 4C plot, our browser also supplements it with ChIA-PET and DHS-linkage data. In Fig. 3b, we queried the SNP rs12740374 in the virtual 4C mode. This SNP has been associated with high plasma low-density lipoprotein cholesterol (LDL-C), which could lead to coronary artery disease and myocardial infarction. We plotted virtual 4C and ChIA-PET data from K562 in this region, as high-resolution Hi-C and ChIA-PET data are only available for K562, but not for hepatic cell lines. Since LDLs are processed by the liver, we examined the histone modifications in the HepG2 cell line and found that rs12740374 is located within a candidate enhancer region marked by H3K27ac signals. Virtual 4C, ChIA-PET, and DHS-linkage all support a putative interaction between the enhancer harboring this SNP and the promoter region of SORT1. Further, it has been shown that the rs12740374 minor allele creates a C/EBPα-binding site that enhances SORT1 expression, leading to decreased LDL-C levels, thus suggesting that the minor allele confers a gain-of-function effect.
Although this conclusion is unusual, since most minor alleles are loss-of-function, the example shows that the virtual 4C mode of our 3D Genome Browser can aid in generating hypotheses about not only cis-regulatory elements and their putative target genes but also the effects of non-coding variants.
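The idea behind virtual 4C, extracting the Hi-C contacts of a single bait bin and centering the profile on it, can be sketched as follows. This is a simplified illustration assuming a dense, symmetric intra-chromosomal matrix held in memory; the function name, parameters, and toy matrix are hypothetical and do not reflect how the browser retrieves data.

```python
def virtual_4c(matrix, bait_pos, resolution, flank_bins):
    """Extract the Hi-C row of the bait bin, centered on the bait.

    matrix     : square list of lists for one chromosome (symmetric contacts)
    bait_pos   : bait coordinate in bp (e.g., a SNP or gene position)
    resolution : bin size in bp
    flank_bins : number of bins to report on each side of the bait
    Returns a list of (bin_start_bp, contact_frequency) tuples.
    """
    bait_bin = bait_pos // resolution
    start = max(0, bait_bin - flank_bins)
    end = min(len(matrix), bait_bin + flank_bins + 1)
    return [(b * resolution, matrix[bait_bin][b]) for b in range(start, end)]

# Toy 5 x 5 contact matrix at a hypothetical 10-kb resolution.
toy_matrix = [
    [9, 4, 1, 0, 0],
    [4, 9, 5, 2, 0],
    [1, 5, 9, 6, 3],
    [0, 2, 6, 9, 4],
    [0, 0, 3, 4, 9],
]
profile = virtual_4c(toy_matrix, bait_pos=25_000, resolution=10_000, flank_bins=1)
```

Here the bait at 25,000 bp falls in bin 2, so `profile` holds the contacts between bin 2 and its immediate neighbors; peaks in such a profile away from the bait would be the candidate long-range interactions.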
Exploring conservation of chromatin structure across species
Uncovering structural variations in cancer genomes
New binary Hi-C data format allows faster data retrieval and visualizing users’ own Hi-C datasets
The 3D Genome Browser supports a variety of features that allow users to browse unpublished data. First, our browser supports integration with customized UCSC or WashU Epigenome Browser sessions, in which users can add or modify existing tracks or upload their own genomic/epigenomic data. For example, to view a customized UCSC session, a user only needs to enter the UCSC session URL. More importantly, users can view their own Hi-C data by converting the contact matrices into a novel, indexed binary file format that we developed, called Binary Upper TrianguLar MatRix (BUTLR). By hosting the BUTLR file on any HTTP-supported server and providing the URL to the 3D Genome Browser, a user can take full advantage of the features of our browser without having to upload their Hi-C data, since the browser queries only the selected region through binary indexing rather than searching through the entire matrix. This capability is similar to the bigWig/bigBed mechanism developed by UCSC.
Additionally, the BUTLR format dramatically reduces the file size of high-resolution Hi-C data, not only through binarization but also through the omission of redundant values (Additional file 1: Figure S3a; Additional file 2). The BUTLR file encodes an entire genome-wide chromatin interaction dataset in a binary, indexed format. While 1-kb resolution hg19 intra-chromosomal Hi-C contact matrices in the tab-delimited format require almost 1 TB, the BUTLR format of those same matrices takes only 11 GB (Additional file 1: Figure S3b). More importantly, the binary file format also greatly improves the query speed: using pre-loaded Hi-C datasets, the 3D Genome Browser generally returns the query results as a heatmap in a matter of seconds. We also want to note that our browser is designed to be query-based to maximize its usability; as a result, it excels at exploring loci of interest and gene-element relationships but can be a little less dynamic than other tools when navigating Hi-C matrices for larger genomic regions.
In summary, we developed an interactive 3D Genome Browser that is defined by a simple and easy-to-navigate graphical user interface, fast query-response time, and a comprehensive collection of publicly available chromatin interaction datasets. As our browser simultaneously displays 3D chromatin interactions, functional (epi)genomic annotations, and disease/trait-associated SNPs, we provide an invaluable online tool for investigators from all over the world for the study of 3D genome organization and its functional implications in mammalian gene regulation.
Backend and user interface
In-house Hi-C data processing pipeline
We followed the pipeline in Dixon et al. for Hi-C data processing. Briefly, raw fastq files were aligned to the human reference genome GRCh38 with the BWA aligner (0.7.15-r1140). Only uniquely mapped and properly paired reads on the same chromosome were retained. The genome was binned at different resolutions (e.g., 40 kb and 10 kb) to generate Hi-C matrices, and each retained read pair was counted as a chromatin interaction connecting two bins. ICE (iterative correction and eigenvector decomposition) normalization was performed using the “iced” Python package.
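The binning step can be sketched as below. This is a minimal illustration that assumes the aligned, filtered read pairs are already available as tuples of chromosome names and positions; the input layout and function name are hypothetical, and alignment, mapping-quality filtering, and ICE normalization are not shown.

```python
from collections import Counter

def bin_read_pairs(pairs, resolution):
    """Count intra-chromosomal contacts per pair of bins.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) for uniquely mapped pairs.
    Returns a Counter keyed by (chrom, bin_i, bin_j) with bin_i <= bin_j,
    i.e., only the upper triangle of the symmetric contact matrix.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2:
            continue  # keep only read pairs on the same chromosome
        i, j = sorted((pos1 // resolution, pos2 // resolution))
        counts[(chrom1, i, j)] += 1
    return counts

# Toy read pairs binned at 40 kb.
toy_pairs = [
    ("chr1", 12_000, "chr1", 95_000),
    ("chr1", 91_000, "chr1", 15_000),  # same bins as above, ends swapped
    ("chr1", 41_000, "chr1", 44_000),
    ("chr1", 10_000, "chr2", 50_000),  # inter-chromosomal, skipped here
]
counts = bin_read_pairs(toy_pairs, resolution=40_000)
```

Storing only bin pairs with `bin_i <= bin_j` exploits the symmetry of the contact matrix; the resulting sparse counts would then be normalized before visualization.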
User query submission
The user may provide genomic coordinates or genome features such as gene symbols, RefSeq ID, Ensembl ID, or SNP rsid as queries for all modes of the 3D Genome Browser.
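How a query string might be routed to the right lookup can be sketched with a few regular expressions. The patterns and category names below are assumptions made for illustration only (they cover common identifier shapes, not every valid one) and are not the browser's actual parsing rules; the RefSeq and Ensembl strings in the example are format placeholders.

```python
import re

# Hypothetical patterns, checked in order; anything unmatched is treated
# as a gene symbol.
_PATTERNS = [
    ("coordinates", re.compile(r"^(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")),
    ("snp_rsid", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("refseq_id", re.compile(r"^[NX][MR]_\d+(\.\d+)?$")),
    ("ensembl_id", re.compile(r"^ENS[A-Z]*[GT]\d{11}(\.\d+)?$")),
]

def classify_query(query):
    """Return the assumed query type for a user-supplied string."""
    query = query.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(query):
            return kind
    return "gene_symbol"  # fallback when no identifier pattern matches

examples = [
    "chr12:15,000,000-25,000,000",
    "rs12740374",
    "NM_000001",
    "ENSG00000000001",
    "SLC25A37",
]
kinds = [classify_query(q) for q in examples]
```

A real implementation would still need to resolve each identifier to genomic coordinates against an annotation database, which is outside the scope of this sketch.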
External genome browser integration and alignment
Determining homologous regions
For the compare Hi-C mode, we determine the homologous regions between two species by querying for homologous genes from NCBI’s HomoloGene database as well as utilizing known inter-species chains.
The BUTLR file encodes an entire genome-wide chromatin interaction dataset in a binary, indexed format. To compress the original contact matrices, BUTLR stores only the nonzero values of the upper triangular matrices for the intra-chromosomal data and of the n × m matrices for the inter-chromosomal data, where n and m are the numbers of interrogated loci and n < m. The locations of each chromosome or chromosome-pair matrix, the row indices of each matrix, and the column indices of nonzero values along with the nonzero values themselves are binarized and indexed within the BUTLR file structure. Perl scripts that encode and decode BUTLR files are available at http://github.com/yuelab/BUTLRTools. All the Hi-C matrices in this manuscript are converted to the BUTLR file format for visualization [5, 8, 19, 20, 22, 23, 24, 44–54].
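The two ideas described above, storing only nonzero upper-triangular values and indexing rows so that a single region can be read without scanning the whole matrix, can be illustrated with a toy encoder. The byte layout below is an assumption chosen for brevity and is not the actual BUTLR specification; the real format is defined by the Perl tools in the BUTLRTools repository.

```python
import struct

def encode_upper(matrix):
    """Pack nonzero upper-triangular entries with a per-row offset index.

    Toy layout (NOT the real BUTLR layout): uint32 row count, then
    (n + 1) uint32 byte offsets, then per row a run of
    (uint32 column, float32 value) records.
    """
    n = len(matrix)
    rows = []
    for i in range(n):
        entries = [(j, matrix[i][j]) for j in range(i, n) if matrix[i][j] != 0]
        rows.append(b"".join(struct.pack("<If", j, v) for j, v in entries))
    header_size = 4 + 4 * (n + 1)  # row count plus (n + 1) offsets
    offsets, pos = [], header_size
    for row_bytes in rows:
        offsets.append(pos)
        pos += len(row_bytes)
    offsets.append(pos)  # end sentinel so every row has a start and an end
    header = struct.pack("<I", n) + struct.pack(f"<{n + 1}I", *offsets)
    return header + b"".join(rows)

def read_row(blob, i):
    """Decode only row i, using the offset index to skip everything else."""
    start, end = struct.unpack_from("<2I", blob, 4 + 4 * i)
    return [struct.unpack_from("<If", blob, p) for p in range(start, end, 8)]

toy = [
    [5, 0, 2],
    [0, 3, 0],
    [2, 0, 7],
]
blob = encode_upper(toy)
row0 = read_row(blob, 0)
```

Because `read_row` jumps straight to the byte range of the requested row, the same lookup could in principle be served over HTTP with a byte-range request, which is the kind of access pattern that lets a remotely hosted file be queried region by region.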
We thank Dr. Jesse R. Dixon for the help with TADs and compartment calling. We are grateful to the members of Wang lab and Yue lab for useful discussions.
This work was supported by NIH grants R35GM124820, R01HG009906, and U01CA200060 (F.Y.). F.Y. is also supported by Leukemia Research Foundation, PhRMA Foundation, and Penn State CTSI. T.W. is also supported by NIH grants R01HG007175, R01HG007354, R01ES024992, U24ES026699, and U01HG009391. M.H. is partially supported by NIH U54DK107977. Y.L. is partially supported by NIH R01HG006292 and R01HL129132.
Availability of data and materials
All data are available at http://3dgenome.org. The source code of the website is deposited at https://github.com/yuelab/3dgenome and Zenodo (DOI: 10.5281/zenodo.1402785). The code for the 3D Genome Browser is freely available under an MIT license.
No new experimental datasets were generated within this study. Publicly available datasets included in the browser are listed in the supplementary tables.
The review history for this manuscript is available as Additional file 2.
YW, TW, and FY conceived the project. YW, FS, and FY designed and implemented the project. YW, TW, and FY wrote the manuscript with input from MC, YL, MH, and RCH. BZ, LZ, JX, DK, DL, and MC helped with the data processing and integration. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
- 3. Bickmore WA. The spatial organization of the human genome. Annu Rev Genomics Hum Genet. 2013;14:67–84. https://doi.org/10.1146/annurev-genom-091212-153515
- 8. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–93. https://doi.org/10.1126/science.1181369
- 17. Kerpedjiev P, Abdennur N, Lekschas F, McCallum C, Dinkla K, Strobelt H, Luber JM, Ouellette SB, Ahzir A, Kumar N, et al. HiGlass: web-based visual comparison and exploration of genome interaction maps. bioRxiv. 2017; https://doi.org/10.1101/121889
- 24. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–80. https://doi.org/10.1016/j.cell.2014.11.021
- 27. Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, Cairns J, Wingett SW, Varnai C, Thiecke MJ, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167:1369–84. e1319. https://doi.org/10.1016/j.cell.2016.09.037
- 31. Zhao Z, Tavoosidana G, Sjolinder M, Gondor A, Mariano P, Wang S, Kanduri C, Lezcano M, Sandhu KS, Singh U, et al. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat Genet. 2006;38:1341–7. https://doi.org/10.1038/ng1891
- 34. Dixon J, Xu J, Dileep V, Zhan Y, Song F, Le VT, Yardimci GG, Chakraborty A, Bann DV, Wang Y, et al. An integrative framework for detecting structural variations in cancer genomes. bioRxiv. 2017; https://doi.org/10.1101/119651
- 47. Nagano T, Lubling Y, Várnai C, Dudley C, Leung W, Baran Y, Mendelson Cohen N, Wingett S, Fraser P, Tanay A. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature. 2017;547:61–7. https://doi.org/10.1038/nature23001
- 48. Rubin A, Barajas B, Furlan-Magaril M, Lopez-Pajares V, Mumbach M, Howard I, Chang H, Fraser P, Khavari P. Lineage-specific dynamic and pre-established enhancer-promoter contacts cooperate in terminal differentiation. Nat Genet. 2017;49(10):1522–8. https://doi.org/10.1038/ng.3935
- 53. Doynova MD, Markworth JF, Cameron-Smith D, Vickers MH, O’Sullivan JM. Linkages between changes in the 3D organization of the genome and transcription during myotube differentiation in vitro. Skelet Muscle. 2017;7:1–14. https://doi.org/10.1186/s13395-017-0122-1
- 55. Wang Y, Song F, Zhang B, Zhang L, Xu J, et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. GitHub repository. 2018. https://github.com/yuelab/3dgenome
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.