ReplicationDomain: a visualization tool and comparative database for genome-wide replication timing data
Eukaryotic DNA replication is regulated at the level of large chromosomal domains (0.5–5 megabases in mammals) within which replicons are activated relatively synchronously. These domains replicate in a specific temporal order during S-phase and our genome-wide analyses of replication timing have demonstrated that this temporal order of domain replication is a stable property of specific cell types.
We have developed ReplicationDomain (http://www.replicationdomain.org) as a web-based database for analysis of genome-wide replication timing maps (replication profiles) from various cell lines and species. This database also provides comparative information on transcriptional expression and is configured to display any genome-wide property (for instance, ChIP-Chip or ChIP-Seq data) via an interactive web interface. Our published microarray data sets are publicly available. Users may graphically display these data sets for a selected genomic region and download the displayed data as text files, or alternatively, download complete genome-wide data sets. Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets. Upon uploading, registered users may choose to: (1) view their data sets privately without sharing; (2) share with other registered users; or (3) make their published or "in press" data sets publicly available, which can fulfill journal and funding agencies' requirements for data sharing.
ReplicationDomain is a novel and powerful tool to facilitate the comparative visualization of replication timing in various cell types as well as other genome-wide chromatin features and is considerably faster and more convenient than existing browsers when viewing multi-megabase segments of chromosomes. Furthermore, the data upload function with the option of private viewing or sharing of data sets between registered users should be a valuable resource for the scientific community.
Keywords: UCSC Genome Browser; Replication Timing; Registered User; Main Menu; Floating Gene
In eukaryotic cells, segments of chromosomes replicate via the synchronous firing of clusters of replication origins. These segments or "replication domains" replicate in a defined temporal order during S-phase. This replication-timing program is cell-type specific, and developmentally regulated changes in this program are associated with changes in chromatin structure and gene expression [2, 3, 4, 5]. In particular, a global re-organization of this replication-timing program occurs during the differentiation of mouse embryonic stem cells (mESCs), with changes occurring at the level of large (~600 kb) chromosomal domains reflecting global re-positioning of sequences within the nucleus. Moreover, pluripotent cells can be distinguished from differentiated cells not only by differences in their replication timing profiles but also by their smaller and more numerous replication domains. Hence, replication timing is a unique epigenetic property of chromatin in that it is regulated at the level of megabase-sized domains. Establishing replication maps for various tissues is likely to provide a database of chromosome segments that undergo large changes in organization during differentiation.
The significance of a replication-timing program has remained elusive. In several model systems, defects in replication-timing are associated with defects in chromosome condensation, sister chromatid cohesion, and genome stability [6, 7]. Abnormal replication-timing control has become a clinical marker for predicting malignant cancers [8, 9, 10, 11, 12]. In particular, specific chromosome translocations result in a chromosome-wide delay in replication timing that triggers additional chromosome translocations at a high frequency [13, 14]. Cells from patients with several inherited human diseases show defects in replication-timing that correlate with mis-regulation of genes during development [15, 16, 17, 18]. Also, replication domains are separated by timing transition regions (the domain boundaries) that appear to be devoid of replication origins, requiring that a single replication fork travel very long distances between early and late replicating domains [2, 19, 20]. Evidence suggests that genes lying within these transition regions are prone to DNA damage [21, 22]. While very few such boundaries have been mapped, their cell-type specificity suggests the possibility that differential organization of replication domains may contribute to cell type specific predispositions to certain types of DNA damage. Hence, establishing a database of replication timing profiles for various tissues and their relationship to transcription and other chromosomal properties is a prerequisite for understanding the roles of replication timing in chromosome-based diseases. These roles may extend beyond epigenetic regulation of transcription: the locations and directions of replication forks, the organization of replication complexes that coordinate replication of large domains, and the locations of domain boundaries may constitute an epigenetic basis for tissue-specific or cancer-promoting differences in genome stability.
Surprisingly few genome-wide studies of replication timing have been performed. Early studies in Drosophila cells with cDNA arrays, or in human cells using BAC arrays, did not provide the resolution to define replication domains and their boundaries. A tiling array study of human ENCODE regions covering 1% of the genome was also not able to precisely delineate replication domains, likely because they are typically larger than the 500 kb segments queried by this study [26, 27]. Other high-resolution studies of specific chromosome segments in human and Drosophila [28, 29, 30] also did not delineate domain structure but noted that the relationship between replication timing and transcription was best described at the level of large multi-genic regions rather than individual genes.
We have recently mapped replication domain structure genome-wide in mouse embryonic stem cells (mESCs) and their differentiated counterparts. We recognized the need for a comprehensive database to display these profiles and compare them between cell lines as well as to other chromosome-wide properties such as transcriptional activity or other epigenetic marks. While this can in principle be done on other public web browsers such as the UCSC Genome Browser (http://genome.ucsc.edu), in many cases it is desirable to quickly visualize one's data from individual replicates or using unpublished data in a format that is not appropriate for public viewing. Such comprehensive public databases are becoming complex with the number of tissue and cell-type specific data sets that the reader must browse through. Generally speaking, they are tailored toward the display of static features of chromosomes, rather than dynamic cell-type specific features. Moreover, because of the increasing complexity of genome browsers, viewing the large chromosomal regions necessary to visualize replication domains tends to be very slow (e.g. a 5-Mb chromosome segment takes several tens of seconds to display on the UCSC Genome Browser, but 2–3 seconds on ReplicationDomain).
For these reasons, we developed ReplicationDomain as a centralized repository that enables rapid comparative analysis of the genomic landscape for replication domain organization, with the potential to compare these properties to any other genome-wide chromosome data sets, such as those from ChIP-Chip or ChIP-Seq experiments. Our published microarray data sets are publicly available for any non-registered user to view and download. Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets. Users have three options for data security, either to view their data sets privately without sharing (Über Private), share with other registered users (Private), or make their data sets publicly available on condition that they are published or "in press" in peer-reviewed journals (Public). In the future, we plan to implement additional data security mechanisms that will allow sharing of data sets only with designated registered users. The ability of registered users to upload data sets for private viewing facilitates confidential sharing of data prior to publication, or for quality control checks. ReplicationDomain uses an interactive interface designed to be intuitive for users familiar with the UCSC Genome Browser, with unique features optimized for rapid viewing of multi-megabase domain-wide chromosome properties and the option to jump to the same region of interest in the UCSC Genome Browser. Our recent demonstration of global re-organization of replication domains during ESC differentiation predicts that replication profiling will provide a rapid and comprehensive means to evaluate cell-type specific features of global genome organization. ReplicationDomain will provide a valuable resource to consolidate replication profiles, making them more accessible to view and identify cell-type specific properties. We encourage users to begin uploading data sets and suggesting features that will improve this database.
Construction and content
System Architecture and Implementation
Software and Hardware
Utility and discussion
ReplicationDomain is designed for quick and convenient examination and comparison of chromosome properties that are important for higher order structure and function, particularly as they relate to the organization of replication timing domains. It functions similarly to other genome browsers, but has unique features that allow one to examine and compare microarray data for replication timing to steady state transcript levels (or any other genome-wide feature) with ease. Data can be downloaded as a tab-delimited text file or saved as a JPEG file for figure assembly. We also provide a link to the UCSC Genome Browser for quick access to additional data for the chromosomal region of interest. Members of the Gilbert lab utilize this site regularly as a tool to mine patterns in the genome indicative of functional changes in chromosome structure during differentiation (see "Downloading data for personal use"). Furthermore, we have implemented a user registration system that allows registered users to upload their own data sets, the details of which can be found under "Creating a ReplicationDomain account" and "Uploading your own data set" below. Regarding unpublished data sets, registered users can either view them privately without sharing or share with other registered users. For published or "in press" data sets, investigators have an option to release them for public access. We can accommodate data sets from any species with a physical genome map, but may need some time to assemble a new navigation path for each new species.
ReplicationDomain is freely accessible at http://replicationdomain.org. It provides a novel tool to examine and compare microarray data for replication timing to other chromosomal properties. Different types of data sets can be aligned by virtue of the dynamically generated graphical output. This unique feature allows investigators to compare replication timing and other chromosomal properties at any region of the genome under different developmental, genetic (e.g. gene knockout cell lines) and/or experimental conditions. Non-registered users may freely view and download public data sets. Registered users with a ReplicationDomain account may upload their own data sets and view them privately or share them with other registered users. Published or "in press" data sets can be added to the series of publicly available data sets. Further details of these procedures can be found below.
Getting to a region of interest
Search for a Gene Name near the site of interest (Figure 1, button 4). You may be asked to choose from similarly named genes, particularly if you have typed in a partial gene name. When you arrive at the chromosome position, your gene of interest will appear in the center of the image. You will likely need to zoom out to see the context of your gene. Your gene will remain at the center as you zoom.
When you select a region of the chromosome, the Data Display page will display a floating gene name box (Figure 3, button 3) in the right column that contains the names and map positions of all genes within the selected region. If there are more than 16 genes, the first and last 8 genes are shown, with the number of genes between them indicated. Pointing your mouse cursor at a gene name will open a hover box that will provide its start and stop positions (Figure 3, button 4). Further information about this genomic region or the structure of the genes contained within it can be found by clicking the UCSC Genome Browser link (Figure 3, button 5).
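The truncation rule for the floating gene name box can be illustrated with a minimal Python sketch. This is not the site's actual code: the function name `summarize_genes`, the placeholder text for the hidden genes, and the example gene names are all assumptions made for illustration. Only the thresholds (16 genes, first and last 8 shown) come from the description above.

```python
def summarize_genes(genes, limit=16, edge=8):
    """Return the rows a gene name box would show for a selected region.

    If the region holds more than `limit` genes, keep the first and last
    `edge` genes and replace the middle with a single count row.
    The wording of the count row is hypothetical.
    """
    genes = list(genes)
    if len(genes) <= limit:
        return genes
    hidden = len(genes) - 2 * edge
    return genes[:edge] + [f"({hidden} genes not shown)"] + genes[-edge:]


# Example with 20 made-up gene names: 8 + 1 count row + 8 = 17 rows.
rows = summarize_genes([f"gene{i}" for i in range(1, 21)])
```

With 20 genes in the region, the middle row reports the 4 genes that are hidden; with 16 or fewer, the list is returned unchanged.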
Choosing data sets to view
Data sets on the Data Display page are chosen for viewing much as they are with the UCSC Genome Browser. All data sets are "Hidden" as a default, and can be viewed by using the dropdown menus (Figure 3, button 6). Relative transcription levels (Figure 3, button 7) can be plotted with either a linear or log scale (Figure 3, button 8) while replication-timing data is always plotted as a log2 ratio (Figure 3, button 9) by choosing "Show" (Figure 3, button 10). When these options are selected, graphical display of the data set will show up automatically. The y-axes are adjusted to the highest and lowest values in the entire data set (e.g. for transcription, the height of the y-axis provides an indication of the most highly transcribed gene in that data set). Details of each data set can be found by clicking on their Chip ID on the Data Display page (Figure 3, Chip ID column) or in the "Database" link in the main menu window (Figure 1, button 1). Definitions of the terms used to describe the data sets can be found under the "Documentation" link in the main menu (Figure 1, button 1).
You can move to the left or the right, jump to a specific nucleotide position, or zoom in or out (2–8 fold) using the buttons at the top of the Data Display page (Figure 3, button 2), similar to what is done with the UCSC Genome Browser. ReplicationDomain also allows you to grab regions of the chromosome using your mouse cursor; simply drag a rectangle around the chromosomal region of interest either on the chromosome ideogram (Figure 3, button 1) or on individual data sets (Figure 3, buttons 7 and 9) and you will jump to that region. To orient your position relative to genes in the area, you can: (1) use the floating gene name box in the right column (Figure 3, button 3); (2) use the link to the UCSC Genome Browser; or (3) download the corresponding "Transcription" data with all gene names and positions.
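The navigation operations described above (zooming about the center, moving left or right, and dragging a rectangle to select a region) amount to simple coordinate arithmetic. The following Python sketch shows one way to express them. The function names, the 0-based coordinate convention, and the clamping behavior at chromosome ends are assumptions for illustration, not a description of how ReplicationDomain is implemented.

```python
def zoom(start, end, factor, chrom_length):
    """Zoom the window about its center.

    factor > 1 zooms in (e.g. 2 halves the width); factor < 1 zooms out.
    The window is clamped to [0, chrom_length], so near a chromosome end
    the original center may no longer sit exactly in the middle.
    """
    center = (start + end) // 2
    half = max(1, round((end - start) / (2 * factor)))
    return max(0, center - half), min(chrom_length, center + half)


def shift(start, end, fraction, chrom_length):
    """Move the window left (negative fraction) or right, keeping its width."""
    width = end - start
    new_start = min(max(0, start + int(width * fraction)), chrom_length - width)
    return new_start, new_start + width


def drag_to_region(px_left, px_right, image_width, start, end):
    """Convert a dragged pixel range on the image into genomic coordinates."""
    scale = (end - start) / image_width  # base pairs per pixel
    return start + int(px_left * scale), start + int(px_right * scale)
```

For example, zooming in 2-fold on a window from 1,000,000 to 3,000,000 keeps the center at 2,000,000 and returns 1,500,000 to 2,500,000.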
Downloading data for personal use
Comparing to other genome properties
At the top of the page is a link entitled "UCSC Genome Browser for this Region" (Figure 3, button 5). Clicking this button will open a new tab or browser page to the UCSC Genome Browser for the region viewed on ReplicationDomain. In addition, if you wish to upload your own data sets and compare to other data sets on ReplicationDomain, you will need to register yourself first (see "Creating a ReplicationDomain account" below for details). Registered users are entitled to upload their data sets as described below under "Uploading your own data set."
Creating a ReplicationDomain account
User registration is required for those who wish to upload their data sets or use the database interactively with other registered users. To create a ReplicationDomain account, please visit our ReplicationDomain Account Request page (a link is found on the User Guide page) and submit the Account Request Information form. A confirmation email will be sent to you with a ReplicationDomain username and a password, which will allow you to log in (Figure 1, button 5).
Uploading your own data set
Definitions of Data Entry Terms
Data sets are defined by a combination of 14 entries, as described below (see also Figure 5C). Upon uploading data sets, registered users can either select terms from the dropdown list, or create a new term by filling in the blank.
At present, we only have Mus musculus, but we intend to add Homo sapiens and Drosophila melanogaster soon. Contact us to create any new species page.
Microarray product supplier name.
This is the unique identifier for each data set. While a "Chip ID" normally represents a single replicate experiment (e.g. one microarray hybridization), most data sets currently displayed on the Data Display page are averages of multiple replicates. Therefore, we have re-defined Chip ID as a string of characters combining individual "Chip ID" numbers and description of the data set identity. Chip ID is not useful except to communicate comments regarding a particular experiment. On the Data Display page, however, the Chip ID for each data set is set up as a link to the "Data Set Details" (Figure 3, Chip ID column; also accessible from the "Database" link on the main menu). The Chip ID is also useful for identifying data sets when downloading entire data sets through the "Download Data" link in the main menu.
This indicates the version of the genomic sequence information that was used to assemble the microarray chip in the particular experiment (for example, mm7 or mm8 for the mouse). Builds change slightly as sequence information becomes updated, so the exact base pair position of any given DNA sequence will change as the sequence information becomes annotated. The build information indicated in each data set shows the build used for chromosomal coordinates of probes on the particular array type used.
For Nimblegen data sets only.
This is the tissue or tissue type represented by the cell line used and grown under the indicated conditions. Currently, there are four such differentiation states:
ESC: Undifferentiated ES cells
NPC/EBM9: The 9th day of differentiation following an established neural differentiation protocol that differentiates ESCs via embryoid bodies to Sox1 positive NPCs in conditioned medium.
NPC/ASd6: The 6th day of differentiation following an established neural differentiation protocol that differentiates ESCs to Sox1 positive NPCs in monolayer cultures using defined medium.
iPS: "Induced Pluripotent Stem Cells" re-programmed from tail-tip fibroblasts derived from a 129xBL-6 hybrid strain of mice to the pluripotent state as described.
Array Design Name
Microarray supplier and catalog number.
Indicates the property being measured in the indicated experiment. At present, replication timing and transcription data are shown. In the future, data for other genome-wide properties of chromosomes may be displayed. Contact us to add new types of data such as ChIP-Chip or ChIP-Seq.
Data sets to be displayed publicly must include a reference.
We provide detailed microarray design information here but any additional comments can be added.
Present or Absent Column
For uploading transcription data sets that contain present-absent calls, specify here.
Data Security Level
Users can select Public, Private, or Über Private. Users can make their published or "in press" data sets publicly available by selecting "Public" and providing a reference under the entry term, Reference. Private data sets are viewable by all registered users with a ReplicationDomain account, while Über Private data sets are viewable only by the user who uploaded the data set.
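The three security levels reduce to a small set of visibility rules, sketched below in Python. The names `Security`, `can_view`, and `check_upload` are hypothetical, as is the convention of representing a non-registered visitor as `None`. The rules themselves follow the definitions given here and under the Reference entry.

```python
from enum import Enum


class Security(Enum):
    PUBLIC = "Public"
    PRIVATE = "Private"
    UBER_PRIVATE = "Über Private"


def can_view(level, owner, viewer):
    """Decide whether `viewer` may see a data set uploaded by `owner`.

    `viewer` is a username for a registered user, or None for a
    non-registered visitor (an assumption of this sketch).
    """
    if level is Security.PUBLIC:
        return True                 # anyone, registered or not
    if viewer is None:
        return False                # non-public data needs an account
    if level is Security.PRIVATE:
        return True                 # all registered users
    return viewer == owner          # Über Private: uploader only


def check_upload(level, reference):
    """Public data sets must include a reference (published or "in press")."""
    if level is Security.PUBLIC and not (reference or "").strip():
        raise ValueError("Public data sets must include a reference")
```

For instance, `can_view(Security.UBER_PRIVATE, "alice", "bob")` is false, while the same call with `Security.PRIVATE` is true for any registered user.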
Data Starts on Line
Usually starts on line 2, with line 1 being the column names.
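To make the "Data Starts on Line" entry concrete, the sketch below reads a tab-delimited text file in which the data begin on a given 1-based line and the line just before it, if any, holds the column names. The function name `read_data_set` and the example column names and values are invented for illustration; the actual columns depend on the data set being uploaded or downloaded.

```python
import csv
import io


def read_data_set(handle, data_starts_on_line=2):
    """Split a tab-delimited file into (column_names, data_rows).

    `data_starts_on_line` is 1-based. The line immediately before it is
    treated as the header; if data start on line 1 there is no header.
    """
    rows = list(csv.reader(handle, delimiter="\t"))
    header = rows[data_starts_on_line - 2] if data_starts_on_line >= 2 else None
    return header, rows[data_starts_on_line - 1:]


# Hypothetical example file: column names on line 1, data from line 2.
example = io.StringIO(
    "Chromosome\tPosition\tValue\n"
    "chr1\t3000000\t0.52\n"
    "chr1\t3005800\t-0.17\n"
)
header, data = read_data_set(example, data_starts_on_line=2)
```

Values are returned as strings, so numeric columns would still need converting (for example with `float`) before plotting or analysis.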
ReplicationDomain provides a user-friendly platform to view replication timing data from any organism and compare that data to other properties in a manner that is optimized for rapid viewing of multi-megabase segments of chromosomes. In addition to providing a consolidated and devoted database, ReplicationDomain provides the opportunity for researchers to share and analyze preliminary data sets with colleagues prior to providing public access. Although not as comprehensive as other databases, ReplicationDomain allows rapid linkage to the UCSC Genome Browser for cross-referencing to other databases. At present, the site contains only data sets collected in the Gilbert laboratory. In the future, we expect to serve as curators of a substantial database, and we expect ReplicationDomain to contain data from other laboratories and for other species, as well as other specific properties that are relevant to higher order domain structure of chromosomes. We invite others to use the web site and to create an account and upload their own data sets so that ReplicationDomain can be used to advance our understanding of the functional significance of a dynamic, developmentally regulated replication-timing program.
Availability and requirements
ReplicationDomain is available at http://www.replicationdomain.org for use by academic or non-academic users without restriction or charge.
We thank S. Thompson (FSU) for technical lectures and suggestions in initial sessions, and D. Adler (ZymoGenetics, University of Washington) for mouse chromosome ideograms. This work was supported by NIH award GM83337 to DMG and postdoctoral fellowships from the Rett Syndrome Research Foundation to IH and the Leukemia & Lymphoma Society to TY.
- 1. Gilbert DM, Gasser SM: Nuclear Structure and DNA Replication. In DNA replication and human disease. Edited by: DePamphilis ML. Cold Spring Harbor, New York: Cold Spring Harbor Press; 2006.
- 11. Amiel A, Elis A, Maimon O, Ellis M, Herishano Y, Gaber E, Fejgin MD, Lishner M: Replication status in leukocytes of treated and untreated patients with polycythemia vera and essential thrombocytosis. Cancer Genet Cytogenet 2002, 133(1):34–38. 10.1016/S0165-4608(01)00560-X
- 16. Hansen RS, Stoger R, Wijmenga C, Stanek AM, Canfield TK, Luo P, Matarazzo MR, D'Esposito M, Feil R, Gimelli G, et al.: Escape from gene silencing in ICF syndrome: evidence for advanced replication time as a major determinant. Hum Mol Genet 2000, 9(18):2575–2587. 10.1093/hmg/9.18.2575
- 19. Norio P, Kosiyatrakul S, Yang Q, Guan Z, Brown NM, Thomas S, Riblet R, Schildkraut CL: Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol Cell 2005, 20(4):575–587. 10.1016/j.molcel.2005.10.029
- 27. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al.: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007, 447(7146):799–816. 10.1038/nature05874
- 28. White EJ, Emanuelsson O, Scalzo D, Royce T, Kosak S, Oakeley EJ, Weissman S, Gerstein M, Groudine M, Snyder M, et al.: DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states. Proc Natl Acad Sci USA 2004, 101(51):17771–17776. 10.1073/pnas.0408170101
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.