Ribosomal DNA instability and genome adaptability
Ribosomes are large, multi-subunit ribonucleoprotein complexes, essential for protein synthesis. To meet the high cellular demand for ribosomes, all eukaryotes have numerous copies of ribosomal DNA (rDNA) genes that encode ribosomal RNA (rRNA), usually far in excess of the requirement for ribosome biogenesis. In all eukaryotes studied, rDNA genes are arranged in one or more clusters of tandem repeats localized to nucleoli. The tandem arrangement of repeats, combined with the high rates of transcription at the rDNA loci, and the difficulty of replicating repetitive sequences make the rDNA inherently unstable and particularly susceptible to large variations in repeat copy number. Despite mounting evidence suggesting extra-ribosomal functions of the rDNA, its repetitive nature has excluded it from traditional sequencing-based studies. However, more recently, several studies have revealed the unique potential of the rDNA to act as a “canary in the coalmine,” being particularly sensitive to genomic stresses and acting as a source of adaptive response. Here, we review evidence uncovering mechanisms of regulation of instability and copy number variation at the rDNA and their role in adaptation to the environment, which could serve to understand the basic principles governing the behavior of other tandem repeats and their role in shaping the genome.
Keywords: rDNA; Transcription; Replication; Replication-transcription conflicts; Instability; Copy number variation; Adaptive mutations
Ribosomes are large, multi-subunit macromolecular machines essential for protein synthesis. Ribosome composition is largely conserved across all eukaryotes, comprising large (60S) and small (40S) ribosomal subunits, each made up of ribosomal RNAs (rRNAs) and several ribosomal proteins. Ribosomes are the most abundant cellular macromolecules, with over 60% of total cellular transcription devoted to their biosynthesis (Warner 1999). To meet the high cellular demand for ribosomes, the genes encoding rRNAs, the ribosomal DNA (rDNA), are present in numerous copies in all eukaryotes.
Remarkably, the organization of the rDNA is also largely conserved in the vast majority of eukaryotes studied. rDNA genes are arranged in clusters of tandem repeats that form the nucleolus, the site of rRNA transcription, processing, and ribosome assembly. The rDNA is the most highly transcribed genomic locus. The high rates of transcription and the tandem nature of the repeats render the rDNA highly susceptible to recombination-mediated repeat copy number variation. Additionally, rDNA loci are missing from genome assemblies owing to difficulties in the sequencing and assembly of large stretches of tandem repeats. Moreover, the rDNA loci are difficult to manipulate genetically, given their repetitive nature and essential role in cellular viability. As a result, the rDNA remained “the dark matter” (McStay 2016) of the genome for many decades. However, with the advent of cheaper whole genome sequencing technologies and sophisticated analysis techniques, it has become clear that repeat copy number variation at the rDNA and other repetitive regions is a significant source of genetic and phenotypic diversity in populations. Several studies over the last two decades have established that the role of the rDNA goes well beyond rRNA production for ribosome biogenesis. In this review, we focus on these extra-coding functions of the rDNA, particularly the unique adaptive potential conferred by a highly unstable, tandemly repeated array of genes in light of more recent evidence implicating rDNA instability and copy number variation in cellular response to the environment.
Structural organization of the rDNA is conserved
Eukaryotic ribosomes contain four RNA components that play critical structural and catalytic roles—the 25-28S, 5.8S, and 5S rRNAs (in the 60S ribosomal subunit) and the 18S rRNA (in the 40S ribosomal subunit). In all eukaryotes studied, the 25-28S, 18S, and 5.8S rRNAs are transcribed by RNA polymerase I (RNAPI) as a single precursor rRNA transcript from several copies of a single gene (ranging from 35S in the budding yeast, Saccharomyces cerevisiae to 47S in mammals). Copy number of the 35-47S rDNA repeat units per haploid genome ranges from ~ 100–200 copies in S. cerevisiae (Salim et al. 2017), Schizosaccharomyces pombe (Wood et al. 2002), mice, and humans (Xu et al. 2017; Gibbons et al. 2015) to several thousand copies in some plants (Rosato et al. 2016). While the 35-47S rDNA repeats are always arranged in tandem, in a head to tail fashion, the number of clusters and their chromosomal locations vary widely across species. For example, in S. cerevisiae, the 35S rDNA repeats are arranged in a single cluster on the long arm of chromosome XII (Petes 1979), whereas in the human genome, the 47S (also referred to as 45S) rDNA repeats are found on five acrocentric chromosomes, 13, 14, 15, 21, and 22 (Henderson et al. 1972). The 5S rRNA is transcribed by RNA polymerase III (RNAPIII) from a separate gene. Typically, the 5S rDNA is organized as a single cluster of tandem repeats physically separated from the 35-47S rDNA repeats in all eukaryotes. Notable exceptions to this organization are S. cerevisiae, where the 5S rDNA is part of the repeat unit containing the 35S rDNA, and S. pombe, where the ~ 30 5S rDNA genes are dispersed throughout the genome (Wood et al. 2002; Mao et al. 1982).
Several of these features are conserved in the human 47S rDNA repeat unit, which consists of a large, ~ 30 kb intergenic spacer (IGS) separating the ~ 13.4 kb 47S rRNA coding sequences of adjacent repeats. A spacer promoter lies ~ 2 kb upstream of the rRNA transcription start site; its transcription has been shown to play a role in silencing of rDNA repeats by enabling establishment and maintenance of heterochromatin (Santoro et al. 2010; Mayer et al. 2006). The IGS also contains origin(s) of replication (ORI) (Little et al. 1993) and a cluster of RNAPI transcription termination elements collectively called a Sal-box. When bound by the RNAPI termination factor TTF-1, the Sal-box motifs function as replication fork barriers (RFBs), with different Sal-box motifs causing either polar or bidirectional arrest of replication forks (Little et al. 1993; Akamatsu and Kobayashi 2015).
Besides the organization of the rRNA coding sequences, several elements within the intergenic spacers have also been shown to be remarkably conserved in a variety of eukaryotes studied. For example, in addition to budding yeast and humans, DNA replication origins within the IGS and RFB activity at the 3′ end of rDNA genes have been identified in Tetrahymena thermophila (MacAlpine et al. 1997), S. pombe (Sanchez et al. 1998), frogs (Wiesendanger et al. 1994), plants (Hernández et al. 1993), and mice (Gerber et al. 1997; Diermeier et al. 2013) (also reviewed in (Dalgaard et al. 2011)). While RFB activity is independent of RNAPI transcription in S. cerevisiae (Brewer et al. 1992), the regulation of RFB activity within the rDNA genes is coupled to RNAPI transcription in higher eukaryotes. For example, in mice and S. pombe RFB activity also depends on the RNAPI transcription termination factor TTF-1 (Reb1 in S. pombe) and causes polar arrest of replication forks in the direction opposite to RNAPI transcription (Sanchez et al. 1998; Gerber et al. 1997; Sanchez-Gorostiaga et al. 2003). While the strength of the RFB and the mechanism and polarity of replication fork arrest vary, the role of the RFB in maintaining mostly unidirectional DNA replication co-oriented with RNAPI transcription in the highly transcribed rDNA appears to be conserved across eukaryotes. The observation that RFB activity in human rDNA repeats is restricted to actively transcribed rDNA repeats, with activity coinciding with the replication of these active repeats early in S-phase (Akamatsu and Kobayashi 2015) also supports this idea.
Additionally, the rDNA contains binding sites for several proteins like cohesin, condensin, DNA replication and transcription factors, and CTCF which play important roles in organization of the rDNA chromatin (see Potapova and Gerton for detailed review). Therefore, disproportionate binding of these proteins to rDNA arrays of varying size or chromatin states could result in altered concentrations of these proteins throughout the rest of the genome, affecting chromatin environments and transcription genome-wide. In fact, human population genome sequencing data analysis has revealed correlations between rDNA copy number and the expression of several genes, notably CBX1 (HP1β), CTCF, CDYL2, MYST1, RASL11A, CENPA, KTI12, INO80C, and KDM4B (Gibbons et al. 2014), all of which are known chromatin-modifying factors. Such a correlation between rDNA copy number variation and gene expression has also been demonstrated in Drosophila melanogaster (Paredes et al. 2011). The discovery that variation of rDNA copy number can modulate expression of SIR2 and cellular pools of Sir2 in budding yeast, which in turn affects Sir2-dependent silencing of specific loci in the rest of the genome (Michel et al. 2005), also supports the idea that the rDNA locus could titrate genome-wide levels of various factors. The demonstration of the importance of specific sequence elements within rDNA loci in the organization of nucleolus and the nuclear genome and the importance of these contacts in regulation of rRNA transcription, rDNA stability, and rDNA copy number (O’Sullivan et al. 2009; Mayan and Aragon 2010; Yu and Lemos 2018; Cahyani et al. 2015) in a variety of model systems, and the conservation of these functionally relevant elements of the rDNA across eukaryotes further supports the idea that the extra-ribosomal functions of the rDNA arrays in modulating genome dynamics may also be conserved across eukaryotes.
rDNA instability and copy number variation may confer adaptability
The rDNA array is the most highly transcribed genomic locus. The high rates of transcription at the rDNA, combined with its repetitive nature, make the locus highly prone to replication-transcription conflicts. While DNA replication encounters many obstacles on the template DNA, transcription remains one of the most mutagenic obstacles to the replisome, particularly at highly transcribed genes (reviewed in (Kim and Jinks-Robertson 2012)). In S. cerevisiae, the RFB, when bound by Fob1, serves to prevent head-on collisions between the replisome and RNAPI by stalling replication forks that progress in a direction opposite to RNAPI transcription (Kobayashi 2003). These stalled forks eventually collapse and are processed into double-stranded breaks, which are likely repaired through one of many homologous recombination-dependent repair pathways (Kobayashi et al. 2004). Recombination has been shown to require RNAPI transcription (Kobayashi et al. 1998) and is further enhanced by non-coding transcription from E-pro, which is thought to clear cohesin from the rDNA repeats, promoting unequal sister chromatid exchange (Kobayashi and Ganley 2005). Given the presence of many identical repeats that can serve as a template for homologous recombination, the rDNA is highly susceptible to unequal recombination-mediated repeat copy number changes at every cell division (reviewed in (Kobayashi 2014)). This makes the rDNA one of the most unstable and hypervariable genomic regions. Despite this relatively high instability, repeat copy number at the rDNA locus is stably maintained in every species studied. The regulation of this inherent instability and the somewhat paradoxical, stable maintenance of normal repeat copy number have remained areas of intense scientific investigation. The last two decades have witnessed the discovery of many genes involved in the regulation of instability and maintenance of rDNA copy number, mainly through high-throughput genetic screens in S. cerevisiae (Salim et al. 2017; Ide et al. 2013; Saka et al. 2016; Smith et al. 1999). These genes generally fall into three broad categories: (i) regulation of DNA replication, (ii) regulation of RNAPI transcription, and (iii) regulation of recombination/repair (reviewed in (Kobayashi and Sasaki 2017)). While most of this work was done in budding yeast, recent years have seen the discovery of many orthologous factors that control rDNA stability and copy number in mammalian cells (reviewed in Tsekrekou et al. 2017). Given the conservation of key regulatory elements controlling replication and transcription of the rDNA repeats from yeast to humans, it is likely that at least some basic principles of the maintenance of this locus are conserved.
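How unequal sister chromatid exchange changes repeat number can be illustrated with a toy calculation. The sketch below assumes that two sister arrays misalign by a whole number of repeats before crossing over, so that one product gains exactly the repeats the other loses. The function names, the per-division exchange probability, and the maximum misalignment are hypothetical illustration values, not measured rates, and the model deliberately omits the regulatory and selective forces that keep copy number stable in real cells.

```python
import random


def unequal_sister_exchange(copies, offset):
    """Return the repeat counts of the two recombinant sister arrays.

    Both sisters start with `copies` repeats; a crossover between arrays
    misaligned by `offset` repeats yields one longer and one shorter array.
    """
    if not 0 <= offset < copies:
        raise ValueError("offset must satisfy 0 <= offset < copies")
    return copies + offset, copies - offset


def simulate_lineage(start_copies, generations, exchange_prob=0.1,
                     max_offset=5, seed=0):
    """Follow one cell lineage and record its rDNA copy number per division.

    exchange_prob and max_offset are illustrative assumptions only.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    copies = start_copies
    history = [copies]
    for _ in range(generations):
        limit = min(max_offset, copies - 1)  # never lose the last repeat
        if limit >= 1 and rng.random() < exchange_prob:
            offset = rng.randint(1, limit)
            # The tracked daughter inherits either product with equal chance.
            copies = rng.choice(unequal_sister_exchange(copies, offset))
        history.append(copies)
    return history


if __name__ == "__main__":
    trajectory = simulate_lineage(start_copies=150, generations=200)
    print(min(trajectory), max(trajectory), trajectory[-1])
```

Because gains and losses are symmetric in this sketch, copy number simply drifts; any directional change, such as the contractions discussed in this review, would require an additional selection or regulation term.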
For many years, the extensive copy number variation at the rDNA was thought to be the inevitable consequence of the inherent instability at the locus. This was confounded by the observation that “normal” rDNA copy number was stably maintained under normal growth conditions. Further, while rDNA copy number and stability could be altered under stress, or in mutant genetic backgrounds, there seemed to be no apparent correlation between rDNA stability and copy number. While it was known that multiple copies of rDNA repeats are required for the high rates of rRNA biosynthesis observed in actively growing cells, the reason for the maintenance of rDNA repeats in over two-fold excess of the requirement for rRNA biogenesis also remained unclear. The first hints suggesting the importance of the extra, untranscribed rDNA repeats came from early studies in budding yeast showing that rDNA copy number could be reduced significantly without affecting rRNA output or cell growth (Kobayashi et al. 1998; French et al. 2003). These studies also showed that while only a fraction of the rDNA repeats is transcribed in a strain with normal (~ 150 copies) copy number, in the low (~ 40 copies) copy strain, both the number of transcribed repeats and RNAPI load per transcribed repeat increased. In fact, it had been observed that loss of extra copies of rDNA could rescue yeast temperature-sensitive mutants of the origin recognition complex, the key replication initiation complex (Ide et al. 2007), suggesting that under certain conditions, the highly repetitive rDNA could be a burden on cellular machinery. Work from Ide et al., also in budding yeast, subsequently established the importance of the extra, untranscribed rDNA repeats for efficient DNA damage repair in the highly transcribed array (Ide et al. 2010). Further, the DNA damage sensitivity of the low copy strains was shown to be dependent on RNAPI transcription (Ide et al. 2010), suggesting that while the extra copies of rDNA are not essential to meet cellular rRNA demands, they may serve to reduce transcriptional load on the rDNA, and allow replication-coupled repair, maintaining the integrity of this essential locus, especially under conditions of stress.
Subsequently, several studies showed that changes in rDNA copy number were frequent adaptive responses to stress. Kwan et al. (2013) identified a naturally occurring polymorphism in the budding yeast rDNA origin of replication (rARS) that results in a weakly replicating rDNA array. This “weak rARS” was shown to cause a contraction of the rDNA array and promote DNA replication in the rest of the genome (Kwan et al. 2013). Loss of regulation of rARS firing resulting from deletion of RIF1 was also shown to be rescued by a contraction of the rDNA array (Shyian et al. 2016), suggesting that control of the replication program at the rDNA was critical not only to rDNA stability but also for replication of the rest of the genome. More recently, a screen of temperature-sensitive mutants of 787 essential yeast genes revealed that mutants with compromised DNA replication often had smaller rDNA arrays (Salim et al. 2017). More importantly, data from this study showed that under conditions of DNA replication stress, loss of rDNA repeats allowed timely completion of DNA replication, allowing adaptation to replication stress (Salim et al. 2017). Work from Foss et al. (2017) provided direct evidence to support the idea that, in yeast, repetitive rDNA competes for origin firing factors with unique genomic sequences (the rest of the genome). Foss et al. showed that tipping the balance in favor of the rDNA can lead to replication gaps, or underreplicated regions, throughout the rest of the genome (Foss et al. 2017). Taken together, data from these studies suggest that while extra, untranscribed rDNA repeats are essential for maintaining rDNA stability under normal conditions, they become detrimental to cells under conditions of DNA replication stress.
Such adaptive rDNA copy number changes are not restricted to challenges to DNA replication. Early studies in budding yeast showed that loss of the Rpa135 subunit of RNAPI led to disintegration of nucleolar structure and a loss of rDNA repeats (Kobayashi et al. 1998; Oakes et al. 1993). Subsequently, Albert et al. (2011) reported similar phenotypes in yeast strains lacking the Rpa49 subunit unique to RNAPI. Interestingly, loss of rDNA repeats partially restored nucleolar organization in these mutants (Albert et al. 2011). Given the decrease in RNAPI loading rate in the rpa49Δ mutants, it was proposed that interaction between RNAPI subunits in a highly transcribed rDNA array is critical to nucleolar assembly (Albert et al. 2011). This suggests that a loss of repeats would increase RNAPI density at the rDNA, allowing interaction between RNAPI subunits and promoting nucleolar assembly. The dependence of nucleolar organization on RNAPI transcription supports this idea. While loss of rDNA repeats seems more frequent, expansions of the rDNA array have also been reported. For example, Ide et al. showed that Rtt109, the key histone H3 lysine 56 acetyltransferase in yeast, is critical for maintaining normal rDNA copy number, and that loss of control of rDNA amplification in rtt109Δ mutants leads to hyper-amplification of the rDNA array (Ide et al. 2013). Mutations which affect RNAPI transcription factors, for example, rrn9Δ and rrn10Δ, have also been shown to result in expansion of the rDNA array, presumably to compensate for the decrease in rRNA production (Ide et al. 2013; Oakes et al. 1999). Therefore, the rDNA locus may be designed to be a plastic array, whose size can be determined by selective cues from the environment.
rDNA copy number changes that could confer adaptive potential are also being discovered in mammalian systems. Xu et al. (2017) and Wang and Lemos (2017) showed through bioinformatic analyses of whole genome sequencing data from various cancers that 45S rDNA repeats are often lost in cancer. Mutational analyses of these cancer genomes showed correlations of the rDNA copy number changes with a hyperactive mechanistic target of rapamycin (mTOR) pathway (Xu et al. 2017) and somatic inactivation of the tumor suppressor gene, TP53 (Wang and Lemos 2017). Xu et al. further showed that mouse hematopoietic stem cells (HSCs) lacking PTEN, a negative regulator of mTOR, also had contracted 45S rDNA arrays, and like yeast cells with reduced rDNA copy number, these cells were more sensitive to DNA damaging agents such as bleomycin, MMS, and X-rays. Interestingly, this DNA damage sensitivity was independent of mTOR activity, and mainly attributed to low rDNA copy number. Although PTEN is a phosphatase widely known for its role as a tumor suppressor, nuclear PTEN is essential for maintaining genome stability by dephosphorylating MCM2 and modulating replication fork progression under conditions of DNA replication stress (Feng et al. 2015). DNA replication stress resulting from reduced MCM expression was also shown to cause accumulation of phosphorylated histone H2AX (γH2AX) at the nucleolus and drive functional decline in aging mouse HSCs (Flach et al. 2014). These data suggest that the loss of 45S rDNA repeats observed in the Pten−/− mouse HSCs could be attributed to DNA replication stress. Work on the effect of DNA replication stress on the yeast rDNA predicts that persistent DNA replication stress would select for a loss of rDNA repeats in mammalian systems as well, and this is in fact what was observed in thymic tumors derived from MCM2-deficient mice (Salim et al. 2017). 
Consistent with this, mouse embryonic fibroblasts derived from these MCM2-deficient mice also show increased levels of DNA damage at the 45S rDNA repeats and sensitivity to UV (Kunnev et al. 2010). Additionally, despite increased DNA damage sensitivity, the Pten−/− HSCs exhibited increased proliferation, rRNA production, and protein synthesis, suggesting the selective advantage of loss of rDNA repeats.
More recently, Udugama et al. showed that ATRX (alpha thalassemia/mental retardation X-linked)-mutated ALT (alternative lengthening of telomeres) positive human cancer cells also had low 45S rDNA copy number (Udugama et al. 2018). Through work in mouse embryonic stem cells, this group also showed that loss of ATRX affected chromatin assembly at the rDNA and was characterized by an increase in γH2AX levels at the rDNA, a loss of rDNA repeats, decreased binding of RNAPI and the key RNAPI transcription factor UBF (upstream binding factor), and reduced rRNA transcription (Udugama et al. 2018). Further, cells that had lost ATRX were also sensitive to RNAPI inhibition. A recent study by Malinovskaya et al. showed a loss of rDNA repeats and a decrease in the fraction of methylated (transcriptionally inactive) rDNA repeats in cultured human fibroblasts with replicative senescence (Malinovskaya et al. 2018). In light of the observation that budding yeast cells with low rDNA copy number have a higher fraction of actively transcribed repeats relative to cells with higher rDNA copy number (French et al. 2003), it is tempting to speculate that in the face of stress that selects for a loss of rDNA repeats, transcriptionally inactive repeats may be eliminated. However, it is impossible to distinguish between preferential loss of inactive, hypermethylated rDNA repeats and changes in transcriptional status of rDNA repeats based on existing evidence. Nevertheless, these results have very interesting implications for diagnosis and treatment of human diseases. rDNA copy number may be indicative of the history of the cell and thus be used to diagnose past stress. Additionally, the differential sensitivity of cells with altered rDNA copy number to various drugs may aid in the selection of more effective chemotherapeutic strategies. Altogether, these data, along with the discovery of the role of rDNA stability in a variety of diseases (Hallgren et al. 2014; Diesch et al. 2014), suggest that rDNA copy number may prove to be an important indicator in human disease.
New tools to measure rDNA copy number
Rising interest in the relevance of copy number variations at the rDNA loci has led to the rapid evolution of tools to measure rDNA copy number. Early studies in budding yeast relied on the estimation of rDNA copy number from the size of chromosome XII as determined by pulsed-field gel electrophoresis (PFGE) of whole chromosomes and subsequent Southern blotting using an rDNA-specific probe (Kobayashi et al. 1998). Given the limitations of PFGE in resolution of DNA fragments larger than ~ 6 Mb, estimation of rDNA array sizes in mammalian systems required restriction digests that liberated intact rDNA arrays (Stults et al. 2008). Although PFGE is useful for observing allelic variation and the size of individual rDNA clusters from different chromosomes, the technique is time-consuming and tedious, requires considerable starting material, and offers limited accuracy and resolution for arrays larger than ~ 6 Mb. Additionally, interpretation of PFGE results is complicated by conditions that alter migration of DNA on the gel (for example, DNA replication/recombination intermediates) and/or affect instability at the rDNA.
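The arithmetic behind size-based estimates can be sketched as follows. The human repeat length uses the ~ 13.4 kb coding region and ~ 30 kb IGS quoted earlier in this review; the yeast repeat length (~ 9.1 kb) and the non-rDNA length of chromosome XII (~ 1,060 kb) are approximate values assumed here for illustration, and the function names are hypothetical.

```python
# Approximate constants; treat all of them as assumptions for illustration.
YEAST_REPEAT_KB = 9.1            # assumed length of one S. cerevisiae rDNA repeat
CHR12_UNIQUE_KB = 1060.0         # assumed non-rDNA portion of chromosome XII
HUMAN_REPEAT_KB = 13.4 + 30.0    # coding region + IGS, as quoted in the text
PFGE_LIMIT_KB = 6000.0           # ~ 6 Mb resolution limit quoted in the text


def copies_from_chromosome_size(chr_size_kb,
                                unique_kb=CHR12_UNIQUE_KB,
                                repeat_kb=YEAST_REPEAT_KB):
    """Estimate rDNA copy number from a PFGE-measured chromosome size."""
    if chr_size_kb < unique_kb:
        raise ValueError("chromosome size is smaller than its unique portion")
    return (chr_size_kb - unique_kb) / repeat_kb


def array_size_kb(copies, repeat_kb):
    """Length of a tandem array containing `copies` repeats."""
    return copies * repeat_kb


def max_resolvable_copies(repeat_kb, limit_kb=PFGE_LIMIT_KB):
    """Largest whole number of repeats whose array still fits under the limit."""
    return int(limit_kb // repeat_kb)


if __name__ == "__main__":
    # A yeast chromosome XII band measured at about 2.4 Mb.
    print(round(copies_from_chromosome_size(2425.0)))
    # Number of human repeats in a single cluster before the array nears 6 Mb.
    print(max_resolvable_copies(HUMAN_REPEAT_KB))
```

Under these assumptions, a single human cluster approaches the resolution limit at well under 200 repeats, which is one way to see why arrays at the upper end of the size range are hard to size accurately by PFGE.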
Array-based comparative genomic hybridization methods were instrumental in the identification of copy number variations genome-wide at a resolution significantly greater than that offered by cytogenetic approaches (reviewed in (Carter 2007)). However, these methods do not provide the resolution or sensitivity required for detection of relatively small copy number changes in repetitive regions like the rDNA. More recently, real-time PCR (qPCR) has been used to measure rDNA copy number (Jack et al. 2015), where fluorescent probes are used to monitor the progression of PCR amplification of the target of interest at each cycle. However, DNA copy number measurement by qPCR requires a standard curve for each experiment and is only useful to detect relatively large changes in copy number. This makes qPCR difficult to adapt to high-throughput experiments and limits detection of smaller rDNA copy number changes that are more common and may be functionally relevant. Over the last decade, the development of droplet digital PCR (ddPCR) has transformed quantitative PCR. The idea of “digital PCR” was developed in the 1980s and involved diluting DNA samples to a “limiting dilution” and partitioning the diluted DNA sample into different wells in a plate such that each well contained 1–2 molecules of the target of interest (Saiki et al. 1988). Early pioneers of digital PCR combined limiting dilutions with PCR amplification of the DNA to end-point, quantification using fluorescent probes, and Poisson statistics to achieve absolute nucleic acid quantification (Saiki et al. 1988; Vogelstein and Kinzler 1999). However, despite its usefulness, particularly in the detection of rare targets, digital PCR in its early form was labor intensive and expensive, and was therefore quickly replaced by qPCR.
ddPCR has been increasingly used to measure rDNA copy number in a variety of model systems (Salim et al. 2017; Xu et al. 2017). In a typical assay, primers and fluorescent probes specific to a small region of the rDNA repeat and to a stable, single copy control gene are designed. Amplicons for the rDNA target and single copy reference gene are chosen so that they are flanked by recognition sites for one restriction enzyme; restriction enzyme digestion of the genomic DNA allows separation of tandem copies of rDNA, reduces sample viscosity, and improves template accessibility. This ensures that target DNA is randomly partitioned into the ~ 20,000 droplets. The use of probes with different fluorophores for the rDNA and single copy reference targets allows absolute quantification of both targets in a single, duplexed ddPCR reaction. rDNA copy number per haploid genome can then be calculated from their ratio (Fig. 2b). Since each reaction is partitioned into ~ 20,000 droplets, technical error in an individual reaction can be calculated based on the droplet data from that well. Technical error in ddPCR comes mainly from errors due to sub-sampling and partitioning into droplets. In a good assay, this total technical error should be close to the standard error of the mean, and is typically within 5–10%, which eliminates the requirement for multiple technical replicates per sample. While biological replicates are critical due to natural biological variability in rDNA copy number, ddPCR allows rapid, accurate, and sensitive detection of rDNA copy number changes.
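The ratio calculation described above can be sketched in a few lines. The example assumes the standard Poisson correction for digital PCR, in which the mean number of target copies per droplet is estimated from the fraction of positive droplets as λ = −ln(1 − p). The function names and droplet counts are hypothetical and are not taken from any instrument's software.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of target copies per droplet.

    A droplet is negative only if it received zero copies, which happens with
    probability exp(-lam); hence lam = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def rdna_copy_number(rdna_positive, ref_positive, total, ref_copies=1):
    """rDNA copies per haploid genome from one duplexed reaction.

    ref_copies is the number of reference-gene copies per haploid genome
    (1 for a single copy control gene).
    """
    lam_rdna = copies_per_droplet(rdna_positive, total)
    lam_ref = copies_per_droplet(ref_positive, total)
    if lam_ref == 0:
        raise ValueError("no reference-positive droplets; ratio is undefined")
    return ref_copies * lam_rdna / lam_ref


if __name__ == "__main__":
    # Hypothetical well: 20,000 droplets, 15,500 rDNA-positive, 200 reference-positive.
    estimate = rdna_copy_number(15500, 200, 20000)
    print(f"estimated rDNA copies per haploid genome: {estimate:.0f}")
```

The correction matters most for the rDNA channel: because the rDNA target is far more abundant than the single copy reference, many rDNA-positive droplets contain more than one template, and simply counting positive droplets would underestimate copy number.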
Data from some studies suggest that rDNA copy number measurements in human DNA samples with PCR-based methods may be confounded by a variety of factors, including reduced amplification efficiency of the rDNA, particularly in damaged DNA, sequence polymorphisms at the rDNA, and heterogeneity in methylation status of various rDNA repeats (Chestkov et al. 2018; Zafiropoulos et al. 2005). Nevertheless, the increasing relevance of copy number variation in diagnostics and therapeutics, and the need to obtain accurate copy number measurements in a quick, sensitive, and high-throughput manner from very small amounts of sample, make ddPCR an attractive choice. In fact, the potential application for ddPCR in clinical settings was demonstrated through its use in accurate measurement of germline copy number variation in breast cancer, detection of rare mutant alleles, and the absolute quantification of circulating DNA from cell-free plasma (Hindson et al. 2011). Therefore, ddPCR could facilitate the accurate characterization of copy number variations at repetitive regions of the genome and serve as an invaluable tool in this new era of molecular diagnostics.
Inducible copy number variation
The traditional view of evolution states that adaptive mutations occur at random under stress and are selected for during growth under that stress. However, several studies in bacteria have shown that the genome may be designed to direct mutations to loci that require rapid evolution. Analysis of several bacterial species has revealed that while a majority of essential genes are encoded on the leading strand, such that replication of these genes is co-oriented with their transcription, a small but significant fraction of genes is encoded on the lagging strand, where, presumably, the more mutagenic head-on collisions between the replisome and transcription machinery are more frequent (“head-on genes”; Merrikh and Merrikh 2018; reviewed in Lang and Merrikh 2018). Further, work in Bacillus subtilis showed that these head-on genes exhibit a higher mutation rate independent of gene sequence and chromosomal location, in a transcription-dependent manner (Paul et al. 2013; Sankar et al. 2016). Moreover, it was observed that head-on genes are typically genes involved in stress response, antibiotic resistance, and virulence, which are rarely expressed during growth in rich media (Merrikh and Merrikh 2018). These data suggest that bacterial genomes evolved to be plastic such that they are able to tune the rates of mutation at relevant loci in the face of stress.
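The distinction between co-oriented and head-on genes can be made concrete with a short sketch. It assumes a circular chromosome replicated bidirectionally from a single origin to a single terminus, so that forks on one replichore travel toward increasing coordinates and forks on the other toward decreasing coordinates. The coordinates, function names, and example values are hypothetical and are not taken from the studies cited above.

```python
def fork_direction(pos, ori, ter, genome_len):
    """Return +1 if the fork that replicates `pos` moves toward increasing
    coordinates (the replichore running clockwise from ori to ter), else -1."""
    cw_to_pos = (pos - ori) % genome_len   # clockwise distance ori -> pos
    cw_to_ter = (ter - ori) % genome_len   # clockwise distance ori -> ter
    return 1 if cw_to_pos < cw_to_ter else -1


def classify_gene(pos, strand, ori, ter, genome_len):
    """Classify a gene as 'co-directional' or 'head-on'.

    strand is '+' if the gene is transcribed toward increasing coordinates
    and '-' if it is transcribed toward decreasing coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    transcription = 1 if strand == "+" else -1
    fork = fork_direction(pos, ori, ter, genome_len)
    return "co-directional" if transcription == fork else "head-on"


if __name__ == "__main__":
    # Toy genome of 1,000 units with the origin at 0 and the terminus at 500.
    for pos, strand in [(100, "+"), (100, "-"), (900, "+"), (900, "-")]:
        print(pos, strand, classify_gene(pos, strand, ori=0, ter=500,
                                         genome_len=1000))
```

In this scheme a gene's orientation class depends only on its strand and on which replichore it occupies, which is why moving or inverting a gene relative to the origin is enough to switch it between the two classes.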
Similar to the CUP1 array, transcription and replication fork stalling are critical to recombination and copy number variation at the rDNA array, suggesting that replication-transcription conflicts and their resolution are a major source of instability at the rDNA. Further, modulating these conflicts and regulating mechanisms of their resolution could be one way of regulating instability at the locus. Additionally, several factors whose loss or gain results in more stable rDNA arrays have also been identified (reviewed in (Kobayashi and Sasaki 2017)), suggesting that cells may have evolved to optimize instability levels, rather than minimize them.
Such mechanisms are not restricted to budding yeast. In fact, mounting evidence suggests that despite the presence of elaborate mechanisms to avoid them, secondary structures produced by transcription and the transcription machinery itself could frequently hinder replisome progression and are a major source of genomic instability in mammalian genomes as well (reviewed in (Hamperl and Cimprich 2016)). For example, the rDNA repeats in both prokaryotes and eukaryotes have been shown to be hotspots for the formation of unusual nucleic acid structures called R-loops, which consist of a nascent RNA-template DNA hybrid and the displaced non-template, single-stranded DNA (Nadel et al. 2015; Masse et al. 1997; El Hage et al. 2010; Ginno et al. 2012). A recent study reported that the coding regions of the rDNA repeats in human cells are strongly enriched for R-loops relative to the rest of the genome (Nadel et al. 2015). R-loops are natural by-products of transcription, particularly at GC-rich DNA sequences, which is likely why the GC-rich, highly transcribed rDNA repeats may be particularly prone to their formation. Collisions between R-loops and the DNA replication machinery are thought to be a major source of the replication-transcription conflicts that result in genomic instability (Garcia-Muse and Aguilera 2016; Gaillard and Aguilera 2016; Aguilera and Garcia-Muse 2012; Skourti-Stathaki and Proudfoot 2014), suggesting that R-loop formation and resolution contribute significantly to the inherent instability at the rDNA.
Importantly, transcription itself is mutagenic, irrespective of DNA replication, because torsional stress created by transcription is relieved by topoisomerases, which create nicks and breaks in the template DNA. Therefore, transcription-associated mutagenesis (reviewed in (Kim and Jinks-Robertson 2012)) may be particularly high at the highly transcribed rDNA. Significant portions of mammalian genomes are made up of repetitive elements; while many repeats are yet to be identified, it has been established that copy number variation at the rDNA is a significant source of genetic diversity in populations (Gibbons et al. 2015; Gibbons et al. 2014; Stults et al. 2008). Moreover, given the conservation of the structural organization of the rDNA repeats, particularly the tandem arrangement of repeats, the presence of an RFB, high rates of transcription, and pathways involved in maintenance of the locus, the mechanisms that regulate instability of the rDNA in budding yeast may be conserved in mammals.
The fundamental principles that have been shown to influence the copy number of tandem repeats in model systems can be used to consider cancer genomes because (a) the rDNA array has been shown to be highly susceptible to recombination-mediated rearrangements in solid tumors (Stults et al. 2009), (b) rRNA transcription appears to be frequently dysregulated in cancers (Xu et al. 2017; Udugama et al. 2018; Lu et al. 2009), and (c) RNAPI transcription has emerged as an effective therapeutic target in a variety of cancers (Hannan et al. 2013). If variation in copy number can be induced by transcription, then instability at rDNA repeats may be induced as rapidly proliferating cancer cells increase transcription to meet the high demand for ribosomes. While increased variation in rDNA copy number could result in loss and gain of repeats, changes in steady-state rDNA copy number and their effects on tumor progression are poorly understood. Data from studies in budding yeast suggest that while an increase in rDNA copy number may facilitate increased rRNA production while reducing RNAPI transcriptional load on the rDNA loci, amplified rDNA arrays may also hinder faithful genome replication and repair. In fact, several independent studies of human cancer genomes and murine models of cancer described above have all revealed that rDNA repeats tend to be lost in cancer. Given the striking consistency of repeat loss, we propose that loss of repeats may be adaptive. We speculate that altered rRNA transcription may initially induce variation in copy number, but a loss of repeats may allow for easier genome replication and faster proliferation, and may thus be selected for. Cancer cells often contain an array of mutations, sometimes simplistically divided into driver and passenger mutations. Driver mutations help drive tumor development and progression, while passenger mutations are considered neutral. 
However, we suggest that there may be a third class of mutations that could occur in cancer, adaptive mutations, and loss of rDNA repeats could be an example of this type of mutation. Adaptive mutations could facilitate proliferation without necessarily acting as independent drivers. Given the prevalence of large stretches of repetitive DNA in the human genome, and the paucity of methods to study these sequences, functionally relevant copy number changes at other repetitive regions of the genome may have gone undetected in cancer.
Beyond rDNA—tandem repeats
Mammalian genomes contain many more large stretches of repetitive regions than yeast, and the role of variation at these loci in phenotypic diversity and adaptation is only just beginning to be appreciated. A recent genome-wide association study of 1011 natural isolates of S. cerevisiae showed that copy number variations not only contributed the most to genetic variation but also had the most significant effect on phenotype (Peter et al. 2018). In the human genome, there are approximately 235 copy number variants that have been associated with either well-established or emerging chromosomal syndromes (Wyandt et al. 2017). While the potential functional impact of copy number variations is beginning to be appreciated, the extent of copy number variation in the human genome remains far from fully determined. Early comparative genomic hybridization-based attempts identified copy number variations at over 1000 regions, covering nearly 400 Mb of the genome (Redon et al. 2006). More recently, a computational attempt to discover tandem repeats in the human genome identified 25,000 arrays between 600 bp and 10 kb in length, with 503 arrays larger than 10 kb (Warburton et al. 2008). Furthermore, extreme variation in copy number has been reported for tandemly repeated genes in the human genome (Brahmachary et al. 2014). These data suggest that copy number polymorphisms may make significant contributions to genome function.
While the rDNA plays a central role in cell physiology, it accounts for < 1% of the human genome. However, it may serve as a model for understanding general principles governing copy number variation at other tandem repeats and the functional impact of such variation. For example, relatively little is known about the factors that determine and maintain 5S rDNA copy number. In higher eukaryotes, 5S rDNA genes are organized as one or more tandem clusters of repeats physically separated from the 45S rDNA genes. While there is evidence of mitotic and meiotic recombination within the 5S rDNA arrays in human genomes (Stults et al. 2008), the 5S rDNA has been observed to exhibit relatively low copy number variation in experimental systems. For example, although a modest amplification of the 5S rDNA was found in cancer genomes that exhibited a loss of 45S rDNA repeats, this amplification was found to be mainly due to amplification of the 1q42 segment on which the 5S rDNA repeats reside (Wang and Lemos 2017). Moreover, analysis of 5S and 45S rDNA repeat copy number in human and mouse whole genome sequencing data suggests that copy number changes at the two loci may be under selection by mechanisms that ensure maintenance of correct rRNA stoichiometry (Gibbons et al. 2015). Work in budding yeast suggests that ribosome biogenesis is regulated at the transcription level, with RNAPI transcription being the key determinant of the biogenesis of other ribosomal components (Laferte et al. 2006). These observations are confounding in light of high copy number variation at the 5S rDNA loci in human populations (Stults et al. 2008). Therefore, the selective pressures acting on the 5S rDNA repeats and the impact of 5S rDNA copy number variation on cellular and organismal adaptation may be different from those of the 45S rDNA repeats and remain largely unknown.
Another example of highly repetitive DNA in the human genome is alpha satellite DNA, the major type of satellite DNA in humans. Alpha satellite DNA is enriched at human centromeres and is known to be essential for several aspects of centromere function, including kinetochore assembly and heterochromatin formation. Despite the essential and universally conserved function of centromeres in chromosome segregation, the presence of several megabases of repetitive satellite DNA has excluded centromeres from human genome sequence assemblies. More recently, it was discovered that centromeric satellite DNA can be transcribed to produce non-coding RNA that plays important roles in maintaining centromeric repeat stability and function (reviewed in (Hall et al. 2012; McNulty and Sullivan 2018)). The observation that both the copy number and the transcription of centromeric satellite repeat DNA could be altered in cancers (Bersani et al. 2015; Ting et al. 2011) suggests that variation at the centromeric repeats may have important functional consequences in normal physiology and disease. Interestingly, work on the repetitive, regional centromeres in S. pombe showed that the coordination of replication and non-coding RNA transcription is critical for the establishment of heterochromatin, which is essential for normal centromere function (Zaratiegui et al. 2011). Moreover, loss of the RNA interference (RNAi) machinery, which is essential for the maintenance of heterochromatin, also results in instability at centromeric repeats in S. pombe (reviewed in (Forsburg and Shen 2017)). Similarly, mouse embryonic cells lacking Dicer, a key component of RNAi, were also observed to have upregulated pericentromeric RNA transcription and defective centromeric silencing (Kanellopoulou et al. 2005).
Based on the similarities between the regulation of stability of the rDNA repeats and centromeric repeats, we speculate that modulation of transcription and/or replication may play critical roles in the maintenance of stability and function of mammalian centromeres. While the role of the centromeric satellite DNA in establishment of centromeric chromatin and kinetochore assembly is clear, the nature and extent of copy number variation at these satellites and their effects on chromosome segregation remain unknown.
Ribosomal DNA copy number and instability may be regulated by the fundamental processes of transcription and DNA replication. The large body of work in budding yeast has provided many insights into the structural features of the rDNA locus and the mechanisms controlling these repeats. Recent studies have built on this work, providing new insights into how changes in rDNA repeat copy number may aid adaptation to stress associated with proliferation in cancer. Tandem repeats are common in the human genome, and while their variability may have phenotypic consequences, they are understudied and poorly understood. Studies of rDNA repeats may serve as a touchstone for understanding the principles that regulate other repeat sequences. The identification and study of tandem repeats and of the mechanisms underlying their maintenance are critical to understanding the important functional and evolutionary roles they play in genome biology.
We thank Mark Miller (Stowers Institute) for his help with illustrations. This work was done to fulfill, in part, the requirements for DS’s PhD thesis research as a student registered with the Open University.
DS and JLG wrote, reviewed, and edited the manuscript.
This work was funded by the Stowers Institute for Medical Research.
- Bersani F, Lee E, Kharchenko PV, Xu AW, Liu M, Xega K, MacKenzie OC, Brannigan BW, Wittner BS, Jung H, Ramaswamy S, Park PJ, Maheswaran S, Ting DT, Haber DA (2015) Pericentromeric satellite repeat expansions through RNA-derived DNA intermediates in cancer. Proc Natl Acad Sci U S A 112:15148–15153
- Brahmachary M, Guilmatre A, Quilez J, Hasson D, Borel C, Warburton P, Sharp AJ (2014) Digital genotyping of macrosatellites and multicopy genes reveals novel biological functions associated with copy number variation of large tandem repeats. PLoS Genet 10:e1004418
- Cahyani I, Cridge AG, Engelke DR, Ganley AR, O’Sullivan JM (2015) A sequence-specific interaction between the Saccharomyces cerevisiae rRNA gene repeats and a locus encoding an RNA polymerase I subunit affects ribosomal DNA stability. Mol Cell Biol 35:544–554
- Dalgaard JZ, Godfrey EL, MacFarlane RJ (2011) Eukaryotic replication barriers: how, why and where forks stall. DNA Replication-Current Advances. pp 269–304
- Flach J, Bakker ST, Mohrin M, Conroy PC, Pietras EM, Reynaud D, Alvarez S, Diolaiti ME, Ugarte F, Forsberg EC, le Beau MM, Stohr BA, Méndez J, Morrison CG, Passegué E (2014) Replication stress is a potent driver of functional decline in ageing haematopoietic stem cells. Nature 512:198–202
- Forsburg SL, Shen KF (2017) Centromere stability: the replication connection. Genes (Basel) 8
- Hindson BJ, Ness KD, Masquelier DA, Belgrader P, Heredia NJ, Makarewicz AJ, Bright IJ, Lucero MY, Hiddessen AL, Legler TC, Kitano TK, Hodel MR, Petersen JF, Wyatt PW, Steenblock ER, Shah PH, Bousse LJ, Troup CB, Mellen JC, Wittmann DK, Erndt NG, Cauley TH, Koehler RT, So AP, Dube S, Rose KA, Montesclaros L, Wang S, Stumbo DP, Hodges SP, Romine S, Milanovich FP, White HE, Regan JF, Karlin-Neumann GA, Hindson CM, Saxonov S, Colston BW (2011) High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal Chem 83:8604–8610
- Kobayashi T, Sasaki M (2017) rDNA stability is supported by many “buffer genes”—introduction to the Yeast rDNA Stability Database. FEMS Yeast Res 17
- Kwan EX, Foss EJ, Tsuchiyama S, Alvino GM, Kruglyak L, Kaeberlein M, Raghuraman MK, Brewer BJ, Kennedy BK, Bedalov A (2013) A natural polymorphism in rDNA replication origins links origin activation with calorie restriction and lifespan. PLoS Genet 9:e1003329
- Merrikh CN, Merrikh H (2018) Gene inversion increases evolvability in bacteria. bioRxiv 293571. https://doi.org/10.1101/293571
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME (2006) Global variation in copy number in the human genome. Nature 444:444–454
- Ting DT, Lipson D, Paul S, Brannigan BW, Akhavanfard S, Coffman EJ, Contino G, Deshpande V, Iafrate AJ, Letovsky S, Rivera MN, Bardeesy N, Maheswaran S, Haber DA (2011) Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers. Science 331:593–596
- Tsekrekou M, Stratigi K, Chatzinikolaou G (2017) The nucleolus: in genome maintenance and repair. Int J Mol Sci 18
- Wyandt HE, Wilson GN, Tonk VS (2017) Human chromosome variation: heteromorphism, polymorphism and pathogenesis. Springer, Singapore. https://doi.org/10.1007/978-981-10-3035-2
- Zaratiegui M, Castel SE, Irvine DV, Kloc A, Ren J, Li F, de Castro E, Marín L, Chang AY, Goto D, Cande WZ, Antequera F, Arcangioli B, Martienssen RA (2011) RNAi promotes heterochromatic silencing through replication-coupled release of RNA Pol II. Nature 479:135–138