Reduced intrinsic DNA curvature leads to increased mutation rate
Mutation rates vary across the genome. Many trans factors that influence mutation rates have been identified, as have specific sequence motifs at the 1–7-bp scale, but cis elements remain poorly characterized. The lack of understanding regarding why different sequences have different mutation rates hampers our ability to identify positive selection in evolution and to identify driver mutations in tumorigenesis.
Here, we use a combination of synthetic genes and sequences of thousands of isolated yeast colonies to show that intrinsic DNA curvature is a major cis determinant of mutation rate. Mutation rate negatively correlates with DNA curvature within genes, and a 10% decrease in curvature results in a 70% increase in mutation rate. Consistently, both yeast and humans accumulate mutations in regions with small curvature. We further show that this effect is due to differences in the intrinsic mutation rate, likely due to differences in mutagen sensitivity and not due to differences in the local activity of DNA repair.
Our study establishes a framework for understanding the cis properties of DNA sequence in modulating the local mutation rate and identifies a novel causal source of non-uniform mutation rates across the genome.
Keywords: DNA shape; Intrinsic DNA curvature; Mutation rate; Mutational landscape; Mutagen sensitivity
Mutation is the ultimate source of genetic diversity. Therefore, the measurement of mutation rate and, particularly, the identification of the trans factors and cis elements that influence mutation rate are a focus of intense interest in evolutionary biology. A large number of trans factors influencing mutation rate have been identified, such as chromatin remodelers, histone-modifying enzymes, and other DNA-binding proteins [2, 3, 4]. In addition, replication timing [5, 6, 7, 8, 9] and transcription rate [10, 11, 12, 13, 14] also affect mutation rate.
Cis elements may play a more important role in determining the local mutation rate, yet remain poorly understood. Studies of cis elements that determine local mutation rate have been limited to the scale of a few neighboring nucleotides around a mutation site for the past few decades [15, 16, 17, 18].
There is comprehensive cis information in the shape of DNA. Although the double-helix structure of DNA is usually described as a twisted ladder, the steps of the ladder are not rigidly aligned. The local shape of DNA is affected by the interactions of neighboring bases [19, 20]. For example, the depth and width of the minor and major grooves vary depending on the local sequence. Such variation in DNA shape affects the ability of proteins to bind to DNA and the accessibility of each nucleotide [20, 21] and, therefore, is under purifying selection. Through its effect on DNA-protein and/or DNA-solvent interactions, the shape of the double helix may influence the local mutation rate. However, the role of DNA shape in influencing local mutation rate has not been systematically studied. Here, we provide several lines of evidence that intrinsic DNA curvature affects the local mutation rate in a quantitative and predictable manner. Our study therefore expands our knowledge of cis elements that regulate mutation rate by integrating information regarding the physical shape of the double helix and develops a new framework to understand the evolution of local mutation rate.
Results and discussion
Characterization of the mutational landscape of URA3
To measure bias in mutation rate, we need to determine the number of observed mutations and compare it with the number expected if the mutation rate were uniform. As the set of missense mutations that permit growth on 5-FOA is unknown, we focused our analysis on nonsense mutants. There are 104 potential nonsense mutation sites in URA3. For each of them, we counted the number of 5-FOA plates on which the nonsense mutation was observed (Fig. 1b). This number varied between 0 and 8 (Fig. 1b). To determine if this variation in frequency could be fully explained by the inherently stochastic nature of mutation, we randomly assigned each of the 154 observed nonsense mutations to a potential nonsense mutation site. We then calculated the standard deviation of the number of nonsense mutations per site, both for the observed data and for each permutation. The observed standard deviation was significantly greater than the random expectation (P < 0.001, Fig. 1c), suggesting the presence of cis elements that affect the local mutation rate.
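The permutation test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of the population standard deviation, and the default of 1000 permutations are assumptions made for the example.

```python
import random
import statistics


def permutation_sd_test(observed_counts, n_permutations=1000, seed=0):
    """Compare the SD of per-site mutation counts with a uniform-rate null.

    observed_counts: number of observed mutations at each potential site.
    Returns (observed SD, fraction of permutations with SD >= observed SD).
    """
    rng = random.Random(seed)
    n_sites = len(observed_counts)
    n_mutations = sum(observed_counts)
    observed_sd = statistics.pstdev(observed_counts)

    n_as_extreme = 0
    for _ in range(n_permutations):
        # Assign every mutation to a site chosen uniformly at random.
        counts = [0] * n_sites
        for _ in range(n_mutations):
            counts[rng.randrange(n_sites)] += 1
        if statistics.pstdev(counts) >= observed_sd:
            n_as_extreme += 1

    return observed_sd, n_as_extreme / n_permutations
```

A returned fraction of 0 should be reported as P < 1/n_permutations rather than as an exact zero.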
A nonsense mutation may not always lead to a loss of function, especially when it occurs near the stop codon. This would also lead to a non-Poisson distribution of observed mutations. To exclude this confounding factor, we repeated the permutation test using only the first two thirds of the coding sequence. Again, the observed standard deviation was significantly greater than the random expectation (Additional file 1: Figure S1a). Similar results were also obtained when we performed the permutation test separately for the 54 nonsense transitions and the 100 nonsense transversions (Additional file 1: Figure S1b-c). Taken together, the variation in the frequency of nonsense mutations within URA3 suggests the presence of cis elements that modulate local mutation rate.
Mutations in URA3 tend to occur in DNA regions with a smaller intrinsic DNA curvature
Models for predicting the mutation rate of a potential nonsense site in URA3
Mutation rate ~ “0” *
Mutation rate ~ “0” + “+ 1” + “– 1”
Mutation rate ~ “0” + “+ 1” + “+ 2” + “+ 3” + “– 1” + “– 2” + “– 3”
Mutation rate ~ curvature **
Mutation rate ~ “0” + curvature
To identify additional DNA sequence features predictive of local mutation rates, we used a sliding window to divide the URA3 gene into overlapping regions of L nucleotides (L = 10, 20 …, or 100 bp). We calculated the average mutation rate in each region as the total number of observed nonsense mutations in this region normalized by the number of potential nonsense mutation sites (Additional file 1: Figure S3a). For each region, we then calculated 17 DNA properties such as GC content, thermodynamic characteristics, groove properties, and DNA shape features using well-established computational methods [19, 24] (Additional file 1: Figure S3b). Finally, for each window size, we calculated the correlation between mutation rate and each of the DNA properties.
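The windowing and normalization steps can be illustrated with a short sketch. The 10-nucleotide step is taken from the Methods; the function names, the choice to skip windows that contain no potential nonsense site, and the use of a Pearson coefficient are assumptions, since the text does not state which correlation statistic was used.

```python
import math


def window_mutation_rates(seq_len, site_positions, mutation_counts, window, step=10):
    """Return (window start, rate) for each sliding window.

    rate = observed nonsense mutations / potential nonsense sites in the window.
    Windows without any potential site are skipped (an assumption of this sketch).
    """
    results = []
    for start in range(0, seq_len - window + 1, step):
        end = start + window
        in_window = [p for p in site_positions if start <= p < end]
        if not in_window:
            continue
        observed = sum(mutation_counts.get(p, 0) for p in in_window)
        results.append((start, observed / len(in_window)))
    return results


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The per-window rates can then be paired with any per-window DNA property and passed to `pearson_r`, once for each window size.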
The correlation between mutation rate and DNA curvature was not confounded by GC content [17, 26], which in our data was not correlated with mutation rate (Fig. 2a). We previously showed that nucleosome binding suppresses spontaneous C>T transitions. To quantitatively determine the relationship between mutation rate, nucleosome occupancy, and DNA curvature, we performed high-throughput sequencing on nucleosome-protected DNA fragments. The correlation between DNA curvature and mutation rate persisted after controlling for nucleosome occupancy (partial r for URA3 = −0.6, P = 1 × 10^−8), suggesting that the relationship between mutation rate and DNA curvature is not due to differences in nucleosome occupancy.
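For reference, a first-order partial correlation can be written in terms of the three pairwise correlation coefficients. The text does not state which estimator was used, so the formula below is the standard textbook form and should be read as an assumption; here x denotes mutation rate, y intrinsic DNA curvature, and z nucleosome occupancy.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}
                      {\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

A value of r_{xy·z} close to r_{xy} indicates that the covariate z accounts for little of the association between x and y.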
As a form of experimental cross-validation to determine if our results from URA3 are generalizable to other genes, we used an independently generated set of mutations in the yeast gene CAN1, for which nonsense mutations were selected using the arginine analogue canavanine. Intrinsic DNA curvature is also predictive of mutation rate in CAN1 (Fig. 2c and Additional file 1: Table S2). In addition, nonsense mutations were reported to be unevenly distributed across sites within three human genes associated with Mendelian disease, MECP2, NF1, and RB1, and within a tumor suppressor gene, TP53. Consistently, intrinsic DNA curvature around a potential nonsense site was also negatively associated with the mutation rate of the site in each of these four genes (Additional file 1: Table S3).
Mutations in yeast and in humans accumulate in DNA regions with a smaller intrinsic DNA curvature
To determine if DNA curvature affects mutation rate at the genomic scale, we used a mutation accumulation assay in which spontaneous mutations accumulate at ~ 100× the normal rate due to a mutation in a gene related to DNA mismatch repair, MSH2. We retrieved all 882 mutations that were supported by at least 20× coverage in the high-throughput sequencing data. We calculated the intrinsic DNA curvature of a region from 50 bp upstream to 50 bp downstream of each mutation. As a control, we randomly chose 882 sites with identical 3-nucleotide contexts (the mutation site, + 1, and − 1 sites) from the rest of the genome. We performed this random sampling procedure 1000 times. We found that the observed mutations were located in regions with a smaller intrinsic DNA curvature (P = 0.04, permutation test, Fig. 2d). This suggests that in the genome as a whole, regions with a smaller intrinsic DNA curvature have higher mutation rates.
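The context-matched control sampling can be sketched as below. This is an illustration under simplifying assumptions: the genome is a single string, control sites are drawn with replacement, and all function names are hypothetical. Computing the curvature of the 101-bp region around each site is left out.

```python
import random
from collections import defaultdict


def index_sites_by_context(genome, excluded=frozenset()):
    """Map each trinucleotide to the positions at which it is centred."""
    by_context = defaultdict(list)
    for i in range(1, len(genome) - 1):
        if i not in excluded:
            by_context[genome[i - 1:i + 2]].append(i)
    return by_context


def sample_matched_controls(genome, mutation_positions, rng):
    """Draw one control site per mutation with the same (-1, 0, +1) context.

    Raises IndexError if a context occurs nowhere outside the mutation sites.
    """
    by_context = index_sites_by_context(genome, frozenset(mutation_positions))
    return [rng.choice(by_context[genome[p - 1:p + 2]]) for p in mutation_positions]


def empirical_p(observed_mean, control_means):
    """Fraction of control sets with a smaller mean curvature than observed."""
    return sum(m < observed_mean for m in control_means) / len(control_means)
```

Repeating `sample_matched_controls` 1000 times, averaging the curvature of each control set, and passing those averages to `empirical_p` reproduces the shape of the test described above.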
The large number of somatic mutations in tumor cells permitted a more robust test of the effect of nucleotide context. We found that DNA curvature negatively correlates with mutation rate when controlling for the trinucleotide (Additional file 1: Figure S6) or heptanucleotide context (Additional file 1: Figure S7). In addition, we performed logistic regression to predict whether a site has a somatic mutation in at least one cancer sample. Variables used in the regression model include the nucleotide at a site, six flanking nucleotides (three upstream and three downstream) around the site, and intrinsic DNA curvature of the 101-bp region from 50 bp upstream to 50 bp downstream of the site. Consistently, we found that DNA curvature was a negative predictor of somatic mutations, and its effect is comparable to that of a nucleotide substitution at one of the six flanking sites (Additional file 1: Figure S8). The type of nucleotide at the site was a strong predictor of somatic mutations in human tumor cells (Additional file 1: Figure S8), likely because variation in DNA methylation among CpG sites plays an important role in determining mutation rate. In contrast, DNA methylation is virtually absent in budding yeast.
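One way to turn the predictors listed above into a numeric input row for a logistic regression is sketched below. The one-hot encoding and the function name are assumptions; the text does not specify how the nucleotides were coded, and the model fit itself is left to a statistics package.

```python
NUCLEOTIDES = "ACGT"


def encode_site(sequence, pos, curvature, flank=3):
    """Encode one site as a feature vector.

    The focal nucleotide and `flank` nucleotides on each side are one-hot
    encoded (4 indicators per position), and the curvature of the surrounding
    region is appended as a single numeric feature.
    """
    if pos - flank < 0 or pos + flank >= len(sequence):
        raise ValueError("site is too close to the sequence end")
    features = []
    for offset in range(-flank, flank + 1):
        base = sequence[pos + offset]
        features.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    features.append(float(curvature))
    return features
```

With `flank=3` each site yields 7 × 4 + 1 = 29 features. When an intercept is fitted, one indicator per position is usually dropped as the reference category.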
To determine if our results from somatic mutations in human tumors are applicable to germline mutations, we further retrieved 101,377 de novo point mutations identified from 1548 trios from Iceland. Again, we observed a smaller DNA curvature around these mutations (P < 0.001, permutation test, Additional file 1: Figure S9). Taken together, DNA curvature is a robust predictor of non-uniform mutation rates in both yeast and humans.
Genetic manipulation of DNA curvature affects mutation rate
We used an electrophoretic mobility shift assay to confirm that the intrinsic DNA curvature was altered in these variants [35, 36, 37]. Variants with a greater predicted intrinsic DNA curvature [19, 24] migrated more slowly than those with a smaller curvature (Additional file 1: Figure S11), presumably due to the different friction force that they encountered in the process of migration.
To determine if genetic manipulation of curvature alters mutation rate, we cultured cells with each of the five URA3 variants in SC media to allow mutations to accumulate, spread cells onto 5-FOA plates, and counted the number of colonies on each plate (Fig. 4b). We calculated the mutation rate of each variant from the fraction of plates without mutants and found that variants with a 10% smaller intrinsic DNA curvature had a 70% higher mutation rate (Fig. 4c). This suggests that experimentally decreasing DNA curvature increases mutation rate.
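A rate estimate based on the fraction of plates without mutants can be sketched with the classical P0 method of fluctuation analysis. The text does not name the estimator, so this choice is an assumption, as are the function name and the simplification that every cell in a culture is plated with full plating efficiency.

```python
import math


def p0_mutation_rate(n_plates_without_mutants, n_plates_total, cells_per_plate):
    """Estimate the mutation rate per cell from the fraction of mutant-free plates.

    Under a Poisson model the expected number of mutational events per
    culture is m = -ln(p0), and the rate is m divided by the number of cells.
    """
    if not 0 < n_plates_without_mutants <= n_plates_total:
        raise ValueError("need at least one mutant-free plate and a valid total")
    p0 = n_plates_without_mutants / n_plates_total
    m = -math.log(p0)
    return m / cells_per_plate
```

For example, if half of the plates carry no mutant colony and 10^7 cells are plated, the estimate is ln(2) / 10^7, roughly 6.9 × 10^−8 per cell.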
Intrinsic DNA curvature alters the mutation rate, not mismatch repair efficacy
There are two non-mutually exclusive mechanisms by which intrinsic DNA curvature can modulate the net mutation rate. First, intrinsic DNA curvature may reduce the supply of mutations. Second, intrinsically curved DNA may facilitate the recruitment of mismatch repair-related proteins, which can increase the DNA repair efficacy [3, 9]. To determine if intrinsic DNA curvature reduces the supply of mutations or affects repair efficiency, we knocked out MSH2 and repeated the mutation accumulation experiment (Fig. 4b). In the absence of Msh2, the effect of DNA curvature on mutation rate is even larger; a 10% decrease in curvature results in a 100% increase in mutation rate (Fig. 4d). This observation suggests that the effect of DNA curvature on the net mutation rate is due to differences in the supply of mutations and not to differences in DNA repair efficacy.
DNA curvature is negatively correlated with mutagen sensitivity in human cancer cells
Implications in evolutionary genomics
Understanding the variation in mutation rate is central to numerous questions in evolutionary genetics. Particularly, modeling the variability in mutation rate among sites of a genome is of key importance in studies of molecular evolution because it provides a null model that can be rejected when natural selection occurs. Sequence-intrinsic cis elements are more computationally tractable than trans factors in modeling mutation rate in molecular evolution studies, because with cis elements the expected mutation rate can be predicted directly from the surrounding sequences of a site. For example, the evolutionary rates of genes have been extensively studied, and particularly, comparisons between those of essential and nonessential genes have been made [43, 44, 45, 46, 47]. Previous studies focused on the difference in the strength of negative selection and neglected the potential difference in mutation rate, presumably because the latter was hard to model. In this study, we discovered that a key DNA shape feature, intrinsic DNA curvature, modulates local mutation rate. Interestingly, we observed that essential genes exhibit a greater DNA curvature in both yeast (Additional file 1: Figure S12) and humans (Additional file 1: Figure S13), suggesting that they have a lower mutation rate. This observation underscores the need to consider differences in mutation rate when comparing evolutionary rates among genes.
Furthermore, the high-density fitness landscapes of random mutations on a gene have been extensively characterized in previous studies [48, 49], aiming to understand the trajectory of biological evolution. However, evolutionary trajectories are determined by natural selection acting on mutations. Inherent biases in the generation of the random mutations must therefore be taken into account. Our study on mutational landscape complements these previous studies on fitness landscapes and will significantly contribute to the ultimate understanding of evolutionary trajectories.
We found that the shape of the DNA double helix plays a major role in determining the local mutation rate. In particular, we identified a key feature, intrinsic DNA curvature, that determines the local mutation rate in both yeast and humans. We genetically manipulated the intrinsic DNA curvature and observed an altered mutation rate consistent with the genome-wide data. We showed that this effect is due to increased mutation rate, likely due to increased exposure to mutagens, and not due to differential efficacy of repair machinery. Taken together, our study extensively expands our knowledge of elements that regulate mutation rate by integrating the valuable information of DNA shape, and develops a new framework to understand evolution and tumorigenesis at a nucleotide resolution.
Characterization of the mutational landscape of URA3
A haploid S. cerevisiae strain derived from the W303 background, GIL104 (MATa URA3, leu2, trp1, CAN1, ade2, his3, bar1Δ::ADE2), was used to characterize the mutational landscape of URA3. Cells from a single colony were cultured in 5 ml SC media with uracil dropped out (SC-uracil) at 30 °C for 24 h. Cells were then transferred into 5 ml fresh SC media (at an initial OD660 ~ 0.1) and grown for 24 h to accumulate mutations. Approximately 5.0 × 10^7 cells were spread onto SC-uracil plates containing 1 g/l 5-FOA to select for loss-of-function mutants of URA3. A total of ~ 1000 ura3 variants were isolated from 5-FOA plates and were Sanger sequenced separately. PCR and Sanger sequencing primers are listed in Additional file 1: Table S6.
Calculation of the mutation rate and the values of DNA properties in URA3 and CAN1
We identified a total of 452 mutations in URA3 (Additional file 1: Table S1), including 5 synonymous mutations, 293 missense mutations, and 154 nonsense mutations. We focused on these 154 nonsense mutations in this study for the sake of accuracy in estimating mutation rate. Specifically, calculating the mutation rate requires counting the number of potential loss-of-function mutation sites, which is used to normalize the number of observed mutations. The number of potential loss-of-function missense mutations was difficult to estimate because it remains unclear which missense mutations lead to a loss of function and which do not. Mutation rate was determined using overlapping windows with size equal to L nucleotides (L = 10, 20 …, or 100 bp, Additional file 1: Figure S3). The window was advanced by 10 nucleotides at each step. The value of a DNA shape feature was calculated based on the frequencies of all 16 possible dinucleotides in a region, following previous studies [19, 24].
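A frequency-weighted dinucleotide feature of this kind can be sketched as follows. The exact formula used in the cited studies is not given in the text, so the simple weighted average below is an assumption, as is the function name. Any table of per-dinucleotide values passed in is supplied by the caller; the real values would come from a dinucleotide property database.

```python
from collections import Counter


def region_feature(region, dinucleotide_values):
    """Frequency-weighted average of a per-dinucleotide property over a region.

    region: DNA sequence of the window (one strand, upper case).
    dinucleotide_values: mapping from dinucleotide (e.g. "AA") to its value.
    """
    steps = [region[i:i + 2] for i in range(len(region) - 1)]
    if not steps:
        raise ValueError("region must contain at least two nucleotides")
    counts = Counter(steps)
    total = len(steps)
    return sum(dinucleotide_values[d] * c / total for d, c in counts.items())
```

Overlapping dinucleotide steps are counted, so a region of n nucleotides contributes n − 1 steps. Whether the reverse strand is also considered is not specified in the text and is not handled here.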
Estimation of nucleosome occupancy
The wild-type S. cerevisiae strain (BY4741 URA3) was grown to log-phase in YPD (1% yeast extract, 2% peptone, and 2% dextrose) liquid medium. We performed nucleus isolation, micrococcal nuclease (MNase) digestion, and chromatin preparation as described previously, with the following modifications. We adjusted NP-S buffer to 0.5 mM spermidine, 0.075% (v/v) NP-40, 50 mM NaCl, 50 mM Tris-HCl pH 7.5, 5 mM MgCl2, and 5 mM CaCl2, and used 100 units of MNase to digest the nuclei for 5 min. We performed proteinase K digestion and extracted the core particle DNA. Paired-end libraries were constructed using the Illumina-compatible DNA-Seq NGS library preparation kit from Gnomegen and were sequenced with Illumina HiSeq 2500 (PE125, paired-end 2 × 125 bp). ~ 10.6 million clean reads were aligned to the S. cerevisiae genome using bowtie2 with default parameters. Nucleosome occupancy of a nucleotide was defined as the number of read pairs uniquely mapped to the genome region covering the nucleotide. The raw sequencing data of MNase-seq have been deposited to the Genome Sequence Archive in BIG Data Center (http://bigd.big.ac.cn/gsa), Beijing Institute of Genomics, Chinese Academy of Sciences, under accession number CRA000570.
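The per-nucleotide occupancy defined above is a coverage count over the fragments spanned by read pairs. A minimal sketch is given below; the function name and the 0-based, half-open fragment coordinates are assumptions, and filtering for uniquely mapped pairs is presumed to have been done upstream.

```python
def nucleosome_occupancy(fragments, chrom_length):
    """Count, for every nucleotide, the fragments that cover it.

    fragments: iterable of (start, end) pairs, 0-based and half-open,
    one per uniquely mapped read pair on a single chromosome.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (chrom_length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1

    occupancy = []
    running = 0
    for delta in diff[:chrom_length]:
        running += delta
        occupancy.append(running)
    return occupancy
```

The difference array keeps the cost linear in the number of fragments plus the chromosome length, instead of incrementing every covered position for every fragment.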
Generation and analyses of the URA3 variants
We designed four synonymous variants of URA3 with different intrinsic DNA curvature (Additional file 1: Tables S4–S5). We estimated the minimum free energy (MFE) for all 20-nucleotide windows in the coding sequence with RNAfold and defined their average MFE as the strength of the RNA secondary structure of a variant. Codon adaptation index (CAI) was calculated following our previous study. Four URA3 variants were synthesized by Wuxi Qinglan Biotech, and the wild-type URA3 DNA sequence was amplified from S288C. Primers are listed in Additional file 1: Table S6. Each of the five variants was introduced into the chromosomal location of URA3 in BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) by homologous recombination.
We used an electrophoretic mobility shift assay to confirm the difference in intrinsic DNA curvature among the five synonymous variants. We loaded equal amounts of PCR products of the five variants into a 12% native polyacrylamide gel. We performed electrophoresis in TBE buffer (89 mM Tris, 89 mM boric acid, and 2.5 mM EDTA, pH 8.0) for 12 h at 120 V.
Total RNA was extracted with hot acidic phenol (pH < 5.0) and was reverse transcribed with the GoScript™ reverse transcriptase. Quantitative PCR (qPCR) was carried out on the Mx3000P qPCR System (Agilent Technologies) using Maxima SYBR Green/ROX qPCR Master Mix. ACT1 was used as the internal control. Primers used are listed in Additional file 1: Table S6.
Estimation of mutation rate in yeast mutation accumulation (MA) lines
A previous study identified ~ 1000 single nucleotide mutations by sequencing the genomes of five MA lines of a mismatch repair-deficient S. cerevisiae strain (BY4741 msh2::kanMX4). The mutation data from this study were used because the efficacy of purifying selection in MA experiments [17, 23] is further reduced in mutators. We analyzed the mutations supported by ≥ 20× coverage and retrieved 882 single nucleotide mutations that were identified in at least one of the five replicates from this study. As a control, we chose 882 random sites in the rest of the yeast genome and defined them as the pseudo-mutation sites. We calculated the average intrinsic DNA curvature around these pseudo-mutation sites and repeated this procedure 1000 times. P values were calculated as the fraction of pseudo-mutation sets exhibiting a smaller average intrinsic DNA curvature than that of the observed mutation sites among 1000 permutations.
Estimation of mutation rate in humans
When multiple projects for a cancer type exist, we combined all SNVs in these projects. On average, ~ 100,000 SNVs were identified in a cancer type. For each cancer type, we calculated the average intrinsic DNA curvature of the flanking DNA sequences of all SNVs (from 50 bp upstream to 50 bp downstream of each SNV). We also randomly chose the same number of sites in the human genome and calculated the average intrinsic DNA curvature of their flanking sequences in the same way. This procedure was repeated 1000 times to obtain the distribution of the expected average intrinsic DNA curvature. P values were calculated as the fraction of sets of random sites exhibiting a smaller average intrinsic DNA curvature than that of the observed SNV sites, among 1000 permutations. In TCGA, different methods were used to identify mutations (MuTect, MuSE, SomaticSniper, and VarScan, Additional file 1: Figures S4, S6–S7). The SNVs in 5′ UTR, coding sequences, and 3′ UTR were also separately analyzed, with the expectation obtained by only sampling DNA sequences in the corresponding type of genomic regions. Because the number of SNVs in 5′ UTR and 3′ UTR was relatively small, SNVs in all cancer types were combined. In addition, 101,377 de novo point mutations in the human germline were retrieved from a previous study. Permutation tests were performed as described for cancer cells (Fig. 3).
We thank Yuliang Zhang for technical support in data analysis, and Mengyi Sun and Jian-Rong Yang for critical reading of the manuscript.
The review history for this manuscript is available as Additional file 2.
This work was supported by grants from the National Natural Science Foundation of China to X.H. and W.Q. (91731302).
Availability of data and materials
Protein-protein interaction (PPI) data in yeast were downloaded from Saccharomyces Genome Database (https://www.yeastgenome.org/). Lists of essential genes and haploinsufficient genes were retrieved from a previous study. Genes leading to significant growth reduction upon deletion were identified in a previous study with Bar-seq. Duplicate genes in the yeast genome were defined in a previous study. PPI data in humans were downloaded from Biogrid (https://thebiogrid.org/). Human essential genes were retrieved from two previous studies [61, 62]. The list of haploinsufficient genes in humans was retrieved from a previous study. The value of each dinucleotide for each DNA shape feature was obtained from the Dinucleotide Property Database (http://diprodb.leibniz-fli.de/ShowTable.php). The data of SNVs in cancer cells were retrieved from The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). Chromosomal sequences surrounding these SNVs were retrieved from Ensembl release 87 (www.ensembl.org). Mutations in MECP2, NF1, and RB1 were retrieved from ClinGen (https://clinicalgenome.org/), and mutations in TP53 were retrieved from IARC TP53 Database (http://p53.iarc.fr/).
CD, LBC, XH, and WQ designed the experiments. CD, QH, and XC performed the experiments. CD, QH, and WQ analyzed the data. CD, SW, LBC, and WQ wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
- 34. Jonsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E, Hardarson MT, Hjorleifsson KE, Eggertsson HP, Gudjonsson SA, et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature. 2017;549:519–22. https://doi.org/10.1038/nature24018.
- 56. Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K, Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, et al. Saccharomyces genome database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 2004;32:D311–4.
- 61. Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, Mis M, Zimmermann M, Fradet-Turcotte A, Sun S, et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015;163:1515–26. https://doi.org/10.1016/j.cell.2015.11.015.
- 66. Duan C, Huan Q, Chen X, Wu S, Carey LB, He X, Qian W. Reduced intrinsic DNA curvature leads to increased mutation rate. Genome Sequence Archive. 2018. http://bigd.big.ac.cn/gsa/browse/CRA000570. Release date: 19 Apr 2018.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.