G-quadruplexes may determine the landscape of recombination in HSV-1
Several lines of evidence suggest that recombination plays a central role in replication and evolution of herpes simplex virus-1 (HSV-1). G-quadruplex (G4)-motifs have been linked to recombination events in human and microbial genomes, but their role in recombination has not been studied in DNA viruses.
The availability of near full-length sequences from 40 HSV-1 recombinant strains with the exact positions of the recombination breakpoints provided us with a unique opportunity to investigate the role of G4-motifs in recombination among herpes viruses. We mapped the G4-motifs in the parental and all 40 recombinant strains. Interestingly, the genome-wide distribution of breakpoints closely mirrors the G4 densities in the HSV-1 genome; regions of the genome with higher G4 densities had a higher number of recombination breakpoints. Biophysical characterization of oligonucleotides from a subset of predicted G4-motifs confirmed the formation of G-quadruplex structures. Our analysis also reveals that G4-motifs are enriched in regions flanking the recombination breakpoints. Interestingly, about 11% of breakpoints lie within a G4-motif, making these DNA secondary structures hotspots for recombination in the HSV-1 genome. Breakpoints within G4-motifs predominantly lie within G4-clusters rather than individual G4-motifs. Of note, we found that the terminal guanosine of the G4-clusters at the boundaries of the UL (unique long) region, on either side of the OriL (origin of replication within UL), represented the commonest breakpoint among the HSV-1 recombinants.
Our findings suggest a correlation between the HSV-1 recombination landscape and the distribution of G4-motifs and G4-clusters, with possible implications for the evolution of DNA viruses.
Keywords: Herpes simplex virus-1; G-quadruplexes; Recombination breakpoints; Higher-order G4s; G4-clusters
Abbreviations: HHV-1, Human Herpes Virus-I; HSV-1, Herpes Simplex Virus-I; RGQM, Repetitive G-Quadruplex Motif
Herpes simplex virus-I (HSV-1), also known as Human Herpes Virus-I (HHV-1), is a double-stranded DNA virus with a genome size of about 150 kb. HSV-1 infects epithelial (skin, mucosa) and neuronal tissues. The genome of HSV-1 is organized as long (L) and short (S) segments. Each segment further comprises a central unique region (U) flanked by repeats (terminal repeats, TR; inverted repeats, IR) that are inverted with respect to each other. The HSV-1 genome is canonically represented as: TRL-UL-IRL-IRS-US-TRS. This genomic arrangement allows the unique regions to invert, resulting in four isomeric forms of the HSV-1 genome. It is widely reported that the junction between IRL and IRS, known as the ‘a’ region, promotes the intramolecular recombination that leads to isomeric genomes [1, 2, 3]. Intermolecular recombination in HSV-1 is, however, less well studied.
G-quadruplexes (G4s) are nucleic acid secondary structures formed by a sequence motif consisting of four guanine trinucleotides separated by intervening sequences of at most 15 nucleotides each. They exert spatio-temporal effects on transcription, translation, replication, telomere extension and alternative splicing in mammalian and microbial genomes [4, 5, 6].
Several lines of evidence point to a potential involvement of G4s in recombination. Recombination hotspots in the human genome, loci of antigenic variation in microbial genomes, and nucleotide segments associated with fragility and chromosomal translocations in cancer are all known to be spatially associated with G4s [7, 8, 9, 10]. The proximity of G4s to such sites of recombination may be relevant to the recruitment of protein factors necessary for the genetic rearrangement [11, 12, 13]. Among viruses, a role for G4s in recombination has been studied only in HIV-1. Dimerization of HIV-1 genomes through the formation of intermolecular G4s in the U3 region, the DIS (dimerization site) and the cPPT (central polypurine tract) has been linked to template switching by reverse transcriptase in vitro, suggesting a role for G4s in HIV-1 recombination [14, 15, 16].
The association of G4s with recombination in the human genome is well studied [8, 17, 18]. Although recombination is well documented among DNA viruses infecting humans, the role of G4s, if any, in the recombination of DNA viruses has not been investigated. We chose to investigate the role of G4s in recombination in HSV-1 because (a) several G4s have been reported in HSV-1 [19, 20, 21] and (b) recombination among DNA viruses is most extensively studied in HSV-1. In addition, Lee et al. (2015) recently studied the recombination of two HSV-1 strains, OD4 and CJ994, under both in vitro and in vivo conditions; they characterized the exact nucleotide positions of 577 breakpoints by sequencing 40 recombinant HSV-1 strains (Additional file 1: Table S1) and mapped them to HSV-1 strain 17. The availability of 40 whole genome sequences of HSV-1 with over 500 intermolecular recombination breakpoints provided us with a unique opportunity to study whether a spatial association between recombination breakpoints and G4s exists in the HSV-1 genome.
Results and discussion
The distribution of recombination breakpoints mirrors G4 densities in the HSV-1 genome
Enrichment of G4-motifs in regions flanking recombination breakpoints
Given that most breakpoints lie in repeat regions, it is possible that inherent differences in the nucleotide composition of genomic regions within the HSV-1 genome influence G4 densities. To investigate this possibility, we randomized the sequences of the flanking regions of each breakpoint 5 times in all 40 HSV-1 recombinants studied (i.e. a total of 577 breakpoints); this was done without changing the overall nucleotide composition of the randomized sequences. The median G4 density of the native flanking regions was significantly higher than the median G4 density of the randomized sequences (Additional file 1: Figure S1; P < 0.0001), indicating that the selective enrichment of G4-motifs near recombination breakpoints is independent of differences in the mononucleotide composition of the flanking regions. Taken together, we infer that the breakpoints in the HSV-1 genome are localized in the vicinity of G4-rich genomic segments. G4-motifs within 500 bp of breakpoints have been suggested to be functionally relevant in chromosomal rearrangements in cancer [9, 25]. Furthermore, G4s are enriched within the 500 bp flanking regions of double-stranded breaks (DSBs) in Saccharomyces cerevisiae. Our finding that G4-motifs are enriched in the 500 bp flanking regions of recombination breakpoints corroborates the spatial relationship between G4s and sites of recombination in HSV-1 genomes.
It is well established that recombination and replication go hand-in-hand in HSV-1 [1, 27, 28]. Artusi et al. reported the formation of G4s in the HSV-1 genome in concert with the virus’ replication cycle. Taken together, a unified model of a three-way temporal association among replication, recombination and G4s can be conceived, and the work presented here supports this notion. ICP8 and UL12 constitute a two-component recombinase system in HSV-1. Of these two components, ICP8 is known to co-localize with G4s during HSV-1 replication. The HSV-1 encoded UL12 binds the tripartite MRN complex, which is capable of binding G4s [13, 31]. Several host-encoded recombination and repair proteins are reported to be essential for HSV-1 recombination. In light of these reports, the enrichment of G4-motifs in the flanking regions of breakpoints hints at their possible involvement as scaffolds that recruit the viral and host factors comprising the molecular machinery of recombination in HSV-1. The possible association between G4-motifs and recombination among other human herpes viruses (HHVs) merits further research.
Biophysical characterization of G4-motifs flanking the recombination breakpoints
To verify whether the G4-motifs predicted by Quadparser to lie within the 500 bp regions flanking the breakpoints truly form G4 structures, eight G4-motifs from the recombinant genomes were chosen at random (Additional file 1: Table S2 and Table S3) for in vitro biophysical characterization by circular dichroism (CD) and nuclear magnetic resonance (NMR) spectroscopy.
Parallel orientation of strands appears to be a feature common to the G4-motifs associated with recombination. For example, G4-motifs near (a) sites of chromosomal rearrangement in cancer-related genes like HOX11, BCL-2 and TCF-3, (b) loci of antigenic variation in Neisseria gonorrhoeae and Treponema pallidum and (c) the central polypurine tract (cPPT), a dimerization site in HIV-1, formed parallel G4s in vitro. The ability of such structures to promote strand exchange in vitro was also demonstrated in HIV-1. Moreover, Mre11p, a part of the eukaryotic MRN/X complex involved in the repair of DSBs and in meiotic recombination, was reported to have a higher binding affinity for parallel G4s. As already mentioned, the MRN/X complex is known to interact with recombinases in HSV-1 and may be a host factor relevant to HSV-1 recombination. Collectively, these reports suggest that the parallel G4s identified herein have a potential functional role in HSV-1 recombination.
Recombination breakpoints are located near G4-motifs
G4-motifs themselves are potential hotspots for recombination
Recombination occurring within G4-motifs is G4-non-disruptive
Genomic segments encompassing the breakpoints are prone to indels and SNPs; their nucleotide sequence is hence dynamic [38, 39]. Thus, the incidence of breakpoints within G4-motifs may affect the integrity of the G4-motif. In this regard, the preferential location of breakpoints within G4-motifs leads us to suppose that disruption of the G4-motif may be a fitness cost associated with HSV-1 recombination. To analyze this possibility, we mapped the 577 breakpoint loci in the two parental strains, OD4 and CJ994. We focused on those recombination events where the breakpoint is contained within a G4-motif in both parental strains. Events were categorized as ‘G4-conservative’ if the motif is retained in the recombinant progeny and as ‘G4-disruptive’ if the G4-motif is lost in the process of recombination. The proportions of G4-conservative and G4-disruptive events are plotted in Fig. 6b. Contrary to the supposition, recombination is predominantly G4-conservative (Fig. 6b). In other words, even when breakpoints are located within G4-motifs, the majority of the G4-motifs are conserved (i.e. not disrupted) during recombination. This preservation of G4-motifs in the course of recombination-mediated evolution is interesting and is suggestive of a biological role for G4s in HSV-1.
G4-clusters in HSV-1 are hotspots for recombination
We analyzed the subset of breakpoints lying within G4-motifs (i.e. a total of 64 breakpoints) and identified some unique features (Fig. 7b). Firstly, the majority (about 76%) of these breakpoints are harbored within G4-clusters (Fig. 7a) rather than in individual G4-motifs (24%). Secondly, some of the breakpoint-containing G4-clusters were the repetitive G-quadruplex motifs (RGQMs) characterized earlier in HHVs. RGQMs are G4-forming repetitive sequences with iterations across the genome; their functional roles are, however, unknown. Our finding suggests a potential role for RGQMs in virus recombination. Thirdly, among the breakpoints present within G4-clusters, most (about 61%) are borne in higher-order G4-clusters (Fig. 7b). Recombination in HSV-1 is closely intertwined with replication. G4s in HSV-1 have been shown to stall the progression of DNA polymerase under in vitro conditions. We hypothesize that higher-order G4s exacerbate the polymerase stalling, leading to nicking and, in turn, to the introduction of double-strand breaks and recombination.
The terminal guanosine of G4-cluster at the boundary of the UL region is the commonest breakpoint
A possible explanation for the overrepresentation of the two UL boundary G4-clusters in recombination may lie in their genomic position with respect to HSV-1 oriL (the origin of replication that lies in the UL segment). The two UL boundary G4-clusters are present on either side of oriL. We speculate that the higher-order nature (their order value is 9) of the two UL boundary G4-clusters may represent a formidable challenge for the viral polymerase, which stalls at the very first nucleotide (i.e. the terminal “G” nucleotide of both UL boundary G4-clusters), thus making the oriL-proximal terminus of these G4-clusters a common recombination locus.
The junctions of the unique and repeat segments of the HSV-1 genome are known to be recombinogenic [42, 43]. While they are known to be preferred sites of intramolecular recombination leading to duplication and inversion of genomic segments, no precedent for these sites in intermolecular recombination is known. Our report identifies the terminal “G” nucleotides of the two UL boundary G4-clusters, at the junction of the unique and repeat segments of HSV-1, as common sites of intermolecular recombination.
In sum, our computational analyses strongly argue in favour of an association between G4s and recombination breakpoints in the HSV-1 genome. However, we have not attempted to identify the possible underlying mechanisms. It is also possible that some breakpoints were missed owing to drawbacks of currently available sequencing techniques; although this number may be small, it represents another limitation of this study.
An association between G4s and recombination has not been previously reported among DNA viruses. Here, we report multiple lines of evidence linking G4-motifs and recombination in HSV-1 genomes. We found that the recombination landscape is closely associated with the density of G4-motifs in the HSV-1 genome (Fig. 1). Encouraged by the spatial association between G4-motifs and recombination, we zeroed in on the individual breakpoints in the 40 recombinant strains and analyzed them on the basis of two fundamental questions: (a) how do the genomic segments containing recombination breakpoints differ from the rest of the genome in terms of G4 demography? (b) How close are recombination breakpoints to G4-motifs? These questions allowed us to address two distinct aspects of the relationship between recombination breakpoints and G4-motifs, making our approach two-pronged. The former is based on a spatial association between G4-motifs and recombination breakpoints, while the latter is based on a one-dimensional variable of length. Our analysis revealed a selective enrichment of G4-motifs in the 500 bp regions flanking the recombination breakpoints in HSV-1 (Fig. 2). Oligonucleotides from a subset of predicted G4-motifs in the flanking sites of breakpoints formed secondary structures in vitro (Figs. 3 and 4). We noted that recombination breakpoints of HSV-1 are located closer to G4-motifs (roughly 350 bp) than randomly selected points on the HSV-1 genome (Fig. 5). An intriguing answer to question (b) is that breakpoints can lie within the G4-motifs themselves. Interestingly, this type of recombination event (i.e. a breakpoint contained within a G4-motif) is represented more often than expected among the 40 strains analyzed in our study (Fig. 6a). This finding emphasizes a role for G4s in recombination in HSV-1. In addition, we found that G4-clusters are hotspots for recombination in HSV-1.
Furthermore, we noted that breakpoints often lie in the terminal nucleotide positions of higher-order G4-clusters (Fig. 7). Importantly, the two most common recombination breakpoints of HSV-1, the boundary nucleotides of the UL segment, are the terminal nucleotides of higher-order G4-clusters, indicating a significant role for higher-order G4-clusters in HSV-1 recombination (Fig. 8). Such roles for higher-order G4s in microbial genomes have not been reported thus far. Our work provides a novel view of HSV-1 evolution, which may be important in understanding its epidemiology, replication and virulence characteristics. Our findings also shed light on hitherto unknown roles for G4s in the genomes of DNA viruses.
Retrieval of sequences
The whole genome sequences of strain 17, the parental strains (OD4 and CJ994) and the 40 recombinant strains were retrieved from NCBI in FASTA format using the accession numbers reported by Lee et al. (Additional file 1: Table S1).
Identification of G4-motifs and computation of G4 density
The genomes of the 40 recombinant strains were mined for G4-motifs conforming to the pattern G3N1-7G3N1-7G3N1-7G3 (four runs of three guanines separated by loops of 1 to 7 nucleotides) using Quadparser. Both strands of the genome were searched for G4-motifs. The program lists the nucleotide positions and the sequences of the identified G4-motifs in its output. This output was used to identify (a) the G4-motif nearest to a given breakpoint in a strain and (b) the G4-motifs harboring recombination breakpoints in a strain. Only non-overlapping G4-motifs were considered in our analysis.
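The motif search described above can be illustrated with a regular expression. The following is a minimal sketch and not the Quadparser implementation: it assumes that each G-run is exactly three guanines, that loops may contain any nucleotide, and that coordinates are reported 1-based on the forward strand. The names `G4_PATTERN`, `reverse_complement` and `find_g4_motifs` are hypothetical helpers introduced only for this illustration.

```python
import re

# Assumed reading of G3N1-7G3N1-7G3N1-7G3: four runs of three guanines
# separated by three loops of 1 to 7 nucleotides (any base).
G4_PATTERN = re.compile(r"G{3}[ACGTN]{1,7}G{3}[ACGTN]{1,7}G{3}[ACGTN]{1,7}G{3}")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return non-overlapping motifs on both strands.

    Each hit is (start, end, strand, motif), with 1-based inclusive
    coordinates on the forward strand. re.finditer scans left to right
    and never returns overlapping matches within one strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in G4_PATTERN.finditer(seq):
        hits.append((m.start() + 1, m.end(), "+", m.group()))
    # Search the reverse complement and map coordinates back.
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append((n - m.end() + 1, n - m.start(), "-", m.group()))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_g4_motifs("TTGGGAGGGTGGGCGGGAA"):
        print(hit)
```

Because the loops are matched greedily, a G-rich stretch may be reported as one longer motif rather than the shortest possible one; a different tool may resolve such stretches differently.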
Sliding window analysis: A nucleotide window of 100 bp, advancing by one base pair, was slid along the length of the genome of strain 17 using an in-house program. A total of 152,162 windows of 100 bp were generated. The window sequences were then input to Quadparser and their G4 densities were computed.
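The windowing step can be sketched as follows. This is an illustration only, not the in-house program used in the study; `sliding_windows` and `window_count` are hypothetical names. With a 100 bp window and a step of 1 bp, the reported 152,162 windows correspond to a genome length of 152,261 bp.

```python
def sliding_windows(seq, window=100, step=1):
    """Yield (1-based start, window sequence) for every full-length window."""
    for i in range(0, len(seq) - window + 1, step):
        yield i + 1, seq[i:i + window]


def window_count(genome_length, window=100, step=1):
    """Number of full-length windows that fit in a linear genome."""
    if genome_length < window:
        return 0
    return (genome_length - window) // step + 1


if __name__ == "__main__":
    # Window count implied by the figures quoted in the text.
    print(window_count(152261))
```

Each window sequence would then be passed to the motif search to obtain a per-window G4 count.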
Analysis of the 500 bp regions flanking the breakpoints in the recombinants’ genomes: The whole genome sequence of each of the 40 recombinants was input to the Range Extractor tool of the Sequence Manipulation Suite, an online sequence analysis platform, for extraction of the 500 bp regions flanking each of their respective breakpoints on either side. If the flanking sequences of two consecutive breakpoints overlapped, a single segment spanning (predecessor breakpoint - 500) to (successor breakpoint + 500) was considered, to avoid double-counting of G4-motifs and of the length of the overlapping region. The number of G4-motifs in the flanking regions of all breakpoints in a strain was summed strand-wise, normalized to the total length of the flanking regions and averaged to compute the G4 density of the flanking regions. Likewise, the segments of the rest of the genome (i.e. other than the flanking regions of the breakpoints) were extracted from the respective recombinant’s genome and their G4 density was calculated.
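The merging of overlapping flanks can be sketched as a standard interval merge. This is a minimal illustration under stated assumptions, not the Range Extractor workflow itself: flanks are clipped at the genome ends (an assumption not stated in the text), coordinates are 1-based and inclusive, and density is expressed as motifs per base pair. All function names are hypothetical.

```python
def merged_flanks(breakpoints, genome_length, flank=500):
    """Merge overlapping [bp - flank, bp + flank] intervals.

    Overlapping flanks of consecutive breakpoints collapse into one
    segment from (first breakpoint - flank) to (last breakpoint + flank).
    """
    merged = []
    for bp in sorted(breakpoints):
        start = max(1, bp - flank)             # assumed clipping at genome start
        end = min(genome_length, bp + flank)   # assumed clipping at genome end
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def total_length(intervals):
    """Summed length of 1-based inclusive intervals."""
    return sum(end - start + 1 for start, end in intervals)


def g4_density(motif_count, region_length):
    """Motifs per base pair (assumed unit); 0.0 for an empty region."""
    return motif_count / region_length if region_length else 0.0


if __name__ == "__main__":
    flanks = merged_flanks([1000, 1600, 5000], genome_length=10000)
    print(flanks, total_length(flanks))
```

Merging before counting ensures that neither motifs nor bases in the overlap are counted twice when the density is normalized to the total flank length.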
Randomization of the 500 bp flanking regions: The 500 bp flank sequences of all breakpoints in the 40 strains were randomized 5 times without altering the overall nucleotide composition using BioEdit, with the software’s default randomization parameter of 10,000 shuffles. The G4 density of the randomized sequences was calculated for each randomization trial as described earlier and averaged over the 5 trials strain-wise.
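A composition-preserving randomization can be sketched with a plain shuffle. Note that this is a stand-in and not BioEdit's own procedure: it uses Python's `random.shuffle` (a Fisher-Yates shuffle) instead of BioEdit's 10,000-shuffle setting, and the fixed seed is an assumption added for reproducibility. The function names are hypothetical.

```python
import random
from collections import Counter


def composition(seq):
    """Mononucleotide counts of a sequence."""
    return Counter(seq)


def shuffle_sequence(seq, rng):
    """Return a permutation of seq; base counts are unchanged by construction."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def randomized_replicates(seq, trials=5, seed=0):
    """Return `trials` independently shuffled copies of seq."""
    rng = random.Random(seed)
    return [shuffle_sequence(seq, rng) for _ in range(trials)]


if __name__ == "__main__":
    native = "GGGAGGGTGGGCGGGATTACA"
    for replicate in randomized_replicates(native):
        # Every replicate has the same base composition as the native flank.
        print(replicate, composition(replicate) == composition(native))
```

Shuffling single bases preserves mononucleotide composition only; dinucleotide frequencies are not preserved by this approach.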
Generation of random breakpoints
If recombination were a chance event, the breakpoints would be uniformly distributed throughout the genome and be independent of G4-motifs. To test this null hypothesis, random breakpoints were generated for each recombinant genome. The syntax used for generating random numbers within a defined range in Linux is as follows: shuf -i <lower limit>-<upper limit> -n <number of random numbers> -o <output file name>. The output is a .txt file. Seven hundred and fifty random breakpoints were generated for each of the 40 recombinant strains. These comprise the set of ‘randomized breakpoints’ referred to in Figs. 5 and 6.
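For readers not working in a Linux shell, the same draw can be sketched in Python. This is an illustrative equivalent, not the command used in the study: like shuf without its repeat option, `random.sample` draws distinct positions from the range. The function name, the seed argument and the upper limit in the usage example are assumptions; in practice the upper limit would be the length of the recombinant genome in question.

```python
import random


def random_breakpoints(lower, upper, count, seed=None):
    """Draw `count` distinct positions uniformly from lower..upper (inclusive)."""
    rng = random.Random(seed)
    return rng.sample(range(lower, upper + 1), count)


if __name__ == "__main__":
    # 750 random positions; 152261 is an illustrative genome length only.
    positions = random_breakpoints(1, 152261, 750, seed=1)
    print(len(positions), min(positions), max(positions))
```

The resulting positions can be written one per line to a text file to mirror the shuf output format.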
Data analysis, graphical representation and statistics
Microsoft Excel was used for analysis of data and plotting of bar graphs. Violin plots were generated using the software R. The R packages used were ggplot2, forcats and RColorBrewer. CD spectral curves were plotted using GraphPad Prism 5.0. Origin 9.1 was used for plotting of Fig. 1. Figure 7 was created using MS PowerPoint. Unless mentioned otherwise, statistical significance was determined using the Wilcoxon matched-pairs signed rank test in GraphPad Prism 5.0. P values less than 0.05 were considered significant.
Circular dichroism spectroscopy
Among the G4-motifs present in the 500 bp flanking region of breakpoints of all 40 strains, 8 were selected at random using MS Excel. The sequences of the 8 G4-motifs chosen are listed in Additional file 1: Table S2. The oligonucleotides were purchased from Integrated DNA Technologies (IDT) for in vitro analyses.
Oligonucleotides prepared at 10 μM concentration in a buffer containing sodium cacodylate (10 mM) and KCl (100 mM) were heated at 95 °C for 5 min and allowed to cool to room temperature on standing. A sample containing only the buffer components and treated in the same manner was used as the blank. CD spectroscopy was performed using a J-815 spectrophotometer (Jasco Inc., Japan) and a quartz cuvette with a path length of 1 mm. The following parameters were used for obtaining the spectra: (a) temperature: 20 °C; (b) wavelength range: 220 nm–320 nm; (c) accumulations: 3; (d) bandwidth: 0.5 nm; (e) step size: 1 nm; (f) time per point: 1 s.
Oligonucleotides (Additional file 1: Table S2) at a concentration of 300 μM were prepared in 20 mM phosphate buffer (pH 7.0) containing 100 mM KCl and 10% D2O (v/v), heated to 95 °C and allowed to cool slowly to room temperature before measurement of spectra. 1D 1H NMR spectra were recorded at 20 °C on a Bruker Avance III spectrometer operating at 500 MHz. Topspin 3.5 was used for data acquisition, data processing and plotting of spectra.
Some of the equipment used for experiments in this study was funded by Kusuma Trust, UK. The funding body had no role in the design of the study, data analysis, data interpretation or the writing of the manuscript. The authors thank the Department of Biotechnology (DBT), Government of India, for providing financial support for the 500 MHz NMR spectrometer at the ICGEB, New Delhi.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
NS did the computational work and the majority of the experimental work, analyzed the data and wrote the manuscript. BB contributed to the randomization analysis, was involved in the interpretation of data and created the violin plots. AP did the NMR spectroscopy experiments and interpreted their results. PV conceived the idea, designed the study and edited the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they do not have any competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 32. Wilkinson DE, Weller SK. Recruitment of cellular recombination and repair proteins to sites of herpes simplex virus type 1 DNA replication is dependent on the composition of viral proteins within prereplicative sites and correlates with the induction of the DNA damage response. J Virol. 2004;78(9):4783–96.
- 36. Giacani L, Brandt SL, Puray-Chavez M, Reid TB, Godornes C, Molini BJ, Benzler M, Hartig JS, Lukehart SA, Centurion-Lara A. Comparative investigation of the genomic regions involved in antigenic variation of the TprK antigen among treponemal species, subspecies, and strains. J Bacteriol. 2012;194(16):4208–25.
- 44. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–8.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.