1 G-Quadruplex Nucleic Acids

G-quadruplexes (G4s) are noncanonical four-stranded nucleic acid structures formed in guanine-rich DNA and RNA sequences (Fig. 1). They have emerged as one of the most exciting nucleic acid secondary structures. DNA is widely recognized as a double-helical structure essential for genetic information storage. However, results from the ENCODE project [1] indicate that only ~3% of the human genome is expressed as protein and that RNA and DNA may form noncanonical secondary structures that are functionally important. G-quadruplexes are one such example and have gained considerable attention for their formation and regulatory roles in biologically significant regions. They are involved in a number of critical cellular processes, including gene transcription, translation, DNA replication, and genomic stability. G-quadruplexes form readily under physiologically relevant conditions and are globularly folded structures. Many proteins have been identified that interact with G-quadruplex DNA or RNA, including G-quadruplex-stabilizing and destabilizing/unfolding proteins (see reviews: [2,3,4,5,6]). As such, G-quadruplexes have emerged as a new class of molecular targets for drug development. In addition, there is considerable interest in the use of G-quadruplexes for biomaterials [7, 8], biosensors [9, 10], and biocatalysts [11].

Fig. 1

(a) Schematic illustration of a G-tetrad, four guanine bases arranged in a square plane with Hoogsteen hydrogen bonding. Monovalent cations (K+ or Na+, shown as blue spheres) are required to stabilize G-quadruplexes by coordinating with the O6 atoms of the adjacent G-tetrad planes. (b) A schematic intermolecular (tetrameric) G-quadruplex with three G-tetrads. (c) Examples of intramolecular G-quadruplexes with different folding structures and loop conformations. The experimentally determined molecular structures are shown as examples for parallel, hybrid, and basket G-quadruplexes. (d) Example NMR molecular structures of ligand complexes with the c-MYC promoter G-quadruplex and the human telomeric G-quadruplex

Guanylic acid was observed to form gels as early as 1910 [12], but the underlying G-tetrad structure was not determined until 1962 [13]. The core structure of a G-quadruplex consists of stacked guanine-tetrads (G-tetrads), each a square planar platform of four guanine bases held together by Hoogsteen hydrogen bonds (Fig. 1a). G-quadruplex structures require cations, particularly K+ or Na+, to stabilize stacked G-tetrads by coordinating with tetrad-guanine O6 atoms [14,15,16]. The tetrad-guanines can adopt an anti or syn glycosidic conformation; tetrad-guanines from G-strands with the same direction, i.e., parallel strands, adopt the same glycosidic conformation, whereas those from G-strands with the opposite direction, i.e., antiparallel strands, adopt different glycosidic conformations [17].

G-quadruplexes can be intramolecular (monomeric) or intermolecular (multimeric), formed from one or from more than one nucleic acid molecule, respectively. Tetramolecular G-quadruplexes (Fig. 1b) are usually parallel-stranded, with tetrad guanines adopting the anti glycosidic conformation. Most biologically relevant G-quadruplexes are intramolecular, with three-tetrad cores being the most common (Fig. 1c). In contrast to tetramolecular structures, intramolecular G-quadruplexes form quickly and exhibit great conformational diversity in folding topology, loop conformation, and capping structures. Based on G-strand directionality, a G-quadruplex can be parallel, with all four G-strands in the same direction; hybrid/mixed, with both parallel and antiparallel strands; or antiparallel, with all adjacent G-strands antiparallel to each other. G-strands in intramolecular G-quadruplexes are connected by different types of loops: propeller loops connect parallel strands, lateral loops connect adjacent antiparallel strands, and diagonal loops connect antiparallel strands across the G-tetrad core. Not only can different sequences adopt distinct topologies, but a given sequence can also fold into different conformations, as in the case of the human telomeric DNA, or form multiple structures, as in the case of human gene promoter sequences [18]. While a number of principles of G-quadruplex folding have been recognized, a G-quadruplex conformation is difficult to predict and requires experimental structure determination.
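The relationship between loop types and strand directionality described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the count-based classification (4:0 parallel, 3:1 hybrid, 2:2 antiparallel) is a simplifying assumption that ignores the spatial arrangement of the strands around the G-tetrad core.

```python
from collections import Counter

# Rule taken from the text: a propeller loop connects parallel strands,
# while lateral and diagonal loops connect antiparallel strands.
FLIPS_DIRECTION = {"propeller": False, "lateral": True, "diagonal": True}


def strand_directions(loops):
    """Return the direction (+1 or -1) of each of the four G-strands."""
    if len(loops) != 3:
        raise ValueError("an intramolecular G-quadruplex has three connecting loops")
    directions = [1]  # the first strand defines the reference direction
    for loop in loops:
        previous = directions[-1]
        directions.append(-previous if FLIPS_DIRECTION[loop] else previous)
    return directions


def classify_topology(loops):
    """Classify by strand-direction counts (simplifying assumption):
    4:0 -> parallel, 3:1 -> hybrid, 2:2 -> antiparallel."""
    counts = Counter(strand_directions(loops))
    majority = max(counts.values())
    return {4: "parallel", 3: "hybrid", 2: "antiparallel"}[majority]
```

For example, three propeller loops give a parallel fold, a propeller-lateral-lateral loop order gives a 3:1 hybrid, and a lateral-diagonal-lateral order gives a 2:2 antiparallel fold. The sketch says nothing about which fold a real sequence adopts, which still requires experimental determination.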

2 G-Quadruplex Occurrence and Functions

G-quadruplexes have been found to form in specific human guanine-rich sequences with functional significance, such as telomeres, oncogene-promoter regions, and the 5′- and 3′-untranslated regions (UTRs) of mRNA, as well as in nonhuman genomes.

2.1 G-Quadruplexes in Telomeres

The first biologically relevant G-quadruplex was observed in telomeric DNA. Telomeres are specific DNA–protein complexes at the ends of linear chromosomes that protect against gene erosion during cell division, chromosomal nonhomologous end-joining, and nuclease attack [19,20,21]. Telomeric G-quadruplexes were first reported as novel intramolecular structures containing guanine–guanine base pairs in single-stranded telomeric sequences of several organisms [22], and as guanine-tetrads between hairpin loops of the Tetrahymena telomeric DNA [23]. The importance of monovalent cations in the stabilization of G-quadruplex structures was revealed by Williamson, Cech, and colleagues in their monovalent cation-induced square-planar G-quartet model using the Oxytricha and Tetrahymena telomeric DNA [14]. Human telomeres consist of tandem repeats of the hexanucleotide d(TTAGGG)n, 5–10 kb in length, which terminate in a single-stranded 3′-overhang of 35–600 bases [24]. Telomeres of cancer cells do not shorten upon replication, mainly due to the activation of a reverse transcriptase, telomerase, that extends the telomeric sequence at the chromosome ends [25]. Telomerase is activated in 80–85% of human cancer cells to maintain telomere length and the malignant phenotype [25,26,27]. G-quadruplex formation can inhibit the activity of telomerase [28], making it an attractive target for cancer therapeutic intervention. In addition to formation at the telomere end, which most likely involves intramolecular G-quadruplex structures, intermolecular G-quadruplex formation may also be involved in the T-loop invasion complex [29, 30]. Recently, a telomeric repeat-containing RNA (TERRA) G-quadruplex was identified and found to inhibit telomerase [31, 32].

Human telomeric DNA is structurally polymorphic and may adopt different intramolecular G-quadruplex conformations, including two equilibrating hybrid-type structures [33,34,35,36,37,38,39] and a 2-tetrad structure [40,41,42] in K+ solution, a parallel structure in the crystalline form in the presence of K+ [43], and a basket-type structure in Na+ solution [44]. The hybrid-type structures can effectively form packed multimers at the telomere ends [33, 45]. Although different human telomeric G-quadruplexes appear to have small energy differences relative to each other, interconversion between them is kinetically slow, indicating high-energy intermediate(s) [33, 41, 46,47,48]. The structural polymorphism appears to be an intrinsic property of the highly conserved telomeric sequence in higher eukaryotes, particularly the TTA loop sequence [49]. Telomeric repeat-containing RNA, on the other hand, was shown to adopt parallel-stranded G-quadruplexes [50,51,52].

2.2 G-Quadruplexes in Gene Promoters

More recently, DNA G-quadruplexes were found to form in gene promoter regions and function as transcriptional regulators [53, 54], and this has been the most active area of G-quadruplex DNA research in the past decade. The first experiments that suggested the existence of unusual forms of DNA associated with runs of guanines in gene promoters were reported in 1982 for the chicken β-globin gene, based upon the nuclease hypersensitivity of promoter elements [55,56,57]. Since then, the occurrence of these elements in human gene promoters has been reported, including in those of insulin [58], c-MYC [54, 59], VEGF [60, 61], HIF-1α [62], BCL-2 [63,64,65], MtCK [66], K-RAS [67, 68], c-KIT [69, 70], RET [71], PDGF-A [72], c-MYB [73], hTERT [74], and PDGF-Rβ [75, 76], in addition to mouse α7 integrin [77]. The potential occurrence of DNA G-quadruplexes has been discovered in the promoter regions of human genes involved in growth and proliferation [53, 78, 79]; these genes all contain G-rich/C-rich tracts in the proximal regions of promoters and are mostly TATA-less. In addition, the potential for quadruplex formation is higher within oncogenes than within tumor suppressor genes [80]. Computational analyses showed significant enrichment of G-quadruplex-forming sequences in the promoter regions of human genes near transcription start sites (TSS) [81]. The driving force for the formation of promoter G-quadruplexes appears to be the transcription-induced dynamic negative superhelicity [82,83,84,85]. The c-MYC gene promoter is the most extensively studied system for the promoter G-quadruplex [54, 86]. A highly conserved G-rich nuclease hypersensitivity element III1 in the proximal region of the c-MYC promoter controls 80–90% of the transcriptional activity regardless of whether the P1 or P2 promoter is used [87,88,89,90].
This element in the c-MYC promoter is highly dynamic in its conformation [91], and can form G-quadruplex structures, which function as a transcriptional silencer [54, 59].
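Computational searches for G-quadruplex-forming sequences, such as those mentioned above, are often based on simple sequence patterns. The sketch below uses one commonly quoted heuristic, four runs of at least three guanines separated by loops of one to seven nucleotides. That pattern, its loop limits, and the function name are assumptions made for illustration and are not taken from the analyses cited in the text.

```python
import re

# Heuristic pattern (an assumption, not taken from the text): four runs of
# at least three guanines separated by loops of one to seven nucleotides.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(sequence):
    """Return (start, end, match) for non-overlapping motif hits on one strand."""
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(sequence)]


# Four human telomeric repeats, d(TTAGGG)4, the repeat unit given in the text.
telomeric = "TTAGGG" * 4
hits = find_putative_g4(telomeric)
```

The sketch scans only the strand it is given, so a genome-wide search would also need to scan the reverse complement. A pattern match indicates only the potential to fold; whether a G-quadruplex actually forms, and in which topology, depends on context.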

In contrast to the tandem repeats of the telomeric sequence, the promoter G-quadruplex-forming sequences are each unique in the number and length of their G-tracts and intervening bases. The promoter G-rich sequences often contain more than four G-tracts with unequal numbers of guanines and can form multiple G-quadruplexes by using varying combinations of G-tracts, or different loop isomers by using varying guanines on one G-tract [18]. Parallel structures are common among the promoter G-quadruplexes, usually with a three-tetrad core. Structural studies showed that each promoter G-quadruplex adopts unique capping and loop structures determined by its specific sequence, such as c-MYC [92,93,94,95], BCL-2 [63, 65, 96, 97], KRAS [98], c-KIT [99,100,101], VEGF [102], and PDGFR-β [103, 104]. A notable feature of the promoter G-quadruplexes is the prevalence of the G3NG3 motif, a robust parallel-stranded structural motif with a 1-nt propeller loop. This motif was first observed in the major G-quadruplex structure formed in the c-MYC promoter, which showed that the 1-nt propeller loop conformation is highly favored [94]. By having two such motifs, parallel promoter G-quadruplexes can have a long and variable middle loop [65, 97, 105]. In addition, parallel G-quadruplexes exist in variant forms, such as with a broken strand [99, 103], an end insertion [104], or even an additional hairpin loop [65, 74]. Furthermore, certain promoter sequences can form multiple G-quadruplexes on one overlapping region or on separate regions. For example, the BCL-2 proximal promoter contains two G-quadruplex-forming regions that are separated by 13 nt (Pu39 and P1G4), with two competing G4s, i.e., a hybrid structure [63, 96] and a parallel structure [97], formed in Pu39, and two equilibrating parallel G4s formed in P1G4 [65].
Similar phenomena were observed in the promoters of KRAS [67, 68, 106,107,108], c-KIT [69, 70, 99,100,101], PDGFR-β [75, 76, 103, 104], and hTERT [74, 109,110,111]. The variations in promoter G-quadruplexes give rise to different overall structural properties that could be specifically recognized by proteins or small-molecule ligands for transcriptional regulation. Moreover, inherent polymorphism and equilibrium between different conformations may provide an additional layer of transcriptional modulation.

2.3 G-Quadruplexes in Other Regions of Genome and in RNA

G-quadruplexes have been found in other regions of the human genome, such as immunoglobulin class switch regions [112,113,114], ribosomal DNA [115], mitochondrial DNA [116,117,118,119], replication initiation regions [120], the LINE-1 retrotransposon [121,122,123], and DNA:RNA hybrid G-quadruplexes in transcription [124], as well as in the extended repeat sequences associated with neurodegenerative diseases at both the DNA and RNA levels, such as the (CGG)n repeat in the 5′-UTR of the FMR1 gene in Fragile X syndrome (FXS) [125,126,127] and the hexanucleotide repeat expansion (HRE) (GGGGCC)n in C9orf72 in amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) [128]. In addition, G-quadruplexes have been found to form in RNA. G-quadruplexes formed in 5′-UTRs have been shown to inhibit translation [129, 130], as for NRAS [131], Zic-1 [132], TRF2 [133], and Yin Yang 1 [134], whereas those formed in internal ribosomal entry sites (IRESs) can initiate cap-independent translation, as for VEGF [135]. G-quadruplexes are also found in 3′-UTRs [136,137,138] as well as in RNA introns, where they regulate alternative splicing, as for TP53 [139] and Bcl-XL [140].

Notably, DNA G-quadruplex structures have recently been shown to be involved in genomic instability and DNA damage [141,142,143].

2.4 G-Quadruplexes in Nonhuman Genomes

G-quadruplexes have been identified in nonhuman genomes. As in the human genome, these G-quadruplexes predominantly occur in regions of regulatory importance. In particular, G-quadruplexes are found in the genomes of human pathogens [144]. Many examples of G-quadruplexes were identified in viruses [145, 146], including human immunodeficiency virus (HIV) [147,148,149,150], multiple species of herpes virus [151,152,153,154,155], human papillomavirus (HPV) [156], hepatitis C [157], Zika [158], and Ebola [159]; a G-quadruplex-binding protein was also found in severe acute respiratory syndrome (SARS) coronavirus [160]. In bacteria, G-quadruplexes are found in Escherichia coli [161], Neisseria gonorrhoeae [162], Neisseria meningitidis [163], Mycobacterium tuberculosis [164], and Deinococcus radiodurans [165]. G-quadruplexes were also found in ciliates [14], malaria parasites [166, 167], and yeasts [168, 169]. Notably, helicases that resolve G-quadruplex structures, such as the RecQ [170, 171] and Pif1 [142] families, were found in both nonhuman and human systems. Most recently, the presence of G-quadruplexes in plant genomes has also emerged [172].

3 G-Quadruplex Detection In Vivo

There has been significant progress in the detection of G-quadruplex structures in vivo [173]. The first direct evidence of the in vivo existence of G-quadruplexes was established by using a G-quadruplex-specific single chain variable fragment (scFv) of an antibody to detect G-quadruplexes formed at telomeres in macronuclei of the ciliate Stylonychia lemnae; their formation was shown to be cell cycle-dependent [174] and controlled by phosphorylation of the telomere end-binding proteins TEBPα and TEBPβ [175]. More recently, using the G-quadruplex-specific antibodies BG4 (scFv) [176] and 1H6 (monoclonal antibody) [177], G-quadruplex structures were visualized in human cells at both telomeric and non-telomeric sites on chromosomes, and the number of G4 loci increased after exposure of live cells to G-quadruplex ligands [176] or in the absence of FANCJ, a G-quadruplex DNA-specific helicase [177]. Using BG4 to map endogenous G-quadruplex structures by G4 ChIP-seq in human cells, ~10,000 endogenous G-quadruplex structures were detected in immortalized precancerous HaCaT cells, 10 times more than in normal human NHEK cells [178]. G-quadruplex structures were found to be enriched in nucleosome-depleted regulatory regions of highly transcribed genes, including promoters, such as that of c-MYC, and 5′ UTRs. The G-quadruplexes detected in cells account for less than 1% of the genomic G4 sites identified by G4-seq [179] or predicted by G4 algorithms [81], suggesting that the in vivo formation of G-quadruplexes is highly context-dependent. In addition, endogenous potential G4 sites were detected in live human cells by chemical footprinting combined with high-throughput sequencing and were found to be enriched in regions involved in chromatin reorganization and gene transcription [180]. G-quadruplex formation in vivo was also detected by small-molecule probes, such as the radiolabeled G-quadruplex ligand 360A [181] and the fluorescent G-quadruplex ligands BMVC [182, 183] and DAOTA-M2 [184].

4 G-Quadruplex-Interactive Small Molecules

Recognition of the biological significance of G-quadruplexes has promoted research and development of G-quadruplex-interactive small molecule ligands (G4-ligands). The identification of genomic G-quadruplex structures in regions of functional importance, such as human telomeres and oncogene promoters, has created the opportunity to selectively target these globular DNA structures for cancer-specific drug development [17, 185,186,187,188,189,190]. The therapeutic possibilities of targeting telomeric G-quadruplexes to inhibit telomerase were first reported in 1997 [191] and have been actively pursued [185,186,187,188]. G-quadruplex-ligands were also shown to inhibit the alternative lengthening of telomeres (ALT) pathway, which maintains telomere stability in a telomerase-independent manner in ~15% of cancer cells [192,193,194,195]. The discovery that the perylene derivative PIPER inhibits helicase Sgs1-mediated G-quadruplex unfolding suggested the existence of a broader mechanism for G-quadruplex-ligands [196]. In 2002, a small molecule that stabilizes the G-quadruplex formed in the c-MYC promoter was shown to inhibit c-MYC expression, suggesting a therapeutic opportunity in targeting promoter G-quadruplexes for transcriptional modulation [54, 197]. Different groups of compounds, such as quindolines and ellipticines, were reported to suppress c-MYC transcription by stabilization of the c-MYC promoter G-quadruplex [198,199,200,201]. Subsequently, transcriptional repression of other oncogenes, such as c-KIT [202], BCL-2 [203], and KRAS [204,205,206], was shown with compounds stabilizing promoter G-quadruplexes. More recently, G-quadruplex-stabilizing compounds were shown to cause DNA damage and genomic instability and to exhibit a synergistic effect with inhibitors or deficiency of DNA-repair mechanisms [207,208,209,210,211].
Specifically, G-quadruplex-stabilizing compounds were shown to induce selective lethality in BRCA-deficient cancers by targeting the inherent DNA double-strand break (DSB) repair deficiency [212, 213].

A G-quadruplex-targeting drug, Quarfloxin (CX-3543) [115], based on the fluoroquinolone compounds developed by Laurence Hurley [214, 215], showed excellent in vivo activity in various solid tumors and reached Phase II clinical trials. Its second-generation compound, CX-5461, is currently in clinical trials for BRCA1/2 deficient tumors (Canadian trial, NCT02719977) [213]. Diverse families of other small molecule compounds that interact with G-quadruplexes were developed and studied. For example, TMPyP4, a tetra-(N-methyl-4-pyridyl)-porphyrin, is a compound identified by structure-based design that exhibits significant selectivity for quadruplex DNA over duplex DNA and inhibits telomerase and ALT [216, 217]. Its positional isomer TMPyP2 is a poor G-quadruplex-interactive compound and can be used as a negative control for TMPyP4 [218]. Later studies revealed that TMPyP4 interacts with the c-MYC promoter G-quadruplex and downregulates c-MYC [54, 197]. TMPyP4 and TMPyP2 have been among the most widely used molecules in G-quadruplex research. Telomestatin, a natural product isolated from Streptomyces anulatus 3533-SV4, is a highly potent inhibitor of telomerase [219] and is active against human cancers by stabilizing telomeric G-quadruplexes and inhibiting telomere-protein binding [220,221,222,223]. BRACO19 is a trisubstituted acridine rationally designed to target telomeric G-quadruplexes directly [224]; it was shown to inhibit telomerase and induce telomere uncapping in human cancer cells [195] and to have high in vivo activity against different cancer xenograft models [225, 226]. 12459 is a triazine G4-ligand that exhibits anti-telomerase activity but whose action also appears to involve BCL-2 and hTERT splicing [192, 227, 228].
The closely related pyridine dicarboxamide derivatives 360A/307A and bisquinolinium compounds Phen-DC(3)/Phen-DC(6) are highly selective G-quadruplex ligands and were shown to be active against both telomeric and c-MYC G-quadruplexes and to inhibit c-MYC gene transcription in tumor cells, as well as to bind to a G-quadruplex formed in the 5′-UTR of TRF2 mRNA to repress translation [133, 229,230,231,232]. Phen-DC was also shown to trigger genetic instability in Saccharomyces cerevisiae [207]. Pyridostatin (PDS) shows potent G-quadruplex stabilization and has been widely used to study G-quadruplex functions and G-quadruplex-induced DNA damage [176, 210,211,212, 233]. In addition, G-quadruplex DNA itself was shown to have potential as a cancer therapeutic. AS1411 (Antisoma, London, UK) is an unmodified 26-nt G-quadruplex-forming oligonucleotide that has been in Phase II trials for the treatment of renal cancer and acute myeloid leukemia [234].

G-quadruplex-interactive compounds have contributed immensely to the understanding of G-quadruplex functions and of the potential of G-quadruplexes as therapeutic targets. Different G-quadruplex-ligands show various levels of selectivity, both for G-quadruplex structures over other forms of DNA and among different G-quadruplexes, and this selectivity is likely to be related to their biological activity. Conventional and in silico screening methods as well as structure-based rational drug design have been actively pursued in the development of G-quadruplex-targeting small molecules. A common feature among the G-quadruplex-ligands is the presence of a fused ring system that is capable of stacking with the terminal G-tetrads. In addition, a crescent-shaped asymmetric pharmacophore that can recruit a DNA base, and cationic side chain substituents that have the propensity to interact with G-quadruplex grooves, can give rise to specific interactions (Fig. 1d) [235, 236]. Structural data on G-quadruplex-ligand complexes have played an important role in the understanding of small molecule recognition of G-quadruplexes and in the design of G-quadruplex-ligands [18, 237]. These data include a handful of NMR solution structures of intramolecular G-quadruplex-ligand complexes, including c-MYC G-quadruplex-ligand complexes [235, 238,239,240] and telomeric G-quadruplex-ligand complexes [236, 241,242,243], and X-ray crystallographic structures of intramolecular and intermolecular telomeric G-quadruplex-ligand complexes [43, 237, 244,245,246,247,248,249,250].

5 Methods to Study G-Quadruplexes

A wide variety of experimental tools and methods have been utilized or developed for studying G-quadruplex DNA and RNA. These methods play a pivotal role in enabling researchers to gain an understanding of G-quadruplex structures, properties, and functions. The methods commonly used for studying G-quadruplexes include biophysical, biochemical, molecular biology, and cellular methods, as described in this book.

Biophysical methods are widely used to study physical properties of G-quadruplexes such as structure, stability, and binding interactions with ligands and proteins. Circular dichroism (CD) is widely used to study G-quadruplex conformations and stability. Isothermal titration calorimetry (ITC) can directly measure binding enthalpies and provide thermodynamic characterization of G-quadruplex-ligand interactions. Biosensor-surface plasmon resonance (SPR) is a quantitative approach for the study of small molecule and protein ligand-quadruplex nucleic acid interactions in real time. Analytical ultracentrifugation (AUC) can be used to characterize G-quadruplex formation and to monitor ligand binding. Mass spectrometry can also be used to characterize G-quadruplex structures and ligand binding. Differential scanning calorimetry (DSC) can be used to obtain thermodynamic and sometimes kinetic parameters of G-quadruplexes. X-ray crystallography and solution NMR spectroscopy provide structural information on G-quadruplexes and ligand complexes, while molecular dynamics simulation can also be used to study G-quadruplex structures and small molecule binding.
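As an illustration of the thermodynamic characterization that a binding measurement such as ITC allows, the sketch below applies the standard relations ΔG = -RT ln K_a and ΔG = ΔH - TΔS to a single-site binding event. The function name, the default temperature, and the idea of passing a fitted association constant and enthalpy as plain numbers are assumptions made for illustration; real ITC analysis fits these parameters to the titration curve first.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def binding_thermodynamics(k_a, delta_h, temperature=298.15):
    """Derive dG and -T*dS (both in J/mol) for a single-site binding event.

    k_a is the association constant in 1/M, delta_h the binding enthalpy
    in J/mol, and temperature is in kelvin. Uses
    dG = -R*T*ln(K_a) and dG = dH - T*dS.
    """
    if k_a <= 0:
        raise ValueError("association constant must be positive")
    delta_g = -R * temperature * math.log(k_a)
    minus_t_delta_s = delta_g - delta_h  # entropic contribution, -T*dS
    return {"dG": delta_g, "dH": delta_h, "-TdS": minus_t_delta_s}


# Hypothetical input values, chosen only to exercise the function.
result = binding_thermodynamics(k_a=1.0e6, delta_h=-20_000.0)
```

Splitting ΔG into ΔH and -TΔS in this way is what allows enthalpy-driven and entropy-driven ligand binding to be distinguished.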

Biochemical and molecular biology methods are used to study G-quadruplex formation, functions, and protein interactions. Electrophoretic mobility shift assay (EMSA), dimethyl sulfate (DMS) footprinting, and DNA polymerase stop (Pol-stop) assay are widely used to study G-quadruplex formation, protein complexes, and ligand interactions. Chromatin immunoprecipitation (ChIP) assays are used to probe protein interactions with G-quadruplex-forming DNA sequences. A combination of biochemical and biophysical methods can be used to monitor co-transcriptional formation of G-quadruplexes (transcription assay) and to quantitatively analyze the effects of G-quadruplex formation on DNA replication (replication assay). Single-molecule methods such as optical and magnetic tweezers, atomic-force microscopy (AFM), and single-molecule fluorescence resonance energy transfer (FRET) microscopy can be used to investigate G-quadruplex conformations, ligand interactions, and protein interactions. In addition, methods are used to discover and develop G-quadruplex-targeting molecules, such as FRET-based high-throughput screening of small molecule ligands, and peptide nucleic acid (PNA) oligomers that are designed to bind to G-quadruplexes. G-quadruplexes are also used in nanoparticle-based assays, and as biocatalysts such as G-quadruplex DNAzymes.

More recent and exciting developments include in-cell methods to study G-quadruplex formation in vivo, such as in vivo chemical footprinting, G-quadruplex detection and visualization, and in-cell NMR. Chemical probing for G-quadruplex formation inside living cells combined with high-throughput sequencing can provide a snapshot of the DNA conformation over the whole genome in vivo. G4-specific antibodies and fluorescent probes are used to detect and visualize G-quadruplexes in cells. NMR spectroscopy is used to study G-quadruplex structures inside living Xenopus laevis oocytes, while 19F NMR can be used to study G-quadruplex conformation in vitro and in living cells.

In conclusion, it is our hope that the protocols described herein will be found both informative and useful.