1 Introduction

Guanine has hydrogen-bond donors and acceptors on both the Watson–Crick and Hoogsteen faces. Four guanines can readily self-associate via Hoogsteen hydrogen bonding in a co-planar array to form structures termed G-quartets1. A total of 8 hydrogen bonds stabilise a quartet (Fig. 1). G-quartets, in the presence of cations, can stack on top of one another to make a four-stranded G-quadruplex (G4) DNA or RNA structure. All G4 structures comprise two distinct structural features, namely a stem formed from a set of G-quartets stacked together and unpaired bases which link the quartets and form the loops. If two strands are antiparallel, then the loop between them is positioned above the plane of the stacked G-quartets. However, if the two strands are parallel, then the connecting loop is positioned within the groove formed between the two sugar-phosphate backbones of the guanines.

Figure 1: A quartet formed by a co-planar association of four guanine residues.

G-quadruplexes are highly polymorphic with regard to their sequence, the nature of cation stabilisation, strand orientation and the glycosidic conformation of the bases. They can be formed from a single strand (monomeric), two strands (dimeric) or four strands (tetrameric). When monomeric, at least four tandem guanine repeats are needed, with the general sequence G2–5 L G2–5 L G2–5 L G2–5, where G2–5 represents a tract of 2–5 guanines and L represents the intervening loop sequence. These sequences were first identified at the ends of eukaryotic chromosomes. In the case of intermolecular dimeric quadruplexes, two strands come together, each having the consensus sequence G2–5 L G2–5. Finally, a tetramolecular quadruplex is formed when four separate strands come together. The guanine tracts need not be of identical length and the quadruplexes can be formed within single-stranded or double-stranded DNA or RNA. Of these, the monomeric intramolecular quadruplex is the most important form in terms of occurrence and biological relevance.
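The monomeric consensus sequence above maps naturally onto a regular expression. The following is a minimal sketch rather than an established search tool: the text does not specify the loop length, so a loop of 1–7 nucleotides is assumed here, and the names `G4_PATTERN` and `find_putative_g4` are illustrative only. The pattern covers DNA (A, C, G, T) and reports non-overlapping matches on one strand.

```python
import re

# Four G-tracts of 2-5 guanines separated by three loops.
# Assumption: each loop (L) is 1-7 nucleotides long; the text gives no loop length.
G4_PATTERN = re.compile(r"(?:G{2,5}[ACGT]{1,7}){3}G{2,5}")


def find_putative_g4(sequence):
    """Return (start, matched_text) for each non-overlapping motif match."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Human telomeric repeat used as a simple test sequence.
telomeric = "AGGGTTAGGGTTAGGGTTAGGG"
hits = find_putative_g4(telomeric)
for start, motif in hits:
    print(start, motif)
```

A match only indicates that a sequence is compatible with the consensus; as the prevalence figures in this section show, far fewer sequences are observed to fold in chromatin than are predicted from sequence alone.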

The human genome contains between 350,000 and 700,000 distinct putative quadruplex-forming sequences2,3,4, of which the telomeric sequences and those present in the oncogenic promoter sequences have been extensively studied. However, experimental trapping of actual quadruplex structures in human chromatin has found a much lower prevalence, of ca. 10,000 structures5. The functional relevance of quadruplex formation at different sites has been supported by the isolation of proteins that bind and promote the formation of quadruplex structures, such as transcription factors, nucleases, topoisomerases, resolvases and helicases. Unambiguous evidence for their existence in vivo has been demonstrated recently, employing monoclonal antibodies as probes6, 7. Currently, there are over 200 crystal and NMR-derived structures of quadruplexes in the Protein Data Bank. These G-quadruplex structures and their ligand complexes have been reviewed elsewhere in detail8,9,10,11,12,13,14,15,16,17,18.

Positively charged alkali metal ions are essential for the stabilisation of quadruplexes19. The cations are coordinated to the carbonyl oxygen (O6) atoms of the guanines, at the centre of the quartet, thereby forming a central ion channel within the quadruplex stem20,21,22. The ions are considered to be an integral part of all quadruplexes 19. The role of mono- and divalent cations in stabilising quadruplexes has already been reported19, 23. K+ is the strongest stabilising cation and binds in a square antiprismatic bipyramidal geometry20, 21. The order of stabilisation by cations19 is K+ > Rb+ > Na+ > Li+ = Cs+ > Sr2+ > Ba2+ > Ca2+ > Mg2+. Ions have also been implicated in determining the structure of quadruplexes, and multiple cation coordination geometries have been reported24. G-quartets with K+ ion coordination are larger than their Na+ counterparts, and their loops are more flexible25. The interconversion between antiparallel and parallel quadruplexes can be controlled by the K+/Na+ balance, even though K+ favours formation of the antiparallel form26. Studies on the thrombin-binding aptamer (TBA) have demonstrated that ions can influence structural conformation in addition to stabilisation27. Molecular simulations have been crucial in explaining the rapid exchange of cations between the quadruplex and the solvent. The spontaneous exchange prevents any destabilisation of the quadruplex structure27.

A wide range of small molecules have been reported that bind to quadruplexes and a large number of structure–activity relationships were derived (Fig. 2)13, 14, 28. The majority of these ligands share a common structural feature of a planar aromatic chromophore. The design is based on the hypothesis that large aromatic chromophores can enhance quadruplex stabilisation and, therefore, interrupt cellular processes such as transcription, translation and replication as well as DNA damage repair, in the absence of mechanisms to unwind them15. A major focus is thus to design small molecules with enhanced affinity and selectivity to bind to quadruplexes and inhibit progression of these cellular processes15.

Figure 2: Some ligands that have been studied using computational methods (1. TMPyP4, 2. RHPS4, 3. BRACO-19, 4. Telomestatin, 5. BMVC, 6. Berberine, 7. Phen-DC3, 8. Daunomycin, 9. MM41, 10. MMQ1).

BRACO-19 is a 3,6,9-trisubstituted acridine derivative, designed by Neidle et al. to stabilize the telomeric quadruplex DNA structure29. It was the first rationally designed telomerase inhibitor 30 to exhibit strong antitumor activity.31 The quinacridine-based ligand MMQ1 was described as an anti-cancer agent in 2007 by Hounsou et al.32 The G-quadruplex stabilising properties of the MMQ1 series were confirmed by Hou et al.33 and Gabelica et al.34 MM41 is a tetra-substituted naphthalene diimide derivative, first described by Micco et al. 35 as a potent stabilizer of human telomeric and gene promoter DNA quadruplexes that can inhibit the growth of human cancer cells in vitro and in vivo. Recently, Ohnmacht et al. reported significant in vivo anti-tumour activity of MM41 against the MIA PaCa-2 pancreatic cancer xenograft model.36 TMPyP4 is a tetracationic porphyrin, which has been extensively studied by Hurley and co-workers.37 Investigations of this ligand have shown high affinity for G-quadruplexes, anti-telomerase activity and downregulation of oncogene expression via the promoter region of c-myc.38 It has been demonstrated that TMPyP4 favours parallel-stranded G-quadruplexes and can convert antiparallel to parallel forms of G-quadruplexes.39 However, this ligand does not bind selectively to G-quadruplex structures. 3,6-bis(1-methyl-4-vinylpyridinium)carbazole diiodide (BMVC) was one of the first selective ligands described as a G-quadruplex binder, by Chang et al. 40 It is a diphenylcarbazole derivative, which stabilizes the telomeric G-quadruplex and exhibits remarkable inhibition of telomerase (IC50 = 0.05 µM).41 Phen-DC3 is a phenanthroline analogue described by De Cian et al. 42 and has a perfect geometrical match with a G-quartet, resulting in high affinity and selectivity for the human telomeric G-quadruplex.43, 44 Daunomycin has a good affinity for B-DNA and has also been crystallised as a trimer with a G-quadruplex.45 Isoquinoline alkaloids including Berberine, which exhibit anti-bacterial activity46, have also been shown to bind to G-quadruplex DNA and inhibit telomere elongation.47 One of the most interesting G-quadruplex ligands is telomestatin, a natural compound isolated from Streptomyces annulatus in 2001 by Shin-ya et al.48. Telomestatin is a ring-shaped polyheteroaromatic compound, which selectively binds to quadruplexes without any affinity towards B-DNA. This compound showed significant activity as a telomerase inhibitor (IC50-TRAP = 5 nM). Telomestatin induced and stabilised the G-quadruplex structure even in a salt-deficient environment.49,50,51 RHPS4 is an N-methylated pentacyclic acridinium, which has shown a preference for binding and stabilizing quadruplex isoforms over DNA duplexes.52 This ligand has been shown to decrease telomere length and acts in synergy with the classical anti-cancer agent Taxol.53, 54

The ever-increasing use of structural methods for quadruplexes and their ligand complexes provides a strong foundation to carry out molecular modelling and computational approaches. These approaches have been established to predict quadruplex–ligand interactions and rationalisation of biological and biophysical data. This review focuses on the molecular modelling and computational methods to study G-quadruplex–ligand complexes.

2 Automated Molecular Docking

Automated molecular docking (AMD) is a computational method that is used to predict ligand binding to a target receptor 55. It is a widely used approach to screen chemical libraries to identify potential selective ligands that bind to quadruplexes.14 The results define the putative binding site and highlight atomistic interactions of the docked ligand with its receptor. This has proven useful in understanding several biochemical processes. Employing such an approach has two advantages. Firstly, it is a rapid method for ligand optimisation and, secondly, it is relatively cheap when compared with experimental approaches.56 As a result, AMD is the first step in the study of quadruplex–ligand interactions. Automated docking is carried out in two stages.57 In the first stage, sampling algorithms are used to predict the position and orientation of the ligand within a defined binding site. The second stage takes into account the chemical interactions between the ligand and the receptor and estimates the binding affinity of the ligand for the receptor. Ideally, the results from AMD should reproduce the experimentally determined binding modes and affinities among the highest-ranked generated conformations. Even if this is not the case, AMD has been invaluable in identifying new hit compounds for subsequent experimental validation and lead compound generation.
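The two stages of docking described above can be illustrated with a deliberately simplified sketch. It is not the algorithm of any particular docking program: the ligand is treated as a rigid set of points, poses are drawn at random inside a cubic site, and the score is a toy sum of Lennard-Jones and Coulomb terms. All coordinates, charges, parameter values and function names (`sample_poses`, `score`, `dock`) are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3D rotation matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of the decomposition
    if np.linalg.det(q) < 0:         # enforce a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def sample_poses(ligand, centre, box, n_poses, rng):
    """Stage 1: rigid-body sampling of ligand position and orientation in a cubic site."""
    centred = ligand - ligand.mean(axis=0)
    poses = []
    for _ in range(n_poses):
        shift = centre + rng.uniform(-box / 2, box / 2, size=3)
        poses.append(centred @ random_rotation(rng).T + shift)
    return poses


def score(pose, receptor, lig_q, rec_q, eps=0.2, sigma=3.5):
    """Stage 2: toy score, Lennard-Jones plus Coulomb terms (lower is better)."""
    d = np.linalg.norm(pose[:, None, :] - receptor[None, :, :], axis=-1)
    d = np.maximum(d, 0.5)           # guard against division by zero on overlap
    lj = 4 * eps * ((sigma / d) ** 12 - (sigma / d) ** 6)
    coulomb = 332.0 * lig_q[:, None] * rec_q[None, :] / d   # kcal/mol, charges in e, d in Angstrom
    return float(np.sum(lj + coulomb))


def dock(ligand, receptor, lig_q, rec_q, centre, box=8.0, n_poses=200, seed=0):
    """Sample poses, score each one and return (score, pose) pairs, best first."""
    rng = np.random.default_rng(seed)
    poses = sample_poses(ligand, centre, box, n_poses, rng)
    scores = [score(p, receptor, lig_q, rec_q) for p in poses]
    return [(scores[i], poses[i]) for i in np.argsort(scores)]


# Hypothetical toy system: four partially negative "receptor" sites in a plane
# and a three-site ligand carrying one positive charge.
receptor = np.array([[2.0, 2.0, 0.0], [-2.0, 2.0, 0.0], [-2.0, -2.0, 0.0], [2.0, -2.0, 0.0]])
rec_q = np.full(4, -0.5)
ligand = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.8, 0.0, 0.0]])
lig_q = np.array([0.0, 0.5, 0.0])

ranked = dock(ligand, receptor, lig_q, rec_q, centre=np.array([0.0, 0.0, 3.4]))
best_score, best_pose = ranked[0]
print(f"best toy score: {best_score:.2f}")
```

Production docking programs replace both stages with far more elaborate machinery (flexible ligands, guided search, calibrated scoring functions), but the separation between pose generation and pose ranking is the same.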

Virtual screening has become an integral component for hit identification and lead compound optimisation. It is primarily divided into two main categories, namely structure-based and ligand-based virtual screening. The structure-based approach employs a 3D structure of a target quadruplex, while in the ligand-based screening the structure activity data from a set of known active compounds are employed. A novel form of ‘dynamic docking’ has been developed by Neidle and co-workers58. They extracted multiple conformations of the quadruplex (from molecular dynamics simulations). Diverse conformations of the ligand were placed in a grid-like manner and in multiple orientations in the binding site. Multiple starting positions for subsequent MD simulations were generated, allowing large conformational space for both ligand and the quadruplex to be explored.

Interactions of cationic tetraoylporphyrin derivatives with an antiparallel G-quadruplex (PDB id 143D) were studied by AMD using DOCK6.59 The first virtual screening of a library of 3000 FDA-approved drugs as c-myc quadruplex stabilising ligands was done using ICM software.60 The difference in selectivity of a group of naphthalene diimides for telomeric RNA vs DNA quadruplexes was explained by docking using the AFFINITY program.61 Natural products like alkaloids have also been identified by screening a Chinese herbal database of 10,000 compounds with previously generated pharmacophores using Catalyst, Accelrys.62 A ligand-based screening approach was reported by Chen et al., which centred on a pharmacophore model generated from acridine derivatives.63 A synthetic substituted indole was identified by screening ~ 100,000 drug-like compounds.64 Fonsecin B, a natural naphthopyrone compound, was identified by screening a natural product database of 20,000 compounds.65 AMD, when used in conjunction with NMR experiments, provides an in-depth description of interactions with the quadruplex.66 Randazzo et al. screened a small but structurally diverse library of 6000 compounds against parallel [d(TG4T)4] using Autodock. Subsequent NMR screening and ITC measurements of the top hits identified six ligands that displayed strong quadruplex groove binding properties.67 A large-scale integrated in silico and in vitro screening platform has been developed by the Chaires group and is being used to discover new quadruplex binding scaffolds.68

Despite significant advances in method development, a number of pitfalls still exist. The most important is that the scoring functions used to rank binding poses are still inaccurate. Additionally, the polymorphic nature of quadruplex structures, resulting from different experimental methods and conditions, can lead to conflicting results. Therefore, the choice of the quadruplex for modelling and structure-based screening should be made with care. Another challenge is the selection of the most favourable pose of the quadruplex for AMD or virtual screening. This is due to the presence of the charged backbone, the presence of the stabilising cations and the conformational flexibility of the loops, all of which can influence the binding of the ligand to the quadruplex.

3 Molecular Dynamics Simulations

Molecular dynamics (MD) simulations, in conjunction with automated docking, have been the preferred method to study dynamic quadruplex–ligand interactions.69,70,71 They have been used to supplement experimental approaches. In MD simulations, the classical Newtonian equation of motion is solved. The force on a simple atomistic system is represented by F = ma. Bonded terms such as bond lengths, bond angles, out-of-plane distortions and torsions, together with non-bonded terms such as van der Waals interactions, electrostatic interactions and hydrogen bonds, describe the contributions to the force on any individual atom. The atoms are assigned initial velocities drawn from a distribution, and the acceleration of each atom is determined by the gradient of the potential energy function. The equations of motion are then integrated numerically and the atomic positions are traced as a function of time, resulting in a trajectory. The determination of a suitable potential function is extremely important for an MD simulation.69 The van der Waals behaviour of atoms is represented by Lennard–Jones potentials, which describe the attractive and repulsive forces between atoms. The systems are immersed in an explicit solvent, contained in a box that satisfies periodic boundary conditions. MD simulations can be carried out in various ensembles, e.g. canonical (NVT), in which the number of atoms, volume and temperature are kept constant, or isobaric–isothermal (NPT), in which the number of atoms, pressure and temperature are kept constant during the simulation.
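The integration of F = ma described above can be sketched for the simplest possible case, two Lennard-Jones particles in reduced units. This is an illustration only: the velocity Verlet integrator is one common choice (the text does not name an integrator), there is no solvent, thermostat, cutoff or periodic box, and the function names and parameter values are assumptions.

```python
import numpy as np


def lj_energy_forces(pos, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (reduced units, no cutoff)."""
    energy = 0.0
    forces = np.zeros_like(pos)
    for i in range(len(pos) - 1):
        for j in range(i + 1, len(pos)):
            rij = pos[i] - pos[j]
            r2 = rij @ rij
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4 * eps * (sr6 * sr6 - sr6)
            # Force is minus the gradient of the potential energy.
            f = 24 * eps * (2 * sr6 * sr6 - sr6) / r2 * rij
            forces[i] += f
            forces[j] -= f
    return energy, forces


def velocity_verlet(pos, vel, mass, dt, n_steps):
    """Integrate Newton's equations; return the trajectory and total energies."""
    pos, vel = pos.copy(), vel.copy()
    energy, forces = lj_energy_forces(pos)
    traj = [pos.copy()]
    totals = [energy + 0.5 * mass * np.sum(vel ** 2)]
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / mass      # half-step velocity update (a = F/m)
        pos += dt * vel                      # full-step position update
        energy, forces = lj_energy_forces(pos)
        vel += 0.5 * dt * forces / mass      # second half-step with the new forces
        traj.append(pos.copy())
        totals.append(energy + 0.5 * mass * np.sum(vel ** 2))
    return np.array(traj), np.array(totals)


# Two particles released at rest, 1.5 sigma apart, oscillate in the LJ well.
start = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
traj, totals = velocity_verlet(start, np.zeros_like(start), mass=1.0, dt=0.002, n_steps=2000)
separation = np.linalg.norm(traj[:, 0] - traj[:, 1], axis=1)
print(f"energy drift: {np.abs(totals - totals[0]).max():.2e}")
```

With no thermostat or barostat the total energy should stay nearly constant, which corresponds to the microcanonical (NVE) ensemble; the NVT and NPT ensembles mentioned above require additional coupling algorithms not shown here.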

4 Current Force Fields

In molecular mechanics and dynamics, force fields are a set of parameters that describe the molecular behaviour of the system. These parameters can be derived using quantum-mechanical, empirical or experimental methods. All-atom contributions such as Coulombic, polarisation, dispersion and repulsive energies are used to calculate the total energy of a molecule.72 While there are multiple force fields available, very few can accurately simulate quadruplexes. Orozco et al. presented the parmbsc0 force field.73 This was a refinement of the AMBER parm99 force field, which focussed on the α/γ torsional terms to resolve various inaccuracies of the previous parm94/94 and parm99 force fields.74 They identified very large α/γ transitions to the gauche+, trans geometry, resulting in severely distorted DNA geometry in 50 ns MD trajectories.75 The new parmbsc0 force field improved the representation of the α/γ conformational space, and can recognise and repair larger structural errors while still preserving the flexibility of duplex DNA. The parmbsc0 force field has been successfully used for telomeric quadruplex simulations,76 reproducing the overall features of both parallel and antiparallel quadruplexes. A newer, updated version of the parmbsc0 force field has since been introduced. In the parmbsc1 force field, modifications to the sugar puckers, the χ glycosidic torsion angles, and the ε/ζ torsion angles have been made.77 While the force field can adequately represent guanine stems, agreement of loop conformations with those observed in experimentally determined structures has still not been achieved, owing to multiple force field related issues. 
The Czech group led by Jurecka introduced the OL15 force field.78 They have made similar improvements to the parmbsc0 force field, with refinements in the χ79, ε/ζ80 and β torsion angles.78 A subsequent report by the Sponer group, which studied loop conformations in parallel-stranded quadruplexes, highlighted the limitations of all these force field versions in accurately reproducing experimental structures.81 However, it is worth mentioning that the modifications of the AMBER force field in bsc0, bsc1 and OL15 were developed to improve the accuracy of simulating double-stranded DNA.74 In spite of their limitations, these force fields have helped to improve the reliability of quadruplex simulations. Their robustness will be further tested when extended simulations are run that are closer to biophysical and biological timescales.

The parameters for ligands in ligand–quadruplex complexes can be assigned using the Generalised AMBER force field (GAFF1/2).82 These force fields can be combined with the bsc1 or OL15 force fields. Several long-timescale simulations of ligand–quadruplex complexes have been reported recently. The results from these MD simulations were corroborated by experimental data, implying that the bsc0, bsc1 or OL15 force fields, together with GAFF1/2 ligand parameterisation, are suitable for MD simulations of ligand–quadruplex complexes.83 It is worth mentioning, however, that free energy calculations based on these ligand force fields should be avoided. Simulations of ligand–quadruplex complexes are best used to interpret experimental data or to make rationalised predictions.

5 Long-Range Electrostatic Interactions

In MD simulations, the impact of all contributing factors on energies has to be taken into account. However, it would be computationally inefficient to sum all the non-bonded interactions explicitly in an MD simulation setup. Therefore, spherical cutoff distances have been employed to ignore long-range electrostatic interactions beyond a specified distance, with the aim of reducing cost without affecting the quality of the simulations.84 For quadruplexes, however, long-range electrostatics have a significant impact on the stability of the charged quadruplex backbone, cation–quadruplex interactions and the interactions of a charged ligand with the quadruplex. When they are treated inaccurately, the trajectories of these simulations are unstable; cations are dislodged from the quadruplex stem, resulting in the collapse of the structure.69 The introduction of the particle-mesh Ewald (PME) method eliminates this fundamental problem and produces stable DNA simulations.85, 86 In spite of being more time consuming, PME, with a real-space cutoff of 10 Å, is the preferred method to accurately simulate quadruplex–ligand complexes.70
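The reason an Ewald-type method tolerates a 10 Å cutoff where a bare Coulomb truncation does not can be seen from the standard Ewald splitting of 1/r into a rapidly decaying short-range part and a smooth long-range part. The sketch below only evaluates this identity at the cutoff; it does not implement the mesh-based reciprocal-space sum of PME. The splitting parameter `beta` is an assumed illustrative value, not one taken from the cited studies.

```python
import math


def coulomb_split(r, beta):
    """Split 1/r into short-range (real-space) and long-range (reciprocal-space) parts."""
    short_range = math.erfc(beta * r) / r   # decays quickly; summed directly within the cutoff
    long_range = math.erf(beta * r) / r     # smooth; handled in reciprocal space in Ewald methods
    return short_range, long_range


beta = 0.3      # assumed splitting parameter in 1/Angstrom (illustrative only)
cutoff = 10.0   # real-space cutoff in Angstrom, as quoted in the text

short_range, long_range = coulomb_split(cutoff, beta)
plain_tail = 1.0 / cutoff    # pair term discarded at 10 A by a bare spherical cutoff
ewald_tail = short_range     # pair term discarded at 10 A by the real-space Ewald sum

print(f"bare Coulomb term at cutoff:   {plain_tail:.3e}")
print(f"real-space Ewald term at cutoff: {ewald_tail:.3e}")
```

Under these assumptions the real-space term at the cutoff is several orders of magnitude smaller than the bare Coulomb term, while the long-range remainder is not discarded but evaluated separately, which is why the charged backbone and channel cations are treated consistently.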

6 Base Stacking Interactions and Backbone Descriptions

The force field parameters also provide descriptions for inter-base interactions. Since there is no quantum mechanical operator or direct experiment to determine atomic charges on the bases, arbitrary charges are assigned by fitting to the molecular electrostatic potential.87 These charges are essential to reproduce the electrostatic potential around the molecules.88 The amino groups of nucleic acid bases are non-planar due to their partial sp3 pyramidal hybridisation.87 This can result in the stabilisation of bifurcated H-bonds, close amino group contacts, non-planar G/A base pairs and some other specific interactions.87, 89, 90 The force fields assume purely sp2 amino hybridisation for ring nitrogen atoms. This should be sufficient for most interactions, as primary H-bonds stabilize the sp2 electronic structure. However, the current force fields do not support out-of-plane H-bonds or amino acceptor interactions. As a result, the force fields are unable to accurately reproduce bifurcated H-bonds or amino group contacts.

Backbone descriptors are very important in simulating nucleic acids for two main reasons: (a) the backbone is highly flexible and negatively charged, and (b) the charge distribution on the backbone changes with the conformation and solvation dynamics. The latter is not handled by non-polarizable atom–atom pair-additive force fields. Current force fields such as parmbsc073, parmbsc177 and OL1578, 80 have introduced several correction terms for the DNA backbone profile, which have been able to keep the fluctuations within acceptable limits over the entire course of nano- or microsecond-scale simulations.

7 Automated Molecular Docking and Dynamic Simulations of Quadruplex–Ligand Complexes: Case Studies

The impetus for automated molecular docking on quadruplex targets first came from the BRACO series of compounds.29 This series was the first to be rationally designed as quadruplex DNA stabilisers and telomerase inhibitors.29 Their selectivity and potent telomerase inhibitory activity were subsequently demonstrated experimentally. The NMR structure of the intramolecular human telomeric DNA quadruplex (PDB id 143D) in sodium solution was used to generate a pseudo-ligand-binding site between the T2A loop and the top G-quartet. The ligands were manually docked in the generated binding site. The simulations of quadruplex-BRACO-19 complex were then compared to four other ligands, which differed in their functional groups on the three side-chain arms. Later, the crystal structure of quadruplex-BRACO-19 complex91 validated these predictions and the subsequent MD simulation that followed detailed the stacking interactions between BRACO-19 and the terminal G-quartet.33, 71, 83 Molecular structures of other quadruplex–ligand complexes are present and have been simulated including those with daunomycin (PDB id 1L1H),45 RHPS4 (PDB id: 1NZM),92 MMQ1 (PDB id: 2JWQ),32 and TMPyP4 (PDB id: 2A5R).93 Short MD simulations (of up to 200 ns) are sufficient to stabilise the ligand in the binding site and explore the interactions it makes with the quadruplex.61, 70, 71 In order to obtain meaningful binding affinity calculations to study ligand selectivity, it is essential to fully equilibrate the system. 
Long-timescale MD simulations of quadruplexes have shown that quadruplexes are only transiently stabilised on shorter timescales and are far from equilibrium states.76, 81 A fully converged system requires extended sampling of the quadruplex–ligand systems.76, 81 A good example is the MD simulation used to study the interactions of the tetra-substituted naphthalene diimide compound MM41 with quadruplexes formed in gene promoter sequences, employing the parmbsc0 force field supplemented with χOL4 modifications.36 The results from the MD simulations were able to demonstrate the favourable binding mode of MM41 to the BCL-2 quadruplexes. The interactions that MM41 made were analogous to those made by the ligand with the human telomeric quadruplex.94 This demonstrated that MD simulations can provide useful insights into the overall shape, size and nature of ligand skeletons. The substituents can then be optimised and fine-tuned to increase ligand selectivity.

The binding of TMPyP4 to different human telomeric quadruplex topologies has been reported.95 Automated blind docking was employed to identify the most potent binding sites and MD simulation was performed using the parmbsc0 force field. The results showed that the systems were stable over 50 ns of simulation time and that TMPyP4 binds most efficiently to a parallel propeller quadruplex topology, due to the pronounced effect of the favourable π–π stacking interaction of the G-quartets with the core aromatic moieties of the ligand.

MD simulations evaluating the interactions of Telomestatin with human Tel-22 quadruplex have also been reported, using ff99 force field in K+ solution.96 Telomestatin was manually docked over the terminal G-quartet, on both 5′ and 3′ ends of the quadruplex core.97, 98 A short simulation of just 5 ns highlighted the instability of K+ ions within the quadruplex core.96 This indicated that K+ ions have a significant impact on stability via interactions between telomestatin and several conformations of telomeric quadruplexes. Subsequently, Mulholland and Wu employed the GAFF and parmbsc0 force fields to systematically explore the binding of telomestatin to human telomeric DNA.99

More recently, combined experimental and computational methods100 have been used to investigate the interactions between isoquinoline alkaloid ligands and a 3 + 1 hybrid-type human telomeric quadruplex structure (PDB id 2MB3).47 In this study, Noureini et al. docked the ligands using the ICM-Pro MolSoft molecular modelling suite. MD simulations were then performed on the quadruplex–ligand complexes selected on the basis of their interaction energy ranking. The results from the simulations indicated that the isoquinolines are stacked on the terminal quartet, with the aromatic chromophores maximising the π–π stacking interactions. The nitrogen atom in the ligand molecules prefers to sit on top of the central electronegative channel that runs along the central axis of the quadruplex. This behaviour was observed in all five isoquinoline ligands, indicating that it might be a physical reality.

Distamycin-A, a classic duplex DNA groove binding compound, binds to a tetramolecular parallel DNA quadruplex (PDB id: 1S45) in the grooves formed between the sugar-phosphate backbones.101,102,103 Different models have been proposed for Distamycin-A binding. It can either (a) bind in the groove, (b) stack on a terminal G-quartet or (c) possibly adopt a mixed groove/G-quartet binding mode.104 The specificity, affinity and binding modes of potential quadruplex groove-binding ligands have been explored by virtual screening docking calculations.66 The docking results were in agreement with the NMR experimental screening and suggested that appropriate groove binders can interact with high affinity with at least this quadruplex.67

In addition to DNA quadruplexes, RNA (TERRA) sequences that exhibit quadruplex forming ability have also been explored for ligand binding using computational methods.61 A comparison of BRACO-19 and naphthalene diimide interactions with DNA and RNA human telomeric quadruplexes has been systematically made by molecular modelling.61 The 22-mer DNA parallel-stranded human telomeric quadruplex (PDB id: 1KF1) was used in this study, and an RNA 22-mer telomeric quadruplex was generated from the DNA structure by direct addition of 2′-OH groups. The differences between DNA and RNA quadruplexes revealed that RNA complexes are more stable and rigid than their DNA counterparts. The stability is due to the interactions made by the 2′-OH groups with the O5′ groups and water molecules in the grooves, making the nucleotide backbone more rigid. Additionally, the presence of the 2′-OH group in the RNA quadruplex constricts the space that is available for the ligand side chains to interact with the loops, thereby reducing the depth and the width of the UUA loops. Only one naphthalene diimide ligand formed a strong complex with the telomeric RNA quadruplex. MD simulations showed that the substitution of one side-chain group (–NMe2 for an –OH) increased RNA quadruplex–ligand affinity by 15-fold. The study suggested that the RNA telomeric quadruplex is less likely to bind ligands with side chains terminating in bulky and/or inflexible functional groups, as are present in the two ligands with reduced affinity for the RNA quadruplex. These differences highlighted that it is possible to rationally design small molecule ligands that have the ability to discriminate between RNA and DNA quadruplexes.61

8 Enhanced Sampling Methods

8.1 Simulated Annealing

Simulated annealing (SA) is a local search algorithm for solving unconstrained and bound-constrained optimisation problems.105 This method carries out random optimisation based on a Monte Carlo iterative strategy. The name of the algorithm is derived from the physical process of heating a material and then slowly lowering the temperature to decrease defects. As the temperature changes, an SA algorithm searches randomly for solutions, accepting or rejecting each trial move using the Metropolis criterion. The process is repeated while the temperature descends gradually, until a solution close to the global optimum is found. Owing to their effectiveness in non-linear combinatorial optimisation, simulated annealing algorithms are used in many application fields.106
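The annealing loop described above can be written in a few lines. The sketch below minimises a hypothetical one-dimensional "rugged" function with several local minima; it is not the protocol used in the docking studies cited in this section. The geometric cooling schedule, step size, temperatures and the test function are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(energy, x0, t_start=5.0, t_end=1e-3, cooling=0.995,
                        step=0.5, seed=0):
    """Minimise a 1D function using Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        x_new = x + rng.uniform(-step, step)     # random trial move
        e_new = energy(x_new)
        delta = e_new - e
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                             # lower the temperature gradually
    return best_x, best_e


def rugged(x):
    """Hypothetical energy landscape: a shallow parabola with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(3 * x)


best_x, best_e = simulated_annealing(rugged, x0=4.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

At high temperature uphill moves are accepted often, which lets the search cross barriers between minima; as the temperature falls the acceptance of uphill moves becomes rare and the search settles into a low-energy basin. There is no guarantee that the basin found is the global minimum, which is why annealing runs are usually repeated from different starting points.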

Simulated annealing has been used to determine the optimum ligand orientation and key molecular interactions107 of telomestatin with a human telomeric intramolecular quadruplex of antiparallel topology. The results revealed that a complex with two bound telomestatin ligands is most favourable and that the ligands can bind to the quadruplex via the loops or the terminal G-quartets. Furthermore, the quadruplex–telomestatin complex undergoes a conformational rearrangement that leads to significant changes in the relative position, orientation and potential energy of both the telomestatin and the quadruplex.97

A simulated annealing docking approach has been used to find the most probable conformation of a novel triazatruxene derivative azatrux binding to the parallel form of human telomeric quadruplex.108 The method identified ten possible conformations of the ligand bound on the terminal G-quartet face, following which the orientations were evaluated to identify the most appropriate conformation. This protocol was repeated until a stable conformation with optimal G-quartet stacking was identified.108 A simulated annealing approach has also been used to find reasonable positions for the research compound GQR bound to a quadruplex.109

8.2 Principal Component Analysis

Principal component analysis (PCA) is a statistical method used for data reduction. It transforms a number of possibly correlated variables into a smaller number of uncorrelated variables termed principal components. In a simulated system, not all motions are important. PCA separates large-amplitude motions from irrelevant fluctuations. The overall translation and rotation of the structure through the trajectory are first removed: each frame is translated to the geometrical centre of the molecule and superimposed onto a reference structure by a least-squares fit.110 The appropriate configurational space is then constructed using a simple linear transformation in Cartesian coordinate space to generate a 3N × 3N covariance matrix, where N is the number of atoms. The matrices are summed and averaged over the entire trajectory. The resulting matrix is then diagonalised, generating a set of eigenvectors, each of which gives a vectorial description of one component of the motion by indicating its direction. Each eigenvector has a corresponding eigenvalue that represents the energetic contribution of that particular component to the motion. The eigenvalue is the average square displacement of the structure in the direction of the eigenvector. Projection of a trajectory onto a particular eigenvector highlights the time-dependent motions that the component performs in that particular vibrational mode. The time average of the projections shows the contribution of components of the atomic vibrations to this mode of concerted motion.111 The eigenvalues are placed in descending order, so that the first eigenvector and eigenvalue describe the largest internal motion of the structure. The eigenvalues decline sharply, highlighting the possibility of separating the dynamics into a small essential space and a relatively large space containing only small atomic fluctuations. 
On average, only about 5% of eigenvectors are necessary to describe 90% of the total dynamics. Although PCA is a convenient method to visualise trajectories, its limitations should also be taken into consideration when interpreting results.112 PCA is best suited to analysing trajectories of systems that undergo transitional changes, rather than trajectories that reflect only the thermal fluctuations of flexible molecules.112
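The PCA workflow described above (least-squares superposition, covariance matrix, diagonalisation, projection) can be sketched with NumPy on a synthetic trajectory. This is an illustration under stated assumptions, not the procedure of any specific analysis package: the superposition uses the Kabsch algorithm, the six-atom "molecule" and its single breathing motion plus random noise are invented for the example, and all function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, reference):
    """Centre `mobile` and rotate it onto `reference` by a least-squares fit (N x 3 arrays)."""
    mob = mobile - mobile.mean(axis=0)
    ref = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(mob.T @ ref)
    d = np.sign(np.linalg.det(u @ vt))           # avoid improper rotations (reflections)
    return mob @ (u @ np.diag([1.0, 1.0, d]) @ vt)


def trajectory_pca(traj):
    """PCA of an (n_frames, n_atoms, 3) trajectory.

    Returns eigenvalues (descending), eigenvectors (columns) and projections.
    """
    fitted = np.array([kabsch_fit(frame, traj[0]) for frame in traj])
    x = fitted.reshape(len(traj), -1)            # n_frames x 3N coordinate matrix
    dx = x - x.mean(axis=0)                      # fluctuations about the average structure
    cov = dx.T @ dx / (len(traj) - 1)            # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)           # diagonalise the symmetric matrix
    order = np.argsort(evals)[::-1]              # sort by descending eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, dx @ evecs


# Synthetic trajectory: six atoms on an octahedron performing one collective
# "breathing" motion, plus small random fluctuations on every coordinate.
rng = np.random.default_rng(0)
base = 2.0 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
t = np.linspace(0.0, 4.0 * np.pi, 200)
traj = np.array([base * (1.0 + 0.1 * np.sin(ti)) for ti in t])
traj += rng.normal(scale=0.01, size=traj.shape)

evals, evecs, proj = trajectory_pca(traj)
explained = np.cumsum(evals) / evals.sum()
print(f"variance captured by the first component: {explained[0]:.3f}")
```

In this constructed example a single eigenvector captures almost all of the variance because only one collective motion was built in; for a real quadruplex trajectory the cumulative curve `explained` is what indicates how many components span the essential space.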

PCA has been used to study the interaction of the dimeric quadruplex formed by the sequence d(GGGTGGGTGGGTGGGT)113 with the planar perylene derivative Tel03. The analysis revealed that the ligand reduced the flexibility of the loops, leading to a more stable Tel03–quadruplex complex. More generally, PCA shows that the dominant motions in the free dimer occur mostly in the loop regions and that the presence of the ligand reduces loop motion.114

8.3 Free Energy Calculations

Accurately calculating the free energy of ligand binding to a quadruplex remains a challenge. Several methods are available, some of which have been used to study quadruplex–ligand interactions.

9 Implicit Solvent Methods (MM/PBSA)

A continuum solvent method has been used to assess structures and free energies from an MD simulation. The molecular mechanics Poisson–Boltzmann surface area (MM/PBSA) method combines molecular mechanics energies with implicit solvation: the electrostatic (polar) contribution to the solvation energy is calculated with a Poisson–Boltzmann (PB) model,115 while the non-polar (hydrophobic) contribution is estimated from a solvent accessible surface area (SASA) term. MM/PBSA relies on calculating the difference in free energy between two states, namely bound (complex) and unbound (ligand and quadruplex).115 The free energy estimates are extracted from the MD trajectories as averages of the molecular mechanical energy of the solute combined with the solvation energy from the PB continuum model. This method has been used to calculate the free energies of quadruplex–ligand complexes.116
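The bookkeeping behind such an estimate can be sketched as follows. This is only an illustration under stated assumptions: the per-frame energies would come from an MD package and a PB solver, the complex, quadruplex and ligand frames are assumed to be paired (taken from the same snapshots), the entropy of the solute is ignored, and the names `FrameEnergy`, `GAMMA` and `OFFSET` together with their values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev

# Assumed, illustrative non-polar parameters (not taken from the text).
GAMMA = 0.0072   # kcal/mol/A^2, coefficient multiplying the SASA
OFFSET = 0.0     # kcal/mol, constant non-polar offset

@dataclass
class FrameEnergy:
    e_mm: float      # molecular mechanics energy of the solute (kcal/mol)
    g_polar: float   # polar solvation energy from the PB model (kcal/mol)
    sasa: float      # solvent accessible surface area (A^2)

def frame_free_energy(frame):
    """Free energy estimate of one snapshot: MM + polar + non-polar terms."""
    return frame.e_mm + frame.g_polar + GAMMA * frame.sasa + OFFSET

def binding_free_energy(complex_frames, receptor_frames, ligand_frames):
    """Mean and standard error of G(complex) - G(receptor) - G(ligand)."""
    per_frame = [
        frame_free_energy(c) - frame_free_energy(r) - frame_free_energy(l)
        for c, r, l in zip(complex_frames, receptor_frames, ligand_frames)
    ]
    sem = stdev(per_frame) / sqrt(len(per_frame)) if len(per_frame) > 1 else 0.0
    return mean(per_frame), sem
```

Reporting the standard error alongside the mean makes the frame-to-frame variation of the estimate visible.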

Li et al. used the MM/PBSA method to study the free energy of binding between the Tel03 ligand and an intramolecular parallel-stranded quadruplex.113 The K+ ions in the central channel of the quadruplex were treated as an integral part of the quadruplex structure and included in all calculations. The results suggested that Tel03 binds efficiently via end stacking and that the mode involving interactions with the 3′ terminal thymine is the most favourable.113

MM/PBSA calculations of the binding free energies of tri- and tetra-substituted naphthalene diimides are in excellent accord with the experimental results.117 These compounds have been shown to have greater binding affinity than BRACO-19 for a human telomeric quadruplex.117 MM/PBSA has also been used to compare the energies of various topologies of the human telomeric quadruplex.118 However, the variation in energy within a simulation is a major limitation that hinders more general conclusions from being drawn from such a study.118

10 Thermodynamic Integration

In thermodynamic integration (TI), the free energy difference between two discrete states (e.g. initial state A and final state B), whose potential energies UA and UB depend differently on the spatial coordinates, is obtained by integrating along a path that connects them. The paths between the states are arranged into a thermodynamic cycle, which is used to estimate the energy difference (Fig. 3). Although TI is computationally expensive, it is commonly used to calculate binding energies and can give accurate results. The binding of TMPyP4 and Phen-DC3 to a human telomeric DNA quadruplex has been characterised using TI.119 The results revealed that the thermodynamics of ligand–quadruplex binding is complicated and that a large number of adjustable parameters is required to fully define the complex.

Figure 3:
figure 3

A thermodynamic cycle.
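As a hedged illustration of the TI estimate for one leg of such a cycle, the sketch below integrates the ensemble average of dU/dλ over a coupling parameter λ with the trapezoidal rule. It assumes a linear coupling U(λ) = (1 − λ)UA + λUB and uses two one-dimensional harmonic potentials as a toy system, for which the averages are known analytically. In a real calculation they would come from an MD simulation at each λ. The function names and the spring constants are hypothetical.

```python
from math import log

KT = 0.596  # kcal/mol, approximately kT at 300 K

def mean_dudl_harmonic(lam, k_a, k_b, kt=KT):
    """Analytic <dU/dlambda> at coupling value lam for the toy system
    U_A = 0.5*k_a*x**2 and U_B = 0.5*k_b*x**2 with linear coupling.
    Stands in for the average that an MD run at this lambda would give."""
    k_lam = (1.0 - lam) * k_a + lam * k_b
    # <U_B - U_A> = 0.5*(k_b - k_a)*<x^2>, with <x^2> = kT / k(lambda)
    return 0.5 * (k_b - k_a) * kt / k_lam

def thermodynamic_integration(lambdas, dudl_means):
    """Trapezoidal estimate of the integral of <dU/dlambda> from 0 to 1."""
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * (dudl_means[i] + dudl_means[i + 1]) * width
    return total

lambdas = [i / 20 for i in range(21)]                 # 21 lambda windows
dudl = [mean_dudl_harmonic(lam, 1.0, 4.0) for lam in lambdas]
delta_g = thermodynamic_integration(lambdas, dudl)    # TI estimate
exact = 0.5 * KT * log(4.0 / 1.0)                     # analytic reference
print(f"TI: {delta_g:.4f} kcal/mol, exact: {exact:.4f} kcal/mol")
```

For a binding calculation the same integration would be carried out for the ligand in the complex and in solvent, and the difference between the two legs of the cycle taken.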

11 Umbrella Sampling

The free energy along a reaction coordinate can be estimated using umbrella sampling. The method applies a bias potential along the reaction coordinate, which drives the system from state A to state B. The intermediate steps need to be treated as separate systems (bins), and a molecular dynamics simulation is run for each of them. It is preferable to sample many bins for shorter times than fewer bins for longer times.120 The bias potential in each bin depends only on the reaction coordinate, and its ability to connect energy minima gives the method its name. The weighted histogram analysis method (WHAM) can then be used to estimate the free energy of the system.121
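A minimal sketch of the WHAM recombination step is given below. It is a toy example under stated assumptions: the system is a one-dimensional double well, each window uses a harmonic bias, and the per-window histograms are drawn directly from the exact biased Boltzmann distributions instead of from MD runs. The function names and all numerical parameters are hypothetical.

```python
import numpy as np

def wham(hist, bias, n_samples, kT=1.0, n_iter=5000, tol=1e-10):
    """Combine biased histograms into an unbiased free energy profile.

    hist: (n_windows, n_bins) counts; bias: (n_windows, n_bins) bias energy
    at each bin centre; n_samples: (n_windows,) samples per window.
    """
    boltz = np.exp(-bias / kT)
    total = hist.sum(axis=0)
    f = np.zeros(hist.shape[0])              # window free energy shifts
    for _ in range(n_iter):
        denom = (n_samples[:, None] * np.exp(f / kT)[:, None] * boltz).sum(axis=0)
        p = total / denom                    # unbiased probability (unnormalised)
        f_new = -kT * np.log((boltz * p).sum(axis=1))
        f_new -= f_new[0]                    # fix the arbitrary additive constant
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    p = p / p.sum()
    with np.errstate(divide="ignore"):       # empty bins give an infinite PMF
        pmf = -kT * np.log(p)
    return pmf - pmf.min()

def toy_umbrella_histograms(x, u, centres, k_spring, n_per_window, kT=1.0, seed=0):
    """Draw window histograms from the exact biased distributions (a stand-in
    for running one biased MD simulation per window)."""
    rng = np.random.default_rng(seed)
    bias = 0.5 * k_spring * (x[None, :] - centres[:, None]) ** 2
    hist = np.empty(bias.shape)
    for i, w in enumerate(bias):
        e = u + w
        p = np.exp(-(e - e.min()) / kT)
        hist[i] = rng.multinomial(n_per_window, p / p.sum())
    return hist, bias

x = np.linspace(-1.6, 1.6, 65)               # reaction coordinate bins
u = 5.0 * (x**2 - 1.0) ** 2                  # double well with a 5 kT barrier
centres = np.linspace(-1.5, 1.5, 16)         # 16 overlapping windows
hist, bias = toy_umbrella_histograms(x, u, centres, 40.0, 50_000)
pmf = wham(hist, bias, np.full(len(centres), 50_000.0))
```

The recovered profile `pmf` can be compared with `u - u.min()` to check that neighbouring windows overlap sufficiently.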

Umbrella sampling has also been used effectively in combination with other modified molecular dynamics methods such as steered molecular dynamics (SMD). A force was applied along the reaction coordinate to remove the ligand from its binding site, and the resulting configurations provided the initial coordinates for the umbrella sampling calculations. The combined methods explained the molecular mechanism and kinetics by which the BMVC ligand enters and exits the terminal G-quartet binding site and forms a stable ligand–quadruplex complex.122

12 Metadynamics

Ligand binding to quadruplexes is a long-timescale event that occurs over microseconds to milliseconds, which is extremely difficult to sample using standard molecular dynamics simulations. Metadynamics is an enhanced sampling method that explores long-timescale events in a reasonable computational time.123 The sampling is accelerated by introducing a history-dependent bias potential on a few degrees of freedom of the system, called collective variables, and it permits calculation of the free energy landscape of the system. Parrinello et al. applied a modification of this method (funnel metadynamics) to study the binding of berberine to a quadruplex and to calculate the absolute binding free energy.124 It took 800 ns of simulation to fully explore the free energy landscape of berberine binding to quadruplex DNA and to compute an accurate free energy estimate. The binding free energy calculated from funnel metadynamics was within 1 kcal/mol of the experimental value.
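The idea of a history-dependent bias can be illustrated with a toy calculation. The sketch below runs plain metadynamics (not the funnel variant used in the cited work) for a single particle on a one-dimensional double well, using overdamped Langevin dynamics with unit mobility and the particle position as the only collective variable. The function name `run_metadynamics` and every parameter value are assumptions chosen for illustration.

```python
import numpy as np

def run_metadynamics(n_steps=400_000, dt=1e-3, kT=1.0, height=0.05,
                     width=0.1, stride=200, barrier=5.0, seed=0):
    """Plain metadynamics on U(x) = barrier * (x^2 - 1)^2.

    Gaussian hills of the given height and width are deposited at the current
    position every `stride` steps. The bias is stored on a grid, and the
    free energy is estimated as minus the accumulated bias.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(-2.5, 2.5, 501)
    x_min, dx = -2.5, 0.01                    # grid origin and spacing
    bias = np.zeros_like(grid)
    bias_force = np.zeros_like(grid)          # -d(bias)/dx on the grid
    noise = rng.normal(size=n_steps) * np.sqrt(2.0 * kT * dt)
    x = -1.0                                  # start in the left-hand well
    for step in range(n_steps):
        i = min(max(int(round((x - x_min) / dx)), 0), grid.size - 1)
        force = -4.0 * barrier * x * (x * x - 1.0) + float(bias_force[i])
        x += force * dt + float(noise[step])  # overdamped Langevin step
        if (step + 1) % stride == 0:          # deposit a new hill
            bias += height * np.exp(-0.5 * ((grid - x) / width) ** 2)
            bias_force = -np.gradient(bias, dx)
    free_energy = bias.max() - bias           # estimate, zero at its minimum
    return grid, free_energy

if __name__ == "__main__":
    grid, fe = run_metadynamics()
    # Barrier estimate: top of the barrier (x = 0) relative to the wells (x = +/-1).
    est = fe[250] - 0.5 * (fe[150] + fe[350])
    print(f"estimated barrier: {est:.2f} kT (model value 5 kT)")
```

Because hills keep being added at a constant height, the estimate fluctuates around the true profile. Well-tempered variants reduce the hill height over time to damp these fluctuations.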

In another work, Di Leva et al. studied the binding of 3-(benzo[d]thiazol-2-yl)-7-hydroxy-8-((4-(2-hydroxyethyl)piperazin-1-yl)methyl)-2H-chromen-2-one to a parallel-stranded tetramolecular TG4T quadruplex.125 The authors proposed a hopping binding mechanism, in which the ligand explores the grooves and the 3′ terminal end of the quadruplex. The results fully explain the experimental data using a relatively short simulation timescale.

13 Conclusions

There are over 200 quadruplex structures in the Protein Data Bank. These structures are static, time-averaged ensembles that need to be simulated before any meaningful information can be extracted from them. Molecular simulations are invaluable tools for studying the dynamics of such systems, including interactions with solute, solvent and ions, and for exploring the free energy landscape to identify intermediate, metastable and lowest-energy states that are almost inaccessible to experimental methods. No other method can provide time-resolved dynamic details of interactions and, therefore, molecular dynamics simulations are the method of choice for studying ligands in complex with quadruplexes. The potential of computational methods is best realised when they explain and interpret experimental data and supplement it with information that cannot be accessed by experimental methods.

14 A Note on Software

Several software packages are available to carry out automated molecular docking. The most commonly used freely available ones are Autodock (http://autodock.scripps.edu), Autodock Vina (http://vina.scripps.edu) and Dock6 (http://dock.compbio.ucsf.edu). While Autodock comes with a graphical user interface (GUI), Dock6 files can be prepared using the Chimera molecular visualisation tool (http://cgl.ucsf.edu/chimera). The compatibility of Chimera and Dock6 with the ZINC database (http://zinc.docking.org) makes this combination an excellent streamlined suite for virtual screening. In addition, several proprietary packages such as MolSoft-Pro (http://www.molsoft.com), Schrodinger suite (http://www.schrodinger.com) and Discovery studio (http://www.accelrys.com) have also been used for docking ligands to quadruplexes.

AMBER software (http://www.ambermd.org) has been the engine of choice for running molecular dynamics simulations. This might be due to (a) the internal implementation of the Barcelona (parmBSc) force field and its modified variants used to parameterise quadruplex systems, (b) the ease of parameterising ligands using the ANTECHAMBER code with the GAFF/GAFF2 force fields and (c) the fast and efficient PMEMD code that runs on graphical processing units (GPUs), which has been used to demonstrate that long-timescale simulations provide meaningful results for quadruplex systems. More recently, the latest force fields have been made available and can also be ported effectively for use in the GROMACS (http://www.gromacs.org), DESMOND (http://www.schrodinger.com/desmond) and ACEMD (http://www.acellera.com) MD software. The CHARMM force fields and molecular simulation program (http://www.charmm.org) have also been used to simulate quadruplexes. For scientific groups that do not have access to computing infrastructure, Acellera Inc. (http://www.acellera.com) now provides cloud-based services for running molecular dynamics simulations.