1 Introduction

Until 1976, the study of nucleic acid structure was exclusively the domain of fiber diffractionists. Between the original Watson-Crick structure in 1953 and this date, there was considerable activity in refining the original B-form model of DNA and in extending the approach to other polymorphs and to a number of synthetic, repetitious polynucleotides, all based on data from fiber-diffraction samples. These studies reached their zenith with the development and use of a “linked-atom” least-squares refinement procedure for optimizing mono- or dinucleotide repeat units against the relatively sparse paracrystalline diffraction data from ordered fibers. It is a tribute to the sophistication of these analyses, in spite of the inherent limitations of fiber data, that the refined “canonical” A- and B-DNA double helices are still major reference points for many studies (1).

The overriding limitation of structural studies of fibrous polynucleotides is their inherent inability to address questions of structure and sequence at the level of individual nucleotides, as opposed to the averaged mono- or dinucleotide repeating units defined and refined by the polynucleotide modeling procedures. This limitation does not apply to single-crystal studies, which can unequivocally determine structural features at every point along a sequence, without having to define artificial averaged residues. Since 1976, there has been a progressive increase in the number of single-crystal analyses of short (<16 nucleotide residues) oligonucleotides, following the determination of the structures of the dinucleoside monophosphate duplexes (ApU)2 and (GpC)2 (2,3). The first true oligonucleotide structure determination was that of the Z-DNA duplex d(CGCGCG)2 (4,5). This has been followed by a large number of analyses of duplex helices, which have shown:

  1. That the earlier fiber structure is correct in general terms;

  2. Ways in which a number of drug molecules interact with DNA;

  3. Details of a number of types of base mispairing; and

  4. A wealth of information on detailed sequence-dependent and other features, such as hydration.

Almost all of these are DNA oligonucleotide sequences, with relatively few RNA sequences.

Single-crystal studies have thus provided unequivocal confirmation of Watson-Crick base pairing and of the double helix itself, with structures determined ab initio, without recourse to any assumptions from models. A number of these studies have revealed novel features of DNA structure and conformation. Much has now been learned about sequence-dependent structural and conformational features; however, it is increasingly clear that several independent determinations of a particular sequence run (i.e., in differing crystal lattices and flanking-sequence contexts) are needed before such findings can be generalized into sequence-structure relationships with a high degree of confidence (6). As yet, only some of the possible di- or tetranucleotide combinations have been observed in sufficient sequence contexts (7,8). Under some circumstances, crystal-packing forces can induce local distortions. However, with care, it is possible to dissect out their effects, and such regions can even yield information on deformability (9). Early attempts (10) to formulate general rules governing sequence-dependent structural features were based on just a single structure (the Dickerson-Drew dodecamer d[CGCGAATTCGCG]2) and suffer from obvious limitations.

This chapter does not attempt to review the details of oligonucleotide structures; the interested reader is referred to several recent reviews (1,7,11). Rather, we survey the scope, extent, and problems of oligonucleotide crystallization and structure determination. Details of unit cell dimensions, space groups, resolution of diffraction data, and so forth have been extensively tabulated elsewhere (7) and are not duplicated here. Similarly, routine procedures common to other areas of macromolecular crystallization and crystallography are not covered. The refinement of oligonucleotide structures is discussed in detail elsewhere in this volume (Chapter 9).

2 Oligonucleotide Crystallization

2.1 Synthesis and Purification

The development of efficient large-scale methods for DNA oligonucleotide synthesis has been vital to structural studies by both NMR and crystallography. X-ray crystallography requires relatively large quantities of material for crystallization trials; a minimum of 5–10 mg is typical. High purity is essential: impurities from the synthetic chemistry, which are irrelevant for molecular biological applications, are in general inimical to successful crystallization. Side reactions, such as base deamination, can occur, and their products are usually separable from the required sequence only by HPLC.

Synthetic methods involve three stages: the preparation of suitably protected nucleotide monomers of the common bases (and increasingly of modified ones as well), their coupling in the order defined by the required sequence, and finally the deprotection of the completed oligomer. A variety of protecting groups have been developed, all of which ensure that the sensitive hydroxyl and other substituents on a nucleotide do not themselves react during the coupling stages. These stages were originally developed using phosphotriester chemistry in solution (12); such solution-phase synthesis has the advantage that large-scale production is possible. However, this time-consuming approach requires a skilled chemist, in contrast to automated machine synthesis. The latter is now in common use in many laboratories; modern microprocessor-controlled DNA synthesizers offer very considerable ease and speed of use, and have been responsible for the current wide availability of synthetic oligonucleotides. Cyanoethyl phosphoramidite chemistry (13) is almost universally employed in synthesizers, with coupling times of 12–30 min/nucleotide and the monomers purchased in fully protected form. A single overnight synthetic run at the 10-µmol level can produce approx 10 mg of pure oligomer after purification. At this scale, the machine approach can produce oligomers up to 30 bp in length, whereas solution-phase coupling is very inefficient beyond about 10–12 bases. Recent advances in base-protection chemistry have significantly shortened the hitherto slow deprotection step, from 8 h to 1 h. As yet, automated RNA synthesis has not quite reached an equivalent stage of reliability (14), hence the lack of RNA structures in spite of the obvious biological importance of structural data on features such as short stem loops.
Longer RNA sequences, such as in ribozymes, are best obtained by in vitro transcription with T7 bacteriophage RNA polymerase (15,16).
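The scale and length limits quoted above follow from simple compounding: every coupling step loses a fraction of the chains, so the full-length fraction falls geometrically with length. A minimal Python sketch of that arithmetic, assuming a constant per-step coupling efficiency and an average residue mass of about 330 g/mol (both illustrative assumptions, not figures from this chapter; the function names are hypothetical):

```python
# Rough yield arithmetic for stepwise oligonucleotide synthesis.
# Assumptions (illustrative, not from this chapter): a constant per-step
# coupling efficiency, and an average residue mass of ~330 g/mol.

AVG_RESIDUE_MASS = 330.0  # g/mol per nucleotide residue (approximate)


def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains that are full length after (length - 1) couplings."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie in (0, 1]")
    return coupling_efficiency ** (length - 1)


def crude_full_length_mg(scale_umol: float, length: int,
                         coupling_efficiency: float) -> float:
    """Upper bound on full-length product (mg) before purification losses."""
    mg_per_umol = length * AVG_RESIDUE_MASS / 1000.0  # µmol x g/mol = µg -> mg
    return scale_umol * mg_per_umol * full_length_fraction(length, coupling_efficiency)


if __name__ == "__main__":
    # Compare a 12-mer and a 30-mer at the 10-µmol scale for two
    # hypothetical per-step efficiencies.
    for length in (12, 30):
        for eff in (0.99, 0.95):
            print(length, eff, round(crude_full_length_mg(10, length, eff), 1))
```

These are crude upper bounds; losses during HPLC or gel purification come on top, which is why the recovered amount of pure oligomer is considerably smaller.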

Modified DNA oligomers can in principle be synthesized by machine, provided that the appropriate protected modified nucleoside is available. In practice, only a restricted range is commercially available, so others of interest may have to be synthesized and protected de novo. Several starting blocks with methylated bases are available, including O6-methylguanine. Nucleosides halogenated on the base with either bromine or iodine may be of special use as heavy-atom markers for isomorphous replacement or anomalous-scattering phasing; they are available commercially in protected form. The machine chemistry for routine laboratory synthesis of some backbone modifications (methylphosphonates and phosphorothioates) is also available, owing to their interest as nuclease-resistant antisense agents, although to date few structural studies have been reported on them.

Extensive purification of synthesized oligomers, to at least 95% purity, is essential to optimize the chances of obtaining high-quality crystals. The procedures used are usually reverse-phase high-pressure liquid chromatography (HPLC), which both purifies and allows purity to be judged from the presence of a single sharp peak, or gel-filtration chromatography. These procedures remove unwanted blocking reagents, precursors, and any truncated sequences. Superior purification may be obtained by an initial HPLC pass of the oligomer still carrying its 5′-dimethoxytrityl protecting group, followed by removal of this group with glacial acetic acid and further HPLC. Several manufacturers supply purification cartridges packed with gel-filtration beads, through which impure oligonucleotide is passed directly by means of a syringe; this greatly speeds up purification, although only on a small scale. These methods are optimal for oligonucleotides up to 20 bp in length. Beyond this limit, gel electrophoretic methods are probably more effective, since the separated bands can easily be cut out and the pure material extracted.

Ease and extent of purification are often dependent on the sequence itself. Those containing runs of guanines are notoriously difficult to handle, with solubility often being a problem. This most probably arises from interstrand aggregation of guanines into, for example, four-stranded helical bundles.

The purity of an oligonucleotide is often indicated by its physical appearance: pure material should be white and floccular rather than yellow and gel-like. The correctness of the sequence can be checked by sequencing methods. One procedure uses enzymatic digestion with snake venom phosphodiesterase followed by two-dimensional electrophoresis on DEAE-cellulose; the differences in mobility between the different nucleosides are such that the sequence can usually be read directly off the plate (12). Another method end-labels the oligomer with 32P-ATP and then runs it on a gel against a standard.

2.2 General Aspects of Crystallization

All DNA structures reported to date are in the duplex form; several also have looped-out bases and one has a hairpin loop. The overwhelming majority of structures have self-complementary sequences. In large part, this has reflected a desire to minimize cost and effort, in that one rather than two sequences are required. However, especially when studying biologically relevant sequences and such phenomena as mispairing, non-self-complementary sequences should be the systems of choice.
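Whether a candidate sequence is self-complementary, and so needs only one strand to be synthesized, is easy to check mechanically. A short Python sketch written for illustration (the function names are hypothetical; only the four standard DNA bases are handled, and the "d" prefix is omitted from the sequences), using sequences discussed in this chapter:

```python
# Check whether a DNA strand is self-complementary, i.e. whether two copies
# of the same strand can pair in antiparallel Watson-Crick fashion.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement (written 5'->3')."""
    try:
        return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


def is_self_complementary(seq: str) -> bool:
    """True if the strand is its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    for s in ("CGCGAATTCGCG", "CGCAAAAATGCG", "GTACGTAC"):
        print(s, is_self_complementary(s))
```

For a non-self-complementary duplex such as dCGCAAAAATGCG, `reverse_complement` gives the partner strand (CGCATTTTTGCG) that would have to be synthesized and purified separately.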

Crystals of oligonucleotides are often difficult to produce in the quality required for structure analysis. In typical experience, only 10–20% of the sequences examined crystallize successfully. Examination of the literature also shows that apparently only certain lengths of oligomer are amenable to crystallization in forms that diffract to better than 2.5-Å resolution, probably on account of crystal-packing factors. Often, although several sequences of a given length may crystallize well, slight changes in sequence can result in total failure to crystallize. A general rule of thumb appears to be that both the 5′ and 3′ terminal base pairs should be CG base pairs, to minimize fraying and thus stabilize the duplex against melting. Exceptions to this rule are very GC-rich sequences, in which single terminal AT base pairs can be tolerated. In addition, a high CG content will in general improve the prospects for crystallizing at ambient temperatures, and possibly for obtaining higher-resolution data (since thermal motion is reduced). Before expending effort on crystallization, it is always advisable to obtain a melting curve for the oligomer being studied, so that (1) crystallization attempts can be made well below the midpoint of the helix-to-coil transition and (2) the absence of hairpins, which show up as multiple transition points, can be confirmed. Annealing a sequence to remove possible hairpins prior to crystallization is always to be recommended. In general, both short sequences and decamer and longer sequences with a 5′-end cytosine are often crystallizable, whereas those with a 5′-end guanosine can be crystallized only with considerable difficulty. Two examples from the author’s laboratory illustrate this point: the sequences dGCATGC and dGAAACGTTTC both produce crystals under a variety of conditions, the latter in particular giving exceptionally large and well-formed bipyramidal crystals. Yet neither diffracts to better than ca. 6 Å, even with high-intensity X-ray sources. Octanucleotide duplexes crystallizing in the A form are the principal exceptions to this rule, with the overwhelming majority of structures reported in this class having a 5′-end guanosine residue (7).
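The pre-crystallization checks recommended above (CG terminal base pairs, GC content, and a single clean melting transition) can be scripted. A minimal Python sketch, assuming the melting curve is available as paired lists of temperature and absorbance; it locates transitions as peaks in the first derivative dA/dT, which is one common way of reading a melting curve rather than a procedure prescribed by the text. The function names are hypothetical and the demonstration curves are synthetic, not measured data.

```python
import math
from typing import List, Sequence, Tuple


def terminal_pairs_are_cg(seq: str) -> bool:
    """Rule of thumb from the text: both duplex ends should be CG base pairs.

    Checking the first and last bases of one strand is enough, because the
    partner strand supplies their complements at the same two ends.
    """
    s = seq.upper()
    return len(s) > 0 and s[0] in "CG" and s[-1] in "CG"


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return sum(base in "CG" for base in s) / len(s)


def first_derivative(temps: Sequence[float],
                     absorb: Sequence[float]) -> List[Tuple[float, float]]:
    """Central-difference dA/dT at each interior temperature."""
    return [(temps[i], (absorb[i + 1] - absorb[i - 1]) / (temps[i + 1] - temps[i - 1]))
            for i in range(1, len(temps) - 1)]


def melting_transitions(temps: Sequence[float], absorb: Sequence[float],
                        min_rel_height: float = 0.2) -> List[float]:
    """Temperatures of local maxima in dA/dT, ignoring peaks smaller than
    min_rel_height of the tallest. More than one peak suggests more than one
    species, for example a hairpin as well as the duplex."""
    d = first_derivative(temps, absorb)
    tallest = max(v for _, v in d)
    peaks = []
    for i in range(1, len(d) - 1):
        t, v = d[i]
        if v >= d[i - 1][1] and v > d[i + 1][1] and v >= min_rel_height * tallest:
            peaks.append(t)
    return peaks


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    temps = [0.5 * i for i in range(201)]  # 0-100 °C in 0.5 °C steps
    # Synthetic single transition centered at 40 °C.
    single = [logistic((t - 40.0) / 3.0) for t in temps]
    # Synthetic two-transition curve, as a hairpin/duplex mixture might give.
    double = [0.5 * logistic((t - 30.0) / 2.0) + 0.5 * logistic((t - 65.0) / 2.0)
              for t in temps]
    print(melting_transitions(temps, single))
    print(melting_transitions(temps, double))
    print(terminal_pairs_are_cg("GCATGC"), gc_fraction("GCATGC"))
```

Passing these checks does not guarantee diffraction quality: dGCATGC satisfies the terminal-pair rule yet, as noted above, its crystals diffract only to about 6 Å.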

2.3 Crystallization Conditions

Successful oligonucleotide crystallizations have used a remarkably narrow range of conditions, in striking contrast to proteins, for which a wide variety of precipitants, counterions, and so on are routinely employed (17). This in part reflects the uniformity of the polyanionic surface presented by an oligonucleotide, compared with the enormous range of charge and hydrophobicity patterns on protein surfaces. The near-universal reliance on a single crystallization approach, using 2-methyl-2,4-pentanediol (MPD), also reflects the success achieved with it in the early 1970s in the crystallization and subsequent structure analysis of yeast tRNAPhe. Prior to this success, many tRNA species had been crystallized under a variety of conditions, but they were invariably poor X-ray diffractors, with data not extending beyond about 6 Å.

Tables 1–6 detail some of the crystallization conditions reported in the literature. Those with incomplete or ambiguous descriptions have been excluded, and in some cases the temperature used has been inferred from that of the data-collection experiments. There has been little reporting to date of the use of robotic methods, crystal screening over wide-ranging conditions, or factorial approaches (17).

Table 1 A-DNA
Table 2 B-DNA
Table 3 Z-DNA
Table 4 DNA Mismatches and Bulges
Table 5 Drug-DNA Minor Groove Complexes
Table 6 Drug-DNA Intercalation Complexes

Oligonucleotide crystallization experiments most commonly utilize MPD as precipitant in a closed vapor equilibration system. Some success has been reported with isopropyl alcohol, but there are very few reports of high-quality crystals being obtained with other agents, such as polyethylene glycol, or salts, such as ammonium sulfate. Polyethylene glycol, which is often the agent of choice for proteins, appears to be most useful in the crystallization of drug-oligonucleotide complexes where the drug is water-insoluble. Examples are nogalamycin (62) and actinomycin (63).

The polycation spermine is often employed; its role is presumed to be to shield the polyanionic helices from each other and so promote crystal packing. There are, however, several well-documented reports (see Tables 1–6) in which spermine has not been found to be essential. Magnesium ions are also required, probably to bridge and stabilize phosphate groups. The system is buffered at approximately neutral pH with sodium cacodylate or Tris buffer; major deviations from ca. pH 7.0 can lead to base protonation or even, in extreme cases, acid-catalyzed strand scission. Phosphate buffer is not recommended, as magnesium phosphate can easily crystallize out at the concentrations and temperatures used. Spermine can sometimes play an active structural role, especially in Z-DNA; for example, it has been located in the minor groove of a Z-DNA duplex (64).

Crystallization techniques used reflect the often small amounts of oligomer available. Several different microcrystallization setups are in use. Those that employ vapor diffusion are the most common, with the most popular systems being:

  1. Glass depression plates;

  2. Plastic petri dishes with a 4 × 4 matrix of drops containing the crystallizing solution (65);

  3. Hanging droplets.

For all of these, surfaces should be siliconized to minimize the spreading of droplets. A typical droplet volume is 10–15 µL, containing oligomer at a concentration of typically 1 mM, which is sufficient for at least six good-sized (0.1 × 0.2 × 0.5 mm) crystals of a decamer. Microdialysis, a technique frequently used for protein crystallization, has only rarely been reported to succeed for oligonucleotides (32).
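The droplet figures above can be turned into material estimates, which connects them to the 5–10 mg minimum quoted in Section 2.1. A small Python sketch, assuming that the stated 1 mM refers to single-strand concentration and that a residue averages about 330 g/mol (both assumptions; the helper names are hypothetical):

```python
# Material arithmetic for a single crystallization droplet.
# Assumptions (illustrative): the stated concentration refers to single
# strands, and a nucleotide residue averages ~330 g/mol.

AVG_RESIDUE_MASS = 330.0  # g/mol per residue (approximate)


def droplet_dna_ug(volume_ul: float, conc_mm: float, length: int) -> float:
    """Mass of oligonucleotide (µg) in one droplet."""
    nmol = volume_ul * conc_mm              # µL x mmol/L = nmol
    ng = nmol * length * AVG_RESIDUE_MASS   # nmol x g/mol = ng
    return ng / 1000.0                      # ng -> µg


def droplets_from_stock(stock_mg: float, volume_ul: float,
                        conc_mm: float, length: int) -> int:
    """How many such droplets a given amount of purified oligomer supplies."""
    return int(stock_mg * 1000.0 // droplet_dna_ug(volume_ul, conc_mm, length))


if __name__ == "__main__":
    # A 15-µL droplet of a decamer at 1 mM, and a 5-mg stock.
    print(droplet_dna_ug(15, 1.0, 10))          # 49.5 (µg per droplet)
    print(droplets_from_stock(5, 15, 1.0, 10))  # 101 (droplets per 5 mg)
```

If the concentration is instead quoted per duplex, the per-droplet mass doubles and the number of droplets per stock roughly halves.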

Other important factors include:

  1. Time: Crystals of many dodecanucleotide sequences can appear within a few days of setting up, whereas other types of sequence may take many weeks or even months. For example, crystals of the O6-methylated Z-form (CG)3 hexamer duplex were reported to have taken a year to appear (45), and then only three crystals were produced. Seeding has occasionally been used with success. Oligonucleotide crystals frequently deteriorate with age when kept in their mother liquor: crystal surfaces cloud over and craze, and diffraction quality is greatly diminished.

  2. Temperature: It is axiomatic that sequences with low helix-to-coil transition temperatures (Tm) will require cold-room conditions for successful crystallization. However, since Tm is the midpoint of the transition, there will be a significant population of single-stranded species well below this temperature. It is therefore advisable initially to attempt all crystallizations in the cold. A notable exception to this rule is the sequence rU(UA)6A, for which a temperature of 35°C was needed to produce crystals diffracting to high resolution (26,66). As with all macromolecular crystallizations, temperature stability is important.

  3. Relative concentrations of oligonucleotide, magnesium ion, and precipitant: The magnesium ion is usually in considerable molar excess, although there does not appear to be any rule about its molar ratio to the DNA. Aqueous solubilities of different sequences can vary over a wide range, with those containing runs of guanines being the least soluble. Base and backbone modifications can also significantly decrease oligomer solubility.

  4. Drug solubility: Most drugs that have been complexed with DNA sequences are cationic and freely soluble in water to 10 mM or more. Exceptions are Hoechst 33258 (only sparingly soluble), the anthracycline nogalamycin, and the bis-intercalators of the echinomycin class, such as triostin A. All of these except Hoechst 33258 are uncharged and essentially insoluble in aqueous solution. These drugs can be dissolved in 1:1 chloroform/methanol (triostin A) or methanol (echinomycin and nogalamycin). Once in a droplet with oligonucleotide, the volatile organic solvent quickly evaporates, and the drug is solubilized by complexation with the DNA. Drug-oligonucleotide complexes are generally less water-soluble than the DNA itself, so precipitation is frequently observed at quite low MPD levels. Crystals of complexes often grow out of the precipitate, provided it is not too extensive.

  5. Enclosure volume: The volume of the enclosed system, especially relative to the surface area of the reservoir of higher-concentration precipitant, is important both for the time taken for crystallization and for the amount of water lost from the droplet in reaching equilibrium within the enclosure. Clearly, too large a container volume is to be avoided. This factor has to be taken into account when trying to reproduce conditions from other laboratories.

  6. Concentration gradient: The gradient in concentration between droplet and reservoir is a critical factor governing the speed of precipitation and/or crystallization. Too steep a gradient typically results in premature precipitation, or in a mass of small crystals, before true equilibrium has been reached.
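The temperature point above, that a significant single-stranded population persists below Tm, can be made quantitative with a simple two-state (van't Hoff) model. A Python sketch, assuming unimolecular two-state melting and a hypothetical dissociation enthalpy; the parameters are not taken from any structure in this chapter, and real values would come from the measured melting curve:

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def fraction_single_stranded(temp_c: float, tm_c: float, dh_kj: float) -> float:
    """Fraction of molecules melted at temp_c in a two-state model.

    Treats melting as a unimolecular two-state equilibrium with dissociation
    enthalpy dh_kj (kJ/mol, positive), so K = 1 at Tm by definition. For a
    bimolecular duplex this is a simplification: its Tm and curve shape also
    depend on strand concentration.
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    k = math.exp(-(dh_kj * 1000.0 / GAS_CONSTANT) * (1.0 / t - 1.0 / tm))
    return k / (1.0 + k)


if __name__ == "__main__":
    # Hypothetical duplex: Tm = 45 °C, dissociation enthalpy 300 kJ/mol.
    for below in (0, 5, 10, 20):
        f = fraction_single_stranded(45.0 - below, 45.0, 300.0)
        print(f"{below:>2} °C below Tm: {100 * f:.1f}% single-stranded")
```

A smaller enthalpy, typical of shorter or less GC-rich duplexes, broadens the transition and leaves more single strands at a given distance below Tm, which supports the advice to start crystallizations in the cold.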

2.4 Correlations in Crystallizing Conditions

Crystallization of a newly synthesized oligonucleotide sequence normally starts with an examination of the conditions reported in the literature (Tables 1–6). The choice of oligomer concentration is often dictated by the quantity of material available. Can the concentrations of the other components in a crystallization experiment be chosen in a systematic manner? This question can be approached by a comparative examination of the data in Tables 1–6. Linear regression analysis has been used to determine whether there are correlations between pairs of the major variables (the concentrations of DNA, spermine, and magnesium ion, and the final MPD concentration in the droplet) among conditions that have successfully produced crystals. It has been assumed that this final MPD concentration is reached at equilibrium, and it has been calculated as the simple average of the initial droplet and reservoir concentrations.
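The analysis just described reduces to two small calculations: the final MPD level under the simple-average assumption, and a correlation coefficient for each pair of variables. A minimal Python sketch (the function names are hypothetical, and no values from Tables 1–6 are reproduced here); each pair of columns, for example [Mg2+] and final [MPD], would be passed in as two lists:

```python
import math
from typing import Sequence


def final_mpd(droplet_pct: float, reservoir_pct: float) -> float:
    """Final MPD level in the droplet, under the text's assumption that it is
    the simple average of the initial droplet and reservoir values."""
    return (droplet_pct + reservoir_pct) / 2.0


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length, at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

The coefficient is symmetric in its two arguments, so the order in which a pair such as [DNA]:[MPD] is written does not matter.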

Table 7 lists correlation coefficients for (1) the A- and B-form oligonucleotide conditions in Tables 1 and 2 taken together, and (2) the drug-DNA minor groove complexes in Table 5. None of the variables in the first group is correlated with any degree of significance, as shown by the random scatter of points in the [DNA]:[MPD] plot (Fig. 1). On the other hand, there are several significant correlations for the drug-DNA data. The [Mg2+]:[DNA] correlation coefficient of 0.94 is highly statistically significant, even with the relatively small number of observations used (Fig. 2). The correlation between [DNA] and [MPD] (Fig. 3) is also statistically significant, although not at the same level. It is at the same level as the [Mg2+]:[MPD] correlation (Fig. 4), although the correlation coefficient for the latter increases to 0.70 when the aberrant value at high [Mg2+] is removed. Spermine levels, by contrast, are quite uncorrelated with any of the other variables. The correlation coefficient between [Mg2+] and [MPD] for the Z-DNA oligomers (Table 3) is 0.80 (Fig. 5); for this group there is also a weak correlation between [MPD] and [DNA].
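Whether a correlation coefficient is "significant" depends on the number of observations as well as on r. The standard test converts r to a t statistic with n − 2 degrees of freedom. A self-contained Python sketch using only the standard library, with the Student t density integrated numerically by Simpson's rule in place of a statistics package; the sample size in the usage line is hypothetical, since the chapter does not give n here:

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("|r| must be strictly less than 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x: float, df: int) -> float:
    """Student t probability density (log-gamma avoids overflow for large df)."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density over [0, |t|] by Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p value for the null hypothesis of zero correlation."""
    return two_sided_p(t_statistic(r, n), n - 2)


if __name__ == "__main__":
    # n = 10 is a hypothetical sample size, not the count behind Fig. 2.
    print(correlation_p_value(0.94, 10))
```

Removing an aberrant point, as done for the [Mg2+]:[MPD] data, changes both r and n, so the p value should be recomputed rather than inferred from r alone.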

Table 7 Correlation Coefficients Calculated by Means of Linear Regression Analysis
Fig. 1. Plot of DNA against MPD concentrations for the A- and B-type oligonucleotide structures listed in Tables 1 and 2. Concentrations are in mM for DNA and % (v/v) for MPD. Values for A-type oligomers are shown as filled circles.

Fig. 2. Plot of DNA against Mg2+ concentrations (in mM) for the minor groove drug-oligonucleotide structures listed in Table 5.

Fig. 3. Plot of DNA against MPD concentrations (in %) for the minor groove structures.

Fig. 4. Plot of MPD against Mg2+ concentrations for the minor groove structures.

Fig. 5. Plot of MPD against Mg2+ concentrations for the Z-form oligonucleotide structures listed in Table 3.

These observations suggest that, for a set of sequences related in length and sequence type, the concentration of MPD precipitant required for crystallization of a given oligonucleotide can be approximately predicted, as can the magnesium level. It is noteworthy that the magnesium:DNA molar ratio for the drug-DNA minor groove complexes averages 10–12:1, in accord with what one would expect for dodecanucleotides. This suggests that such a ratio, of one Mg2+ per nucleotide, may be generally useful. Spermine concentrations, on the other hand, do not show any discernible pattern. It may be significant that a number of oligomers have been crystallized in the absence of spermine, and that it has only rarely been located in electron density maps. It may be that spermine fulfills only a secondary need, providing a general counterion atmosphere in the crystallizing solution. The absence of correlation for the A and B oligomer conditions in Tables 1 and 2 may reflect the much greater disparity in other variables, such as sequence, temperature, and crystallization method, compared with the drug complexes in Table 5.
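The prediction suggested above amounts to fitting a straight line to the conditions of related sequences and reading off a starting value for a new one. A hedged Python sketch: ordinary least squares on whatever pairs of variables the user extracts from the tables, plus the one-Mg2+-per-nucleotide heuristic, assuming DNA concentrations are expressed per strand. The function names are illustrative, and a fitted value is only a starting point for a screen, not a guarantee of crystals:

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Read a starting value off the fitted line."""
    return intercept + slope * x


def mg_one_per_nucleotide(strand_mm: float, length: int) -> float:
    """Starting [Mg2+] (mM) from the one-Mg2+-per-nucleotide heuristic,
    assuming the DNA concentration is expressed per strand."""
    return strand_mm * length
```

For example, a dodecamer at 1.5 mM strand concentration gives `mg_one_per_nucleotide(1.5, 12)`, that is 18 mM Mg2+, as a first value to bracket in a screen.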

3 Oligonucleotide Crystal Forms

It is widely accepted that crystal-packing considerations can force particular lengths of oligonucleotide duplex to adopt one helical type rather than another (7). This is best illustrated by the octanucleotide family of over 30 structures, all of which have been found in the A form even though their crystallization conditions are often indistinguishable from those used to crystallize B-form oligonucleotides. Almost all octanucleotides crystallize in one of two space groups, tetragonal P4₃2₁2 or hexagonal P6₁. Several sequences, such as dGGGCGCCC (25), have been found to crystallize in both forms, with consequent differences in hydration and conformation between them. The highest resolution reported for an A-DNA oligomer is 1.5 Å, although 2.0–2.2 Å is the norm. One octanucleotide sequence, dGTACGTAC, has uniquely been found to crystallize in the orthorhombic space group P2₁2₁2 as well as in the standard tetragonal form (23,24).

The largest single group of oligonucleotide structures reported to date comprises the B-form dodecamers, which are variants of the seminal “Dickerson-Drew” sequence. The group includes the minor-groove drug complexes and a number of base-mismatched structures. Almost all crystallize virtually isomorphously in an orthorhombic P2₁2₁2₁ unit cell. A nonisomorphous B-form dodecamer, in space group C2, has been reported (67), although with a similar packing motif. Even more extreme is the finding (68,69) of two dodecamer sequences, dCCGTACGTACGG and dGCGTACGTACGC, that crystallize in an A-DNA conformation. It appears that a 5′-end sequence of CGC is required for a dodecamer to crystallize in the “standard” P2₁2₁2₁ form. The highest resolution reported is 1.9 Å, with 2.2–2.5 Å being common. By contrast, several B-form decamers have been found to crystallize in forms diffracting to much higher resolution (1.3–1.5 Å) (8). These duplexes crystallize in a wide variety of packing modes, in contrast to the dodecamers, with space groups P2₁2₁2₁, P6, R3, C2, P3₂21, and I2₁2₁2₁ being reported (8,32,70–74).

Z-form oligonucleotides generally diffract to still higher resolution, with dCGCGCG itself having data extending to 1.0 Å. The Z-DNA hexamers all crystallize in an isomorphous orthorhombic P2₁2₁2₁ cell.

The mono-intercalation drug complexes are almost all with hexanucleotide duplexes. They tend to crystallize in high-symmetry space groups and frequently diffract to high resolution.

4 Methods of Crystal Structure Analysis

The overwhelming majority of oligonucleotide structures have been solved by molecular replacement methods. This has been facilitated by the tendency of all oligomers of a given length to crystallize in the same space group and with closely similar unit-cell dimensions. The assumption that all such structures are isomorphous is, however, not necessarily correct, as was shown by the existence of two oppositely oriented “half-molecules” in the structure of an apparently normal dodecanucleotide duplex, dCGCAAAAATGCG (29). Nonetheless, most members of each oligonucleotide family are truly isomorphous. This has been exploited to considerable effect for the dodecamers, with variations such as sequence changes, base mispairing, base alkylation, and drug binding being analyzed and much significant new information being obtained. These variations have invariably been located in the central 6–8 bp region, where crystal-packing factors are not significant.

A number of structures have been solved by search procedures, notably using the ULTIMA (75) and MERLOT (76) computer programs. Oligonucleotide structures are particularly amenable to successful rotation searches, provided a good starting model is available. In that case, the dominance of the intensity transform by the base pairs generally gives an unambiguous and unique solution to a rotation-function search. The same feature can cause problems with translation searches, as it can be difficult to distinguish solutions that differ by base-pair translations, especially when oligonucleotide helices are stacked end-to-end in the crystal lattice. Such a situation has been discussed for a B-DNA decamer (32), in which a solution was found that appeared to refine satisfactorily, to an R factor of 23%. However, examination of the electron density maps and of the detailed intermolecular distances indicated that the structure was in error. It is generally true that an incorrect structure does not pack satisfactorily in its unit cell, that difference electron density maps show spurious features of significant, unexplained density, and that the R factor will not decrease below about 23–25% even with solvent molecules included. The structure reported for a 13-mer oligonucleotide duplex with a looped-out base (78) initially refined to an R factor of 15%; however, a subsequent careful re-examination has shown (79) that significant revisions to this structure are needed.
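The R factors quoted above (23% for the misplaced decamer solution, 15% for the 13-mer) are instances of the conventional crystallographic residual R = Σ| |Fo| − k|Fc| | / Σ|Fo|. A minimal Python sketch, assuming the observed and calculated amplitudes have already been matched reflection by reflection; the default least-squares scale factor is one common choice, not necessarily the one used by any particular refinement program, and the function name is hypothetical:

```python
from typing import Optional, Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             scale: Optional[float] = None) -> float:
    """Crystallographic R = sum| |Fo| - k|Fc| | / sum |Fo|.

    f_obs and f_calc must already be matched reflection by reflection.
    If no scale k is supplied, the least-squares value minimizing
    sum (|Fo| - k|Fc|)^2 is used (an assumption; programs differ).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matched, non-empty amplitude lists")
    fo = [abs(f) for f in f_obs]
    fc = [abs(f) for f in f_calc]
    if scale is None:
        scale = sum(o * c for o, c in zip(fo, fc)) / sum(c * c for c in fc)
    return sum(abs(o - scale * c) for o, c in zip(fo, fc)) / sum(fo)
```

As both examples in the text show, a low R is necessary but not sufficient: packing, difference maps, and intermolecular distances must also be checked before a solution is accepted.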

Direct methods have not been used successfully to solve an oligonucleotide structure. This is not surprising, since these structures do not obey the condition of uniform electron distribution. One structure has been reported as solved by the application of maximum entropy methods (80,81), which use a variant of the standard sign relationships.

Only a small number of oligonucleotide structures have been solved by multiple isomorphous replacement (82) and can therefore be considered to be determined ab initio, independent of any prior structural model. These include the “Dickerson-Drew” B-form dodecamer dCGCGAATTCGCG and the Z-form hexamer dCGCGCG. In the former case, two derivatives were used, one with a covalently bound bromine atom and the other with cis-diamminedichloroplatinum(II) soaked into the native crystals. The dCGCGCG structure was solved with the aid of three heavy-atom derivatives, of Ba2+, Co2+, and Cu2+, which were diffused into native crystals.