1 Introduction

The design and manufacture of custom genes is fast becoming an indispensable tool in synthetic biology (1) and protein engineering (2). Current de novo gene synthesis methods include ligase chain reaction (LCR) (3) and polymerase chain reaction (PCR) assembly (4). Both rely on overlapping oligonucleotides (oligos) to construct genes. In LCR assembly, adjacent oligonucleotides with no gap between them are ligated, which extends the DNA, whereas PCR assembly uses DNA polymerase to extend the oligonucleotides. With either method, successful synthesis requires an oligonucleotide design that makes the oligonucleotides highly specific for their targets and gives them a uniform hybridization temperature, which improves the assembly efficiency.

Several programs have been developed for gene synthesis that design oligonucleotides based on a user-specified hybridization temperature and oligonucleotide length (5–9). DNAWorks (5) and Gene2Oligo (6) provide fairly good synthesis results for DNA sizes below 1 kb. GeneDesign (8) and GeMS (9) have implemented a multipool function for the synthesis of multikilobase genes, in which long DNA sequences are split into smaller segments (∼500 bp). These segments are first assembled in separate pools and are then combined into the full-length product in a final PCR step. DNAWorks also provides a useful feature for predicting potential mishybridization and secondary structures among candidate oligonucleotides.

This chapter presents a gene design program called TmPrime (10) that can design oligonucleotides, and analyze their potential mishybridization and secondary structures, for up to 20 genes with very long gene sequences (≤40 kb) for use in LCR and gapless PCR assembly. The program constructs oligonucleotides with uniform melting temperatures (ΔTm < 3°C), which increases the yield of the full-length DNA product in PCR gene assembly. These features are useful for de novo gene synthesis, especially for emerging applications in genome synthesis and multiplex gene synthesis.

2 Materials

2.1 TmPrime Interface

TmPrime uses the equal-temperature (Equi-Tm) approach to design oligonucleotide sets. The program first divides the given sequence into fragments, from the beginning to the end of the DNA sequence, based on the user-specified melting temperature (Tm) (Fig. 1). This process usually leaves a small DNA tail whose Tm is lower than the user-specified value. The fragment boundaries are then shifted to accommodate this tail and to minimize the melting temperature deviations among the fragments. Once the melting temperatures of the fragments are equilibrated, the oligonucleotides to be used in gapless PCR or LCR are constructed by connecting two adjacent fragments along both the sense and antisense strands. Each oligonucleotide overlaps with its complementary neighbors by exactly one fragment (see Note 1).
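The fragment-and-join idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not TmPrime's implementation: the function names are hypothetical, the melting temperature is approximated with the simple Wallace rule (2°C per A/T, 4°C per G/C) rather than whatever thermodynamic model TmPrime actually uses, and the boundary-shifting (Tm equilibration) step is omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(fragment: str) -> int:
    """Rough Tm estimate (Wallace rule); an assumption, not TmPrime's model."""
    at = fragment.count("A") + fragment.count("T")
    gc = fragment.count("G") + fragment.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_by_tm(seq: str, target_tm: float) -> list[str]:
    """Greedy left-to-right split: close a fragment once its Tm reaches the target.

    The last fragment may be a low-Tm tail; TmPrime shifts the boundaries
    afterwards to absorb it, which this sketch does not attempt.
    """
    fragments, start = [], 0
    for end in range(1, len(seq) + 1):
        if wallace_tm(seq[start:end]) >= target_tm:
            fragments.append(seq[start:end])
            start = end
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def gapless_oligos(fragments: list[str]) -> list[str]:
    """Join adjacent fragment pairs, alternating sense and antisense strands.

    Each oligo then overlaps its complementary neighbors by one fragment.
    """
    oligos = []
    for i in range(len(fragments) - 1):
        pair = fragments[i] + fragments[i + 1]
        oligos.append(pair if i % 2 == 0 else reverse_complement(pair))
    return oligos


fragments = split_by_tm("ATGCATGCATGCATGCATGC", target_tm=12)
oligos = gapless_oligos(fragments)
```

Because the split is driven by Tm rather than length, G + C-rich stretches reach the target sooner and yield shorter fragments than A + T-rich stretches.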

Fig. 1.

An overview of the oligonucleotide design scheme. TmPrime first divides the input sequence into sections of approximately equal melting temperatures (Equi-Tm) using markers based on the user-specified melting temperature. The positions of the markers are iteratively shifted to globally minimize the deviation in melting temperature among the fragments (Tm equilibration). Two adjacent fragments are joined together to generate oligonucleotides for gapless PCR assembly. The two tail segments at the 3′ ends of the sense and antisense sequences are also included for LCR assembly (i.e., R0 and Fn+1).

Figure 2 indicates all the parameters that are needed to generate the oligonucleotide sets using TmPrime. Most of the parameters are self-explanatory. The user is asked to provide gene information, gene assembly buffer condition, oligonucleotide and outer primer concentrations, optional parameters for long DNA assembly, and parameters for mispriming analysis. The software will report melting temperatures, oligonucleotide sequences, primer sequences, potential formation of secondary structures, and statistical information of the oligonucleotide sets of each pool compiled in a PDF file.

Fig. 2.

Gene design web interface of TmPrime.

2.2 Reagents for LCR Gene Assembly

  1. 100 μM oligonucleotides.

  2. T4 ligase and buffer.

  3. Ampligase and buffer (Epicentre Biotechnology).

  4. T4 polynucleotide kinase.

  5. 100 μM outer primers.

  6. 25 mM MgSO4.

  7. dNTP mixture (containing 25 mM dATP, 25 mM dGTP, 25 mM dCTP, and 25 mM dTTP).

  8. High-fidelity KOD Hot Start DNA polymerase (1.0 U/μl) and 10× KOD buffer (Novagen).

3 Methods

3.1 Calculation with Codon Optimization

TmPrime includes a codon optimization feature. It implements global codon optimization, replacing each codon according to the organism-specific codon frequencies in the Codon Usage Database (http://www.kazusa.or.jp/codon/). The user can select the organism for codon optimization from the list of organisms available in the Codon Usage Database (NCBI-GenBank Flat File Release 166.0).
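One common way to perform frequency-based codon replacement is to swap every codon for the most frequent synonymous codon of the target organism. The sketch below assumes that strategy; the source does not state which replacement rule TmPrime applies. The function name is hypothetical, and the frequencies in `TOY_USAGE` are made-up placeholders for illustration, not values from the Codon Usage Database.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Placeholder frequencies (NOT real data); a real run would load the
# organism-specific table from the Codon Usage Database.
TOY_USAGE = {"CTG": 0.5, "TTA": 0.1, "AAA": 0.7, "AAG": 0.3}


def optimize_codons(cds: str, usage: dict[str, float]) -> str:
    """Replace each codon with the most frequent synonymous codon in `usage`.

    Codons whose amino acid has no entry in `usage` are left unchanged.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        optimized.append(best.get(CODON_TO_AA[codon], codon))
    return "".join(optimized)


optimized = optimize_codons("ATGTTAAAGTAA", TOY_USAGE)  # Met-Leu-Lys-stop
```

The encoded protein is unchanged because only synonymous codons are exchanged.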

3.2 Multiplex Gene Synthesis

TmPrime can handle up to 20 genes with a total DNA length of up to 40 kb (Fig. 2). This function is particularly useful for multiplex gene synthesis because it allows users to screen for potential mishybridization among multiple genes. When the “# of pools” parameter is set to 1 (default), TmPrime automatically stitches the uploaded gene sequences together into a single DNA sequence and conducts the oligonucleotide design and mishybridization analysis accordingly (see Note 2). The program screens for mishybridization through a pairwise sequence alignment, with a score based on the user-specified number of matched bases and G + C content (see the “minimum number of matched bases” and “GC content” parameters in Fig. 2). The program connects adjacent potential mishybridization regions and reports the entire extended region. The oligonucleotides are displayed in alternating upper and lower case (Fig. 3). These features allow users to easily visualize and inspect any problematic DNA regions (see Note 3).
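The screening logic described above (a minimum number of matched bases, a G + C threshold, and merging of adjacent hits) can be illustrated with a deliberately simplified sketch. It is not TmPrime's alignment algorithm: it only looks for exact direct repeats of a fixed length within one sequence, ignores reverse-complement matches, and uses hypothetical function and parameter names.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_regions(seq: str, min_match: int, min_gc: float) -> list[tuple[int, int]]:
    """Return merged (start, end) spans covered by repeated k-mers.

    A k-mer of length `min_match` is flagged when it occurs more than once
    and its G + C fraction is at least `min_gc`. Overlapping or touching
    hits are connected and reported as one extended region.
    """
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - min_match + 1):
        positions[seq[i:i + min_match]].append(i)
    starts = sorted(
        i
        for kmer, hits in positions.items()
        if len(hits) > 1 and gc_fraction(kmer) >= min_gc
        for i in hits
    )
    merged: list[list[int]] = []
    for start in starts:
        end = start + min_match
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


regions = repeated_regions("ACGTGCATTTTTACGTGC", min_match=6, min_gc=0.5)
# → [(0, 6), (12, 18)]
```

Lowering `min_gc` flags more candidate regions, which matches the recommendation in Note 3 to start with a low G + C threshold.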

Fig. 3.

Mishybridization analysis of a 305-bp human minisatellite region. (a) DNA sequence. (b) Partial results of the mishybridization analysis.

3.3 Single-Pool Assembly

TmPrime supports oligonucleotide design for conventional one-step and two-step PCR-based gene syntheses, “TopDown” one-step gene synthesis, and LCR-based gene synthesis. The software generates gapless oligo sets, in which consecutive oligos have no gap between them, and reports the set with the lowest melting temperature deviation whose average melting temperature is within ±2°C of the user-specified melting temperature. Oligonucleotides are displayed in alternating upper and lower case so that the user can easily find the boundaries; the prefixes of the oligonucleotide sets and primers are defined in Fig. 4 (see Note 4).
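The alternating-case display is easy to reproduce when inspecting an oligo set by hand. The helper below is a hypothetical illustration of the idea, not TmPrime's output routine.

```python
def alternate_case(oligos: list[str]) -> str:
    """Concatenate consecutive oligos, alternating upper and lower case."""
    return "".join(
        oligo.upper() if i % 2 == 0 else oligo.lower()
        for i, oligo in enumerate(oligos)
    )


display = alternate_case(["atgc", "GGCC", "ttaa"])
# → "ATGCggccTTAA"
```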

Fig. 4.

(a) Schematic illustration of overlapping PCR assembly. (b) The prefixes of oligonucleotide sets and primers in the output file. (c) Oligonucleotides are displayed in alternating upper and lower case so that the boundaries are easy to find.

Overlapping PCR assembly is a parallel process in which the overlapping oligonucleotides are extended after each PCR cycle. The theoretical minimum number of cycles (x) needed to construct a double-stranded (ds)DNA molecule of length L from oligonucleotides of uniform length n and overlap size s, or from a pool of m oligonucleotides of various lengths, can be calculated by Eqs. 1 and 2, respectively.

$$ 2^{x}n-\left(2^{x}-1\right)s>L \tag{1} $$
$$ x\ge \log_{2}(m) \tag{2} $$

Therefore, theoretically, six PCR cycles are sufficient for assembling a 1,000-bp DNA segment from a pool of oligonucleotides of 40 nucleotides (nt) in length with an overlap of 20 nt (see Note 5).
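The six-cycle figure can be checked numerically. The short sketch below evaluates Eqs. 1 and 2 directly; the function names are illustrative only.

```python
import math


def min_cycles_uniform(length: int, oligo_len: int, overlap: int) -> int:
    """Smallest x satisfying Eq. 1: 2^x * n - (2^x - 1) * s > L."""
    if oligo_len <= overlap:
        raise ValueError("oligo length must exceed the overlap")
    x = 0
    while (2 ** x) * oligo_len - (2 ** x - 1) * overlap <= length:
        x += 1
    return x


def min_cycles_pool(n_oligos: int) -> int:
    """Smallest integer x satisfying Eq. 2: x >= log2(m)."""
    return math.ceil(math.log2(n_oligos))


cycles = min_cycles_uniform(length=1000, oligo_len=40, overlap=20)
# → 6
```

With n = 40 and s = 20, five cycles give 32 × 40 − 31 × 20 = 660 bp, which is short of 1,000 bp, whereas six cycles give 64 × 40 − 63 × 20 = 1,300 bp.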

3.4 Multiple-Pool Assembly

For DNA longer than 1.5 kb, we recommend splitting the gene into DNA segments and conducting the gene assembly in multiple steps (11). TmPrime automatically splits the gene into pools of shorter sequences of approximately equal length based on the user-specified number of pools, and the pool–pool overlap length is automatically adjusted according to the annealing temperature specified for the across-pool assembly with the outer primers. This function is implemented in the “Long DNA Assembly” feature shown in Fig. 2. Different annealing temperatures can be assigned for the assembly of outer primers across the pools (annealing temperature of outer primer), of outer primers within a pool (annealing temperature of pool primer), and of inner oligonucleotides (annealing temperature of oligonucleotide), which provides flexibility for long gene construction. The oligonucleotides of every pool are optimized to the same melting temperature, so that different segments or different genes can be synthesized in parallel in a single thermal cycler. The prefixes of the outer primers, pool primers, and inner oligo sets are defined in Fig. 5 (see Note 6).
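The pool-splitting step can be sketched as follows. This is a simplified illustration under a stated assumption: the overlap between neighboring pools is passed in as a fixed length, whereas TmPrime derives it from the user-specified annealing temperature. The function name is hypothetical.

```python
def split_into_pools(seq: str, n_pools: int, overlap: int) -> list[str]:
    """Split `seq` into `n_pools` segments of near-equal length.

    Neighboring segments share exactly `overlap` bases, so the segments can
    be joined again in a final overlap-extension PCR step.
    """
    if n_pools < 1 or overlap < 0:
        raise ValueError("n_pools must be >= 1 and overlap must be >= 0")
    # Segment lengths must sum to len(seq) plus the bases counted twice.
    total = len(seq) + (n_pools - 1) * overlap
    base, extra = divmod(total, n_pools)
    if base <= overlap:
        raise ValueError("overlap is too long for this number of pools")
    pools, start = [], 0
    for i in range(n_pools):
        length = base + (1 if i < extra else 0)
        pools.append(seq[start:start + length])
        start += length - overlap
    return pools


gene = "".join("ACGT"[(i * 7) % 4] for i in range(3000))  # dummy 3-kb sequence
pools = split_into_pools(gene, n_pools=3, overlap=30)
```

For the 3-kb dummy sequence, each of the three pools is 1,020 bp long, which keeps every first-step assembly below the 1.5-kb limit recommended for single-pool synthesis.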

Fig. 5.

Schematic illustration of multiple-pool assembly. A long gene is first split into DNA segments, and then, the DNA segments are further divided into individual oligonucleotide sets. The assembly process is conducted in two steps. DNA segments are first assembled from individual oligo sets in separate pools, followed by a final PCR to create the full-length gene. The lengths of pool overlapping regions (P1–P2 and P2–P3) are automatically defined based on the user-specified annealing temperature of pool primers.

3.5 Comparison of Oligonucleotide Design Programs

The oligonucleotide design features of different synthetic gene design programs are summarized in Table 1, and Table 2 compares the performance of these programs for S100A4 (chr1:1503312036–1503311284), GFPuv (GenBank U62636; region of 261–1,020), and the entire genomes of poliovirus (GenBank FJ517648; 7,418 bp) and øX174 bacteriophage (GenBank J02482; 5,386 bp). TmPrime offers the most homogeneous melting temperatures, with ΔTm < 3°C, and a wider range of annealing temperatures (50–70°C) than DNAWorks (58–70°C) and Assembly PCR Oligo Maker (50–60°C). GeneDesign cannot adjust the oligonucleotide concentrations and PCR buffer conditions, and its oligonucleotide design may fail when the sequences of consecutive oligonucleotides collide. Gene2Oligo has difficulty in designing S100A4 and fails to converge at the specified annealing temperature. Only TmPrime can handle the poliovirus and øX174 bacteriophage genomes.

Table 1 Comparisons of the oligonucleotide design features of gene synthesis programs
Table 2 Comparison of the oligonucleotide design performance of different gene synthesis programs

3.6 LCR Gene Assembly Protocol

  1. LCR assembly

    The LCR assembly is carried out in a final volume of 50 μl containing 5 μl of 10× T4 ligase buffer, 5 μl of 10× Ampligase buffer, and 10–100 nM of TmPrime-optimized oligonucleotides that have been phosphorylated using 20 U of T4 polynucleotide kinase and 20 U of Ampligase.

    LCR assembly is conducted as follows: 37°C for 4 h, denatured at 95°C for 3 min, ramped to 60°C (matched with the average melting temperature of oligonucleotides) at 0.1°C/s for annealing, and incubated at 60°C for 2–8 h.

  2. PCR amplification

    The full-length assembly product is amplified by a PCR containing 5 μl of the assembly mixture from step 1 above, 0.4 μM of outer primers, 1 μl of KOD Hot Start Master Mix, and 1× PCR buffer in a final volume of 25 μl.

    The PCR is conducted under the following conditions: 2 min of initial denaturation at 95°C; 30 cycles of 95°C for 20 s, 55°C (matched with the melting temperature of primers) for 30 s, and 72°C for 30 s; and a final extension step of 72°C for 10 min.
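When setting up the reactions, the stock volumes follow from the usual dilution relation C1V1 = C2V2. The helper below is a small illustrative calculation with a hypothetical function name; it uses the 100 μM outer primer stock from Subheading 2.2 and the 0.4 μM final concentration in 25 μl given above.

```python
def stock_volume_ul(stock_um: float, final_um: float, final_volume_ul: float) -> float:
    """Volume of stock (μl) needed to reach `final_um` in `final_volume_ul`."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_um * final_volume_ul / stock_um


primer_ul = stock_volume_ul(stock_um=100, final_um=0.4, final_volume_ul=25)
print(f"{primer_ul:.2f} μl of each 100 μM outer primer per 25 μl PCR")
```

The result is about 0.1 μl per primer, so in practice a diluted working stock is easier to pipette accurately.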

4 Notes

  1. The oligonucleotide sets designed for PCR gene synthesis cannot be used directly for LCR gene assembly because they do not include the two tail segments at the 3′ ends of the sense and antisense sequences. In addition, the average melting temperature of an oligonucleotide set decreases by ∼2.8°C with each order-of-magnitude decrease in the oligonucleotide concentration. The user should therefore adjust the annealing temperature of the PCR and LCR assembly processes accordingly if the oligonucleotide concentration used for the design differs from that used in the actual gene assembly.

  2. The software skips the multigene mispriming analysis if “# of pools” is not set to 1. Under this condition, TmPrime assumes that the user will conduct multipool gene synthesis.

  3. The potential mishybridization and secondary structures reported by TmPrime depend on the user-specified number of matched bases and the GC content. Users should adjust these parameters according to the GC content of the target genes. We recommend starting the gene design with a low GC content value (such as 0.3). This ensures that all potential misprimings and secondary structures are captured even if the gene, or a portion of it, has a low GC content.

  4. TmPrime generates oligonucleotides of various lengths, depending on the base composition profile of the gene sequence. Some genes may contain clusters of G + C-rich or A + T-rich regions. A region with a high G + C content will generate shorter oligonucleotides than one with a high A + T content.

  5. The assembly efficiency gradually decreases as the target gene length increases. For single-pool PCR gene synthesis, consistent and successful gene synthesis is obtained with DNA lengths below 1.5 kbp or from a pool of up to 60 oligonucleotides (12).

  6. For DNA with highly repetitive sequences, PCR-based gene synthesis may not be the best choice. The LCR-based approach is more effective for these challenging DNA sequences because LCR assembly inherently requires more stringent assembly conditions than the PCR process. Ligation only occurs when two adjacent oligonucleotides with no gap between them are hybridized to the complementary strand. We recommend conducting the LCR gene assembly with a thermostable DNA ligase (such as Ampligase) and with an elevated annealing temperature to increase the annealing stringency of the oligonucleotides and to minimize their potential mishybridization. We use the LCR gene assembly protocol described in Subheading 3.6.
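The concentration dependence mentioned in Note 1 can be turned into a quick correction for the annealing temperature. The sketch below assumes that the ∼2.8°C shift per tenfold change in concentration can be interpolated log-linearly and applied directly to the annealing temperature; both are simplifying assumptions, and the function name is hypothetical.

```python
import math

TM_SHIFT_PER_DECADE_C = 2.8  # approximate value quoted in Note 1


def adjusted_annealing_temp(design_temp_c: float,
                            design_conc_nm: float,
                            actual_conc_nm: float) -> float:
    """Shift the annealing temperature for a change in oligo concentration.

    Assumes a log-linear Tm change of about 2.8 °C per tenfold change in
    concentration (lower concentration gives a lower temperature).
    """
    if design_conc_nm <= 0 or actual_conc_nm <= 0:
        raise ValueError("concentrations must be positive")
    decades = math.log10(design_conc_nm / actual_conc_nm)
    return design_temp_c - TM_SHIFT_PER_DECADE_C * decades


# Oligos designed for 100 nM at 60 °C but assembled at 10 nM:
new_temp = adjusted_annealing_temp(60.0, design_conc_nm=100, actual_conc_nm=10)
print(f"suggested annealing temperature: {new_temp:.1f} °C")
```

For a tenfold dilution from the design concentration, this suggests lowering the annealing temperature from 60°C to about 57.2°C.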