Abstract
This chapter presents TmPrime, a computer program for designing oligonucleotides for both ligase chain reaction (LCR)- and polymerase chain reaction (PCR)-based de novo gene synthesis. The program divides a long input DNA sequence based on user-specified melting temperatures and assembly conditions, and dynamically optimizes the length of the oligonucleotides to achieve uniform melting temperatures. The output reports the melting temperatures, oligonucleotide sequences, and potential formation of secondary structures in a PDF file, which is sent to the user via e-mail. The program also provides sequence pooling, which separates long genes into smaller pieces for multipool assembly, and codon optimization for expression based on the highest organism-specific codon frequency. This software has been used successfully in the design and synthesis of various genes with a total length of >20 kbp. The program is freely available at http://prime.ibn.a-star.edu.sg.
Key words
- De novo gene synthesis
- TmPrime
- Bioinformatics
- PCR
- Ligase chain assembly
- Melting temperature
- Assembly efficiency
1 Introduction
The design and manufacture of custom genes is fast becoming an indispensable tool in synthetic biology (1) and protein engineering (2). Current de novo gene synthesis methods include ligase chain reaction (LCR) (3) and polymerase chain reaction (PCR) assembly (4). Both rely on overlapping oligonucleotides (oligos) to construct genes. In LCR assembly, adjacent oligonucleotides with no gap between them are ligated, resulting in the extension of DNA, whereas PCR assembly uses DNA polymerase to extend the oligonucleotides. Regardless of whether LCR or PCR assembly is used, a successful synthesis requires appropriate oligonucleotide design to ensure that the oligonucleotides are highly specific for their targets and have a uniform hybridization temperature to enhance the assembly efficiency.
Programs have been developed for gene synthesis that design oligonucleotides based on a user-specified hybridization temperature and oligonucleotide length (5–9). DNAWorks (5) and Gene2Oligo (6) provide fairly good synthesis results for DNA sizes below 1 kb. GeneDesign (8) and GeMS (9) have implemented a multipool function for the synthesis of multikilobase genes, in which long DNA sequences are split into smaller segments (∼500 bp). These segments are first assembled in separate pools, and the intermediate segments are then assembled into the full-length product in a final PCR step. DNAWorks provides an important and useful feature for predicting the potential for mishybridization and secondary structures among potential oligonucleotides.
This chapter presents a gene design program called TmPrime (10) that can design oligonucleotides, and analyze their potential mishybridization and secondary structures, for up to 20 genes with very long gene sequences (≤40 kb) for use in LCR and gapless PCR assembly. The program constructs oligonucleotides with uniform melting temperatures (ΔTm < 3°C), which increases the yield of the assembled full-length DNA product in PCR gene assembly. These features are useful for de novo gene synthesis, especially for emerging applications in genome synthesis and multiplex gene synthesis.
2 Materials
2.1 TmPrime Interface
TmPrime uses the equal-temperature (Equi-Tm) approach to design oligonucleotide sets. The program first divides the given sequence into fragments, from the beginning to the end of the DNA sequence, based on the user-specified melting temperature (Fig. 1). This process usually leaves a small DNA tail that has a melting temperature (Tm) that is lower than the user-specified Tm. The fragment boundaries are then shifted to accommodate this tail and to minimize melting temperature deviations of the fragments. Once the melting temperatures of the fragments are equilibrated, the oligonucleotides to be used in gapless PCR or LCR are constructed by connecting two adjacent fragments along both the sense and antisense strands. Each oligonucleotide overlaps with its complementary neighbors by exactly one fragment (see Note 1).
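The fragment-then-join idea can be illustrated with a minimal sketch. It rests on two assumptions that are not part of TmPrime: the Wallace rule (2°C per A/T, 4°C per G/C) stands in for the program's melting temperature calculation, which also depends on buffer and oligonucleotide concentrations, and the low-Tm tail is simply merged into the last fragment instead of shifting every boundary. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule); a stand-in, not TmPrime's model."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def split_by_tm(seq, target_tm):
    """Greedy 5'->3' split: close a fragment once it reaches target_tm."""
    fragments, start = [], 0
    for end in range(1, len(seq) + 1):
        if wallace_tm(seq[start:end]) >= target_tm:
            fragments.append(seq[start:end])
            start = end
    tail = seq[start:]
    if tail:
        # Simplification: absorb the low-Tm tail into the last fragment
        # instead of redistributing it across all fragment boundaries.
        if fragments:
            fragments[-1] += tail
        else:
            fragments.append(tail)
    return fragments


def build_oligos(fragments):
    """Join each pair of adjacent fragments, alternating strands.

    Even pairs give sense oligos, odd pairs give antisense oligos, so each
    oligo shares exactly one fragment with its complementary neighbor.
    """
    oligos = []
    for i in range(len(fragments) - 1):
        joined = fragments[i] + fragments[i + 1]
        if i % 2 == 0:
            oligos.append(("sense", joined))
        else:
            oligos.append(("antisense", reverse_complement(joined)))
    return oligos


fragments = split_by_tm("ATGCGTACCGGATTAGCCTAGGCATCGATCGGATCCAT", target_tm=20)
for strand, oligo in build_oligos(fragments):
    print(strand, oligo)
```

Because the sense oligos cover fragment pairs (0,1), (2,3), ... and the antisense oligos cover pairs (1,2), (3,4), ..., consecutive oligos on the same strand have no gap between them, which is the property the gapless design needs.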
Figure 2 indicates all the parameters that are needed to generate the oligonucleotide sets using TmPrime. Most of the parameters are self-explanatory. The user is asked to provide gene information, gene assembly buffer condition, oligonucleotide and outer primer concentrations, optional parameters for long DNA assembly, and parameters for mispriming analysis. The software will report melting temperatures, oligonucleotide sequences, primer sequences, potential formation of secondary structures, and statistical information of the oligonucleotide sets of each pool compiled in a PDF file.
2.2 Reagents for LCR Gene Assembly
1. 100 μM oligonucleotides.
2. T4 ligase and buffer.
3. Ampligase and buffer (Epicentre Biotechnology).
4. T4 polynucleotide kinase.
5. 100 μM outer primers.
6. 25 mM MgSO4.
7. dNTP mixture (containing 25 mM dATP, 25 mM dGTP, 25 mM dCTP, and 25 mM dTTP).
8. High-fidelity KOD Hot Start DNA polymerase (1.0 U/μl) and 10× KOD buffer (Novagen).
3 Methods
3.1 Calculation with Codon Optimization
TmPrime includes a codon optimization feature. It implements global codon optimization, replacing each codon according to the organism-specific codon frequencies in the Codon Usage Database (http://www.kazusa.or.jp/codon/). The user can select the organism for codon optimization from the list of organisms in the Codon Usage Database (NCBI-GenBank Flat File Release 166.0).
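As an illustration of highest-frequency codon replacement, the sketch below swaps each codon for the most frequent synonymous codon in a usage table. The table covers only a few amino acids and its numbers are made-up placeholders, not values from the Codon Usage Database; the names `CODON_USAGE` and `optimize_codons` are hypothetical and do not reflect TmPrime's internals.

```python
# Placeholder usage table (amino acid -> codon -> frequency per thousand).
# The frequencies are invented for illustration only.
CODON_USAGE = {
    "M": {"ATG": 25.0},
    "K": {"AAA": 33.0, "AAG": 12.0},
    "L": {"CTG": 50.0, "TTA": 14.0, "CTT": 11.0},
    "*": {"TAA": 2.0, "TGA": 1.0},
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AA = {
    codon: aa for aa, codons in CODON_USAGE.items() for codon in codons
}
# Most frequent codon for each amino acid.
BEST_CODON = {
    aa: max(codons, key=codons.get) for aa, codons in CODON_USAGE.items()
}


def optimize_codons(cds):
    """Replace every codon with the most frequent synonymous codon."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        amino_acid = CODON_TO_AA[codon]  # KeyError if codon is not in the toy table
        optimized.append(BEST_CODON[amino_acid])
    return "".join(optimized)


print(optimize_codons("ATGAAGTTACTTTGA"))
```

The protein sequence is unchanged by construction, since each codon is only ever replaced by a synonym. A real table would need all 64 codons for the selected organism.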
3.2 Multiplex Gene Synthesis
TmPrime can handle up to 20 genes with a total DNA length of up to 40 kb (Fig. 2). This function is particularly useful for multiplex gene synthesis, as it allows users to screen for potential mishybridization among a set of multiple genes. When the “# of pools” parameter is set to 1 (default), TmPrime automatically stitches the uploaded gene sequences together into a single DNA sequence and conducts the oligonucleotide design and mishybridization analysis accordingly (see Note 2). The program performs mishybridization screening through a pairwise sequence alignment with a score based on the user-specified number of matched bases and G + C content (see “minimum number of matched bases” and “GC content” parameters in Fig. 2). The program connects adjacent potential mishybridization regions and reports the entire extended region. The oligonucleotides are displayed in alternating upper and lower case (Fig. 3). These features allow users to easily visualize and inspect any problematic DNA regions (see Note 3).
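The screening-and-merging step can be sketched in simplified form. The example below is an assumption-laden stand-in, not TmPrime's alignment: it only detects exact direct repeats of `min_match` bases, filters them by GC fraction, and merges overlapping or adjacent hits into extended regions. Reverse-complement matches, which matter for secondary structures, are left out for brevity. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_repeat_regions(seq, min_match=8, min_gc=0.3):
    """Return merged (start, end) spans whose k-mers recur elsewhere in seq.

    Exact k-mer matching (k = min_match) is used here as a simple
    substitute for a scored pairwise alignment.
    """
    k = min_match
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # Keep start positions of k-mers that repeat and pass the GC filter.
    hits = []
    for kmer, starts in positions.items():
        if len(starts) > 1 and gc_fraction(kmer) >= min_gc:
            hits.extend(starts)
    hits.sort()

    # Connect overlapping or directly adjacent hits into extended regions.
    regions = []
    for start in hits:
        end = start + k
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


seq = "GGCATCGACC" + "TTTTTTTTTT" + "GGCATCGACC" + "AAAAA"
print(find_repeat_regions(seq))              # GC-rich repeat only
print(find_repeat_regions(seq, min_gc=0.0))  # low-GC repeats included too
```

Lowering `min_gc` reports more candidate regions, which mirrors the advice in Note 3 to start with a low GC content value.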
3.3 Single-Pool Assembly
TmPrime supports oligonucleotide design for conventional one-step and two-step PCR-based gene syntheses, “TopDown” one-step gene synthesis, and LCR-based gene synthesis. The software generates gapless oligo sets, which have no gap between consecutive oligos, and reports the oligo set that has the lowest melting temperature deviation and an average melting temperature within ±2°C of the user-specified melting temperature. Oligonucleotides are displayed in alternating upper and lower case so that the user can easily find the boundaries; the prefixes of the oligonucleotide sets and primers are defined in Fig. 4 (see Note 4).
Overlapping PCR assembly is a parallel process in which the lengths of the overlapping oligonucleotides are extended after each PCR cycle. The theoretical minimum number of cycles (x) needed to construct a double-stranded (ds)DNA molecule of length (L) from oligonucleotides of a uniform length (n) and overlap size (s), or from a pool of m oligonucleotides of various lengths, can be calculated by Eqs. 1 and 2, respectively.
Therefore, theoretically, six PCR cycles are sufficient for assembling a 1,000-bp DNA segment from a pool of oligonucleotides of 40 nucleotides (nt) in length with an overlap of 20 nt (see Note 5).
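The six-cycle figure can be checked with a short calculation. The sketch below assumes an idealized model, which may differ in form from Eqs. 1 and 2: in every cycle each strand anneals to a neighbor through an overlap of s nt and is fully extended, so a fragment of length l grows to 2l − s, and a pool of m oligonucleotides needs enough cycles for pairwise merging to reach a single product. The function names are hypothetical.

```python
def min_cycles_uniform(target_len, oligo_len, overlap):
    """Cycles needed when every fragment grows as l -> 2*l - overlap.

    Assumes ideal annealing and full extension in each cycle.
    """
    if not 0 <= overlap < oligo_len:
        raise ValueError("overlap must satisfy 0 <= overlap < oligo_len")
    cycles, length = 0, oligo_len
    while length < target_len:
        length = 2 * length - overlap
        cycles += 1
    return cycles


def min_cycles_pool(n_oligos):
    """Smallest x with 2**x >= n_oligos (pairwise merging each cycle)."""
    if n_oligos < 1:
        raise ValueError("n_oligos must be at least 1")
    return (n_oligos - 1).bit_length()


# 40-nt oligos with a 20-nt overlap grow 40 -> 60 -> 100 -> 180 -> 340
# -> 660 -> 1300, so a 1,000-bp product is first reached in cycle 6.
print(min_cycles_uniform(1000, 40, 20))
print(min_cycles_pool(49))
```

Under this model the fragment length after x cycles is 2^x (n − s) + s, which gives 660 nt after five cycles and 1,300 nt after six for n = 40 and s = 20. These are lower bounds; practical protocols use more cycles.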
3.4 Multiple-Pool Assembly
For DNA longer than 1.5 kb, we recommend splitting the gene into DNA segments and conducting the gene assembly in multiple steps (11). TmPrime automatically splits the gene into pools of shorter sequences of approximately equal length based on the user-specified number of pools; the pool-pool overlap length is adjusted automatically according to the annealing temperature of the outer primers used for across-pool assembly. This function is implemented in the “Long DNA Assembly” feature shown in Fig. 2. Different annealing temperatures can be assigned for the assembly of outer primers across pools (annealing temperature of outer primer), of outer primers within a pool (annealing temperature of pool primer), and of inner oligonucleotides (annealing temperature of oligonucleotide), thus providing flexibility for long gene construction. Oligonucleotides for each pool assembly are optimized at the same melting temperature so that different segments or different genes can be synthesized in parallel in a single thermal cycler. The self-explanatory prefixes of outer primers, pool primers, and inner oligo sets are defined in Fig. 5 (see Note 6).
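A simplified sketch of the pool-splitting step is given below. It assumes a fixed pool-pool overlap length in nucleotides, whereas TmPrime adjusts the overlap according to the annealing temperature; the function name and the `overlap` parameter are hypothetical.

```python
def split_into_pools(seq, n_pools, overlap=20):
    """Split seq into n_pools segments of near-equal length.

    Neighboring segments share `overlap` nt. Assumes the sequence is much
    longer than n_pools * overlap.
    """
    if n_pools < 1:
        raise ValueError("n_pools must be at least 1")
    total = len(seq) + (n_pools - 1) * overlap  # overlaps are counted twice
    pool_len = -(-total // n_pools)             # ceiling division
    step = pool_len - overlap
    if step <= 0:
        raise ValueError("sequence too short for this pool count and overlap")
    pools = []
    for i in range(n_pools):
        start = i * step
        # The last pool takes whatever remains of the sequence.
        end = len(seq) if i == n_pools - 1 else start + pool_len
        pools.append(seq[start:end])
    return pools


gene = "ACGT" * 250  # 1,000-nt placeholder sequence
for pool in split_into_pools(gene, n_pools=3, overlap=20):
    print(len(pool))
```

Each segment could then be passed to the single-pool oligonucleotide design independently, with the shared overlap regions serving as the junctions for the final across-pool assembly.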
3.5 Comparison of Oligonucleotide Design Programs
The oligonucleotide design features of different synthetic gene design programs are summarized in Table 1, and Table 2 compares the performance of these programs for S100A4 (chr1:1503312036–1503311284), GFPuv (GenBank U62636; region 261–1,020), and the entire genomes of poliovirus (GenBank FJ517648; 7,418 bp) and φX174 bacteriophage (GenBank J02482; 5,386 bp). TmPrime offers the most uniform melting temperatures (ΔTm < 3°C) and a wider range of annealing temperatures (50–70°C) than DNAWorks (58–70°C) and Assembly PCR Oligo Maker (50–60°C). GeneDesign cannot adjust the oligonucleotide concentrations and PCR buffer conditions, and its oligonucleotide design may fail when the sequences of consecutive oligonucleotides collide. Gene2Oligo has difficulty designing S100A4 and fails to converge at the specified annealing temperature. Only TmPrime can handle the poliovirus and φX174 bacteriophage genomes.
3.6 LCR Gene Assembly Protocol
1. LCR assembly
The LCR assembly is carried out in a final volume of 50 μl containing 5 μl of 10× T4 ligase buffer, 5 μl of 10× Ampligase buffer, and 10–100 nM of TmPrime-optimized oligonucleotides that have been phosphorylated using 20 U of T4 polynucleotide kinase and 20 U of Ampligase.
LCR assembly is conducted as follows: incubation at 37°C for 4 h, denaturation at 95°C for 3 min, ramping to 60°C (matched with the average melting temperature of the oligonucleotides) at 0.1°C/s for annealing, and incubation at 60°C for 2–8 h.
2. PCR amplification
The full-length assembly product is amplified by a PCR containing 5 μl of the assembly mixture from step 1 above, 0.4 μM of outer primers, 1 μl of KOD Hot Start Master Mix, and 1× PCR buffer in a final volume of 25 μl.
The PCR is conducted under the following conditions: initial denaturation at 95°C for 2 min; 30 cycles of 95°C for 20 s, 55°C (matched with the melting temperature of the primers) for 30 s, and 72°C for 30 s; and a final extension at 72°C for 10 min.
4 Notes
1. The oligonucleotide sets designed for PCR gene synthesis cannot be directly utilized for LCR gene assembly, as the two tail segments at the 3′ ends of the sense and antisense sequences are not included in the oligonucleotide sets. In addition, the average melting temperature of the oligonucleotide sets decreases by ∼2.8°C with each order of magnitude decrease in the oligonucleotide concentration. The user should therefore adjust the annealing temperature of the PCR and LCR assembly processes accordingly if the oligonucleotide concentrations used for oligonucleotide design and for the actual gene assembly are different.
2. The software skips the multigene mispriming analysis if the setting of “# of pools” is not 1. Under this condition, TmPrime assumes that the user will conduct multipool gene synthesis.
3. The potential mishybridization and secondary structures reported by TmPrime depend on the user-specified number of matched bases and the GC content. Users should adjust these parameters according to the GC content of the target genes. We recommend starting the gene design with a low GC content value (such as 0.3). This ensures that all potential misprimings and secondary structures are captured even if the gene or a portion of the gene has low GC content.
4. TmPrime generates oligonucleotides of various lengths, depending on the base composition profile of the gene sequence. Some genes may contain clusters of G + C or A + T regions. A region with high G + C content will generate shorter oligonucleotides than a region with high A + T content.
5. The assembly efficiency gradually decreases as the target gene length increases. For single-pool PCR gene synthesis, consistent and successful gene synthesis is obtained with DNA lengths below 1.5 kbp or from a pool of up to 60 oligonucleotides (12).
6. For DNA with high sequence repeats, PCR-based gene synthesis may not be the best choice. The LCR-based approach is more effective for these challenging DNA sequences, as LCR assembly inherently requires more stringent assembly conditions than the PCR process. Ligation only occurs when two adjacent oligonucleotides with no gap between them are hybridized to the complementary DNA strand. We recommend conducting the LCR gene assembly with a thermostable DNA ligase (such as Ampligase) and with an elevated annealing temperature to increase the annealing stringency of the oligonucleotides and to minimize their potential mishybridization. We use the LCR gene assembly protocol described in Subheading 3.6.
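The concentration correction mentioned in Note 1 can be written as a short calculation. The sketch below treats the ∼2.8°C per order of magnitude as an exact linear slope on a log10 concentration scale, which is an approximation; the function and parameter names are hypothetical.

```python
import math


def adjusted_annealing_temp(design_temp_c, design_conc_nm, actual_conc_nm,
                            slope_c_per_decade=2.8):
    """Shift the annealing temperature for a change in oligo concentration.

    Assumes the average Tm changes by slope_c_per_decade for every tenfold
    change in oligonucleotide concentration (approximation from Note 1).
    """
    if design_conc_nm <= 0 or actual_conc_nm <= 0:
        raise ValueError("concentrations must be positive")
    decades = math.log10(actual_conc_nm / design_conc_nm)
    return design_temp_c + slope_c_per_decade * decades


# Oligos designed at 100 nM but assembled at 10 nM: one decade lower,
# so the annealing temperature is lowered by about 2.8 degrees C.
print(round(adjusted_annealing_temp(60.0, 100.0, 10.0), 1))
```

For example, a design made for 60°C at 100 nM would be run at roughly 57.2°C if the assembly is carried out at 10 nM.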
References
1. Cox JC, Lape J, Sayed MA and Hellinga HW (2007) Protein fabrication automation. Protein Sci 16:379–390.
2. Sprinzak D and Elowitz MB (2005) Reconstruction of genetic circuits. Nature 438:443–448.
3. Au LC, Yang FY, Yang WJ, Lo SH and Kao CF (1998) Gene synthesis by a LCR-based approach: High-level production of leptin-L54 using synthetic gene in Escherichia coli. Biochem Biophys Res Commun 248:200–203.
4. Prodromou C and Pearl L (1992) Recursive PCR: A novel technique for total gene synthesis. Protein Eng 5:827–829.
5. Hoover DM and Lubkowski J (2002) DNAWorks: An automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 30:e43.
6. Rouillard J-M, Lee W, Truan G, Gao X, Zhou X and Gulari E (2004) Gene2oligo: Oligonucleotide design for in vitro gene synthesis. Nucleic Acids Res 32:W176–W180.
7. Rydzanicz R, Zhao XS and Johnson PE (2005) Assembly PCR oligo maker: A tool for designing oligodeoxynucleotides for constructing long DNA molecules for RNA production. Nucleic Acids Res 33:W521–W525.
8. Richardson SM, Wheelan SJ, Yarrington RM and Boeke JD (2006) GeneDesign: Rapid, automated design of multikilobase synthetic genes. Genome Res 16:550–556.
9. Jayaraj S, Reid R and Santi DV (2005) GeMS: An advanced software package for designing synthetic genes. Nucleic Acids Res 33:3011–3016.
10. Bode M, Khor S, Ye H, Li M-H and Ying JY (2009) TmPrime: Fast, flexible oligonucleotide design software for gene synthesis. Nucleic Acids Res 37:W214–W221.
11. Shevchuk NA, Bryksin AV, Nusinovich YA, Cabello FC, Sutherland M and Ladisch S (2004) Construction of long DNA molecules using long PCR-based fusion of several fragments simultaneously. Nucleic Acids Res 32:e19.
12. Cheong WC, Lim LS, Huang MC, Bode M and Li M-H (2010) New insights into the de novo gene synthesis using the automatic kinetics switch approach. Anal Biochem 406:51–60.
© 2012 Springer Science+Business Media, LLC
Li, MH., Bode, M., Huang, M.C., Cheong, W.C., Lim, L.S. (2012). De Novo Gene Synthesis Design Using TmPrime Software. In: Peccoud, J. (eds) Gene Synthesis. Methods in Molecular Biology, vol 852. Humana Press. https://doi.org/10.1007/978-1-61779-564-0_17
DOI: https://doi.org/10.1007/978-1-61779-564-0_17
Publisher Name: Humana Press
Print ISBN: 978-1-61779-563-3
Online ISBN: 978-1-61779-564-0
eBook Packages: Springer Protocols