MPprimer: a program for reliable multiplex PCR primer design
Multiplex PCR, defined as the simultaneous amplification of multiple regions of a DNA template or multiple DNA templates using more than one primer set (comprising a forward primer and a reverse primer) in one tube, has been widely used in diagnostic applications of clinical and environmental microbiology studies. However, primer design for multiplex PCR is still a challenging problem and several factors need to be considered. These problems include mis-priming due to nonspecific binding to non-target DNA templates, primer dimerization, and the inability to separate and purify DNA amplicons with similar electrophoretic mobility.
A program named MPprimer was developed to help users design reliable multiplex PCR primers. It employs the widely used primer design program Primer3 and the primer specificity evaluation program MFEprimer to design and evaluate candidate primers against a genomic or transcript DNA database, followed by careful examination to avoid primer dimerization. A graph-expanding algorithm derived from the greedy algorithm is used to determine the optimal primer set combinations (PSCs) for a multiplex PCR assay. In addition, MPprimer provides a virtual electrophotogram to help users choose the best PSC. Experimental validation from 2× to 5× plex PCR demonstrates the reliability of MPprimer. As another example, MPprimer is able to design multiplex PCR primers for DMD (the dystrophin gene, defects in which cause Duchenne Muscular Dystrophy), which has 79 exons, for 20×, 20×, 20×, 14×, and 5× plex PCR reactions in five tubes to detect underlying exon deletions.
MPprimer is a valuable tool for designing specific, non-dimerizing primer set combinations with constrained amplicon sizes for multiplex PCR assays.
Keywords: Greedy Algorithm; Single Nucleotide Polymorphism; Multiplex Polymerase Chain Reaction; Duchenne Muscular Dystrophy; Conventional Polymerase Chain Reaction
Multiplex polymerase chain reaction (PCR), defined as the simultaneous amplification of multiple regions of a DNA template or multiple DNA templates through the use of multiple primer sets (PS, comprising a forward primer and a reverse primer) in one tube, has been widely used in diagnostic applications of clinical [1, 2] and environmental microbiology studies. The key step in running a successful multiplex PCR reaction is to design an optimal primer set combination (PSC, a group of PSs; PSCs for primer set combinations). It is well known that for conventional PCR, the optimal PS has the following properties: 1) primer size: 18-30 bp; 2) product size: 100-500 bp; 3) melting temperature (Tm) of both forward and reverse primers: 58-65°C, with a temperature difference of less than 3°C; 4) GC content of primers: 40-60%; 5) ΔG (Gibbs free energy) of the last five residues of the primers at the 3' end: ≥ -9 kcal/mol; etc. [4, 5]. However, there are several additional criteria which must be taken into account when designing a multiplex PCR assay: 1) lack of primer dimerization between all of the primers; 2) similarity of the Tms of each primer; 3) primer specificity to avoid mis-priming; and 4) constraint of electrophoretic mobility of the amplicons in order to separate and purify the DNA fragments easily in agarose gel electrophoresis [6, 7, 8, 9].
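The length, GC-content and Tm rules listed above can be expressed as simple filters. The following is a minimal sketch, not part of MPprimer: the function names are hypothetical, Tm values are assumed to be supplied by another tool (such as Primer3), and the product-size and 3'-end ΔG rules are left out.

```python
def gc_content(primer):
    """Return the GC percentage of a primer sequence."""
    seq = primer.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_basic_criteria(primer, min_len=18, max_len=30, min_gc=40.0, max_gc=60.0):
    """Check only the primer-size and GC-content rules quoted in the text."""
    length_ok = min_len <= len(primer) <= max_len
    gc_ok = min_gc <= gc_content(primer) <= max_gc
    return length_ok and gc_ok


def tm_compatible(tm_forward, tm_reverse, low=58.0, high=65.0, max_diff=3.0):
    """Both Tm values must lie in [low, high] and differ by less than max_diff."""
    in_range = low <= tm_forward <= high and low <= tm_reverse <= high
    return in_range and abs(tm_forward - tm_reverse) < max_diff
```

For instance, a 20-mer with 50% GC passes the basic check, while a primer pair with Tm values of 58°C and 62°C is rejected because the difference is not below 3°C.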
It has been proved by Nicodeme and Steyaert that determining minimum-set primers for multiplex PCR is an NP-complete problem. Most of the current programs mainly focus on the issue of determining the minimum-set primers, such as MPP (greedy algorithm), PDA-MS/UniQ (modified compact genetic algorithm), G-PRIMER (greedy algorithm), and Greene SCPrimer (greedy algorithm). There are also other programs available for the design of primers in specific contexts. For example, Primaclade designs minimally degenerate primers for comparative studies of multiple species. Primique designs PCR primers specific for each sequence in a gene family. PrimerStation designs human-specific multiplex PCR primers by checking the entire human genome database. MuPlex [5, 6] utilizes a multi-node graph algorithm derived from a greedy algorithm to assign and partition single nucleotide polymorphisms (SNPs) into multiplex-compatible tubes for SNP genotyping. However, with the exception of PrimerStation, few of the programs mentioned above analyze primer specificity against a genomic or transcript DNA database. Only a few of these programs provide a simple BLAST search against a local database or the GenBank database to examine the specificity of PCR primers [6, 11], and the only database used to check primer specificity in PrimerStation is the human genome database. Moreover, few of the primer design programs constrain the amplicon size to allow separation and purification of the DNA fragments in agarose gel electrophoresis when designing PSCs for multiplex PCR. Some programs simply define a fixed length (for example, 10 bp) as the minimum size difference between the amplicons. However, the relation between the size of an amplicon and its electrophoretic mobility in agarose gel electrophoresis is nonlinear.
For example, it is easy to separate two DNA fragments of 100 bp and 150 bp in agarose gel (0.5%-2%) electrophoresis, but quite difficult to separate two amplicons of 1000 bp and 1050 bp in a similar gel. The electrophoretic mobility of the amplicons should be considered at the very beginning of primer design.
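The 100/150 bp versus 1000/1050 bp comparison can be made concrete with a short sketch. It relies on a common rule of thumb, assumed here and not taken from MPprimer, that migration distance in an agarose gel is roughly linear in the logarithm of fragment length, so the gap between two bands scales with the difference of their log10 sizes. The function name is hypothetical.

```python
import math


def log_size_gap(size_a_bp, size_b_bp):
    """Difference in log10 fragment size, a rough proxy for band separation."""
    return abs(math.log10(size_a_bp) - math.log10(size_b_bp))


small_pair = log_size_gap(100, 150)    # about 0.176
large_pair = log_size_gap(1000, 1050)  # about 0.021
```

Both pairs differ by 50 bp, yet under this approximation the smaller pair is separated roughly eight times more widely, which is why a fixed minimum size difference is a poor constraint.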
In this work, we developed a program named MPprimer to address these issues. It employs the widely used primer design program Primer3 and the primer specificity evaluation program MFEprimer to design and evaluate candidate primers based on genomic or transcript DNA databases, followed by a primer dimerization examination to discard unsuitable primers. Finally, a graph-expanding algorithm derived from a greedy algorithm is used to determine the optimal PSCs for multiplex PCR. MPprimer then provides a virtual electrophotogram to help users choose the best PSCs for a multiplex PCR assay.
To avoid primer dimerization, MPprimer examines each pair of primers in a PSC using a dimer checking program named PriDimerCheck, which is part of the PerlPrimer source code, with some modifications. The most significant modification we made was the criterion used to determine the degree of complementarity between two primers. We calculated the stability of the complementary region with a nearest-neighbor method based on thermodynamic parameters, instead of the matching score used in many programs such as AutoDimer and Primer3. Checking primer dimerization based on thermodynamics is more quantitative and reliable than the matching score method. MPprimer uses a stringent cutoff value of -7 kcal/mol to define a dimer in PriDimerCheck.
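The difference between a matching score and a thermodynamic score can be illustrated with a minimal sketch. It sums nearest-neighbor ΔG values over a perfectly paired stretch and compares the total with the -7 kcal/mol cutoff. Several things here are assumptions: the text does not say which parameters PriDimerCheck uses, so the table below is an illustrative set of commonly quoted unified nearest-neighbor values at 37°C; initiation, mismatch and dangling-end terms are ignored; treating "at or below the cutoff" as a dimer is a guess; and the function names are hypothetical.

```python
# Illustrative nearest-neighbor free energies (kcal/mol, 37 °C) for each
# dinucleotide step read 5'->3' on one strand of a perfectly paired duplex.
# Assumed values: verify against a published table before any real use.
NN_DELTA_G = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}

DIMER_CUTOFF = -7.0  # kcal/mol, the cutoff quoted in the text


def duplex_delta_g(paired_stretch):
    """Sum the stacking energies of consecutive base-pair steps."""
    seq = paired_stretch.upper()
    return sum(NN_DELTA_G[seq[k:k + 2]] for k in range(len(seq) - 1))


def is_dimer(paired_stretch, cutoff=DIMER_CUTOFF):
    """Flag a paired stretch whose total ΔG is at or below the cutoff (assumption)."""
    return duplex_delta_g(paired_stretch) <= cutoff
```

Under these assumed values, the paired stretches `GCGCGC` and `ATATAT` would receive the same score from a simple count of six matched bases, yet only the GC-rich stretch falls below the cutoff.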
MPprimer uses the MFEprimer program to evaluate the specificity of the primers in a PSC. In order to illustrate our specificity evaluation strategy, we use the terms TiPf and TiPr ('T' for template and 'P' for primer) to represent the forward and reverse primers of template i, and TjPf and TjPr for the primers of template j (where TiPf, TiPr, TjPf and TjPr belong to one PSC). There are two steps of specificity examination in MPprimer. The first step is to evaluate the specificity of the primer pair (PP) TiPf and TiPr [PP1] or the primer pair TjPf and TjPr [PP2]. This is similar to conventional PCR primer specificity examination. The second step is to evaluate nonspecific cross amplification, for example, between the primer pairs TiPf and TjPf [PP3], TiPr and TjPr [PP4], TiPf and TjPr [PP5], or TiPr and TjPf [PP6]. All of these potential primer pairs are thoroughly examined by MFEprimer. If any one of these six primer pairs [PP1-PP6] fails the examination, the two PSs ([TiPf+TiPr] for Ti and [TjPf+TjPr] for Tj) are not allowed to co-occur in a PSC.
Following dimerization examination and specificity evaluation, a scoring matrix is built from the PS examination results. If neither dimerization nor nonspecific amplicons are found between two PSs, the matrix entry for that pair of PSs is set to 1; otherwise it is set to 0. The resulting scoring matrix is shown in Figure 1.
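The six primer pairs PP1 to PP6 are simply all two-primer combinations of the four primers from two primer sets. A minimal sketch of that enumeration follows; `is_specific` is a hypothetical callable standing in for an MFEprimer run, and the function names are not from MPprimer.

```python
from itertools import combinations


def primer_pairs(i, j):
    """Return the six primer pairs formed by the primers of templates i and j."""
    primers = [f"T{i}Pf", f"T{i}Pr", f"T{j}Pf", f"T{j}Pr"]
    return list(combinations(primers, 2))


def can_co_occur(i, j, is_specific):
    """Two primer sets may share a PSC only if every one of the six pairs passes.

    is_specific(primer_a, primer_b) is a placeholder for the real specificity check.
    """
    return all(is_specific(a, b) for a, b in primer_pairs(i, j))
```

Calling `primer_pairs(1, 2)` yields the two within-set pairs (PP1, PP2) and the four cross pairs (PP3 to PP6), in the order produced by `combinations` and not in the PP numbering.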
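A scoring matrix of this kind can be sketched as a symmetric 0/1 table. The two predicate arguments are hypothetical stand-ins for the PriDimerCheck and MFEprimer results, and leaving the diagonal at 0 is an arbitrary choice of this sketch, not something stated in the text.

```python
def build_scoring_matrix(primer_sets, has_dimer, has_nonspecific_amplicon):
    """Return a symmetric matrix with 1 where two primer sets are compatible."""
    n = len(primer_sets)
    matrix = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            ps_a, ps_b = primer_sets[a], primer_sets[b]
            compatible = not has_dimer(ps_a, ps_b) and not has_nonspecific_amplicon(ps_a, ps_b)
            # Compatibility is mutual, so fill both halves of the matrix.
            matrix[a][b] = matrix[b][a] = int(compatible)
    return matrix
```

With three primer sets where only the first and third form a dimer, the matrix has 1 at the (first, second) and (second, third) positions and 0 at (first, third).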
Choosing optimal PSCs using a graph-expanding algorithm
We used a graph-expanding algorithm derived from a greedy algorithm to choose the optimal PSC. In the graph model (Figure 2), a node represents a candidate PS and an edge represents a specific relationship between two PSs. An edge connecting two PSs indicates that they are able to work in one tube without any interference or competition to produce nonspecific amplicons. After the first two PSs (nodes) are linked together by one edge (PS1 and PS2), another node is found and added to the graph through two new edges (PS3 with PS1 and PS3 with PS2, Figure 2, right panel). Whether two nodes can be connected is based on the primer dimerization and specificity evaluation results, which are recorded in the scoring matrix (described above). By repeating this procedure, we can extend the graph until it has sufficient nodes to form a candidate PSC (Figure 2). Because the PSs designed by Primer3 are sorted by their penalty values (PS1-5 in Figure 2), our graph-expanding strategy selects the nodes with the lowest penalty instead of randomly selected nodes. Thus, the program finds the optimal PSC, as opposed to choosing a random one, without requiring additional time to sort PSCs. The scoring matrix makes sure that the PSs in a PSC are compatible. For N × 5 nodes (N× plex PCR primer design or N template sequences, 5 for candidate PSs) in the graph, pre-computing and storing the scoring matrix would require O(N²) time and space.
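The node-by-node expansion can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not MPprimer's implementation: each template contributes a list of candidate PSs already sorted by ascending penalty, `compatible` is a hypothetical lookup into the scoring matrix, and the sketch gives up at a dead end and does not backtrack.

```python
def expand_psc(candidates, compatible):
    """Greedily grow one primer set combination.

    candidates: one list per template, each sorted by ascending penalty.
    compatible(ps_a, ps_b): True when the scoring-matrix entry is 1.
    Returns the chosen primer sets, or None if some template has no
    candidate compatible with everything chosen so far.
    """
    chosen = []
    for template_candidates in candidates:
        for ps in template_candidates:
            # A new node joins only if it has an edge to every chosen node.
            if all(compatible(ps, other) for other in chosen):
                chosen.append(ps)
                break
        else:
            return None
    return chosen
```

Because candidates are tried in penalty order, the first compatible one found for each template is also the lowest-penalty compatible one, which is the property the text relies on to avoid sorting complete PSCs afterwards.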
Results and discussion
We provide both a web application and a stand-alone version of MPprimer for different purposes. The MPprimer web application is conveniently designed, with a friendly interface, while the stand-alone version is applicable to high-throughput multiplex PCR primer design with supporting comprehensive, custom-built databases in order to meet specific user demands. After inputting a set of template DNA sequences in FASTA format, MPprimer will design and output the best 15 PSCs by default in a user-friendly format. Besides the binding pattern of the primers and their template sequences, the detailed specificity evaluation results of MFEprimer are also provided. In addition, for the PSs of each PSC, users can then re-run the sequences on PriDimerCheck and MFEprimer to view the results of possible primer dimerization or nonspecific amplifications with stricter NCBI BLAST parameters such as using a lower word size (-W 4) and larger E value (-e 10000) to improve the sensitivity. Notably, MPprimer also provides a function to generate the virtual agarose gel electrophotogram for each PSC to give users a visual impression before running the real multiplex PCR reaction (see homepage of MPprimer).
Because the purpose of MPprimer is to design an optimal PSC that can specifically amplify the target DNA sequences without nonspecific amplification or primer dimerization, it is useful for designing primers for DNA template sequences without high sequence similarity. However, MPprimer does provide an alternative way to design primers for highly similar sequences (for example, several genes of one family or different transcript variants of the same gene) by allowing users to specify the region of primer location for each DNA template sequence. This requires users to first run a multiple alignment for the sequences to find the nonconserved region(s) before designing primers using MPprimer.
Although designing minimum-set primers can save cost and time by reducing primer synthesis demand [11, 12, 13, 14], it is crucial to design specific PSs, especially at the genomic or transcript level, to perform multiplex PCR with high reliability. MPP, PDA-MS/UniQ, G-PRIMER and Greene SCPrimer are mainly concerned with the problem of minimum-set primers. PrimerStation designs specific multiplex PCR primers only by checking the entire human genome database, which restricts it to human templates. The MPprimer web application supports transcript level specificity evaluation for more than ten species, while the stand-alone version can support any DNA sequence database, even a large genomic DNA database. MuPlex [5, 6] and MPP simply use BLAST to check primer specificity, but a BLAST search alone is insufficient. Moreover, several other conditions such as Tm are not considered [9, 22]. Primaclade and Greene SCPrimer are used for degenerate primer design, while Primique focuses on designing specific PCR primers for each sequence in a gene family. However, none of these programs provides a function for predicting the electrophoretic mobility of the amplicons from a multiplex PCR reaction. MuPlex [5, 6] and MPprimer use a very similar algorithm to find a PSC in a graph where nodes are PSs and edges connect pairwise compatible PSs for multiplex PCR. The difference between them is that MPprimer selects nodes which have a lower penalty (indicating higher quality) instead of random ones. Therefore, MPprimer can find the optimal PSC without enumerating and sorting all the PSCs. It should be noted that, as our graph-expanding model is based on the preselected candidate primer sets (MPprimer utilizes Primer3 to design 5 primer sets for each of the template sequences), the output PSCs are not globally but only locally optimal.
In another respect, the running time of MPprimer is not directly comparable to that of other programs, because the specificity examination by MFEprimer requires additional time for sequence similarity analysis between the primer sequences and the genomic or complementary DNA database of the same species. Therefore, the MPprimer web application currently only supports transcript level specificity examination. However, the stand-alone version of MPprimer supports any database, such as a genomic DNA database, limited mainly by the user's computing capability.
Our further plans are to: 1) automatically analyze and design primers for amplifying different transcript variants for alternative splicing analysis of a single gene; 2) provide an alternative solution, should MPprimer not find suitable primers for a set of template sequences in one tube, by suggesting two or more tubes for one PCR reaction.
We developed a new program named MPprimer with both a web application and a stand-alone version to help users design highly reliable PSs for multiplex PCR assay. The web application is easy-to-use with a friendly interface, while the stand-alone version is applicable to high-throughput multiplex PCR primer design with the support of comprehensive custom-built DNA sequence databases. With the help of MPprimer, users can design reliable primer set combinations for multiplex PCR analysis.
Availability and requirements
Project name: MPprimer
Project home page: http://biocompute.bmi.ac.cn/MPprimer/
Operating system(s): The web application is platform independent and the stand-alone version runs on Linux/Unix.
Programming language: Python
Other requirements: Python ≥ 2.5
License: GNU GPL v 3
Any restrictions to use by non-academics: License needed
We wish to thank the authors of Primer3 for providing such an excellent program for primer design. We also wish to thank the anonymous reviewers for their valuable suggestions to improve this manuscript. This work is supported by the National Basic Research Project (973 program) (2006CB504100), the National Key Technologies R&D Program for New Drugs (2009ZX09103-616, 2009ZX09503-002, 2009ZX09301-002), the General Program (30900862, 30973107, 30771230) of the National Natural Science Foundation of China, the State Key Laboratory of Proteomics (SKLP2010yx-005), and the Major Program for Science and Technology Research of Beijing Municipal Bureau (7061004).
- 6. Rachlin J, Ding C, Cantor C, Kasif S: MuPlex: multi-objective multiplex PCR assay design. Nucleic Acids Res 2005, (33 Web Server):W544–547. 10.1093/nar/gki377
- 8. Yamada T, Soma H, Morishita S: PrimerStation: a highly specific multiplex genomic PCR primer design server for the human genome. Nucleic Acids Res 2006, (34 Web Server):W665–669. 10.1093/nar/gkl297
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.