1 Nucleic Acid Extraction, Purification and Storage

The application of PCR and other methods in molecular biology requires the extraction of nucleic acid from biological samples, and a number of approaches have been devised for performing this extraction. Nucleic acids generally do not occur as free molecules but rather within bacteria, cells, virus particles, fungi, protozoa etc., where they are enclosed by cell membranes and walls composed of proteins, lipids and sugars. Nucleic acids themselves form complexes with histones and other proteins. To extract nucleic acids present in this manner, the cell membranes and walls covering them must be disrupted and the proteins of these complexes denatured or degraded so that they become soluble; the nucleic acids are thereby freed and can then be extracted.

In the case of isolating a nucleic acid, the nucleic acid-protein complex needs to be denatured or degraded to free the desired nucleic acid from the complex so that it can be solubilised and extracted.

In recent years, a number of approaches have been developed for rapid extraction of nucleic acids from various materials, including blood, serum, faeces, urine, tissue (including paraffin-embedded), cell cultures, plasmid DNA from bacterial lysates and genomic DNA and total RNA from blood, animal and human or plant tissues and cell cultures.

The methods not only provide an ease and convenience of processing, but they allow the processing of a high volume of samples [70]. Many websites exist to assist the scientist – one excellent website combining many links to protocols can be found at http://www.molecularstation.com/.

Nucleic acids have been conventionally extracted by one or a combination of the following methods:

  • Extraction with phenol and phenol/chloroform mixtures for purification of DNA and RNA. Phenol and chloroform remove proteins, including restriction enzymes, by disrupting protein secondary structure, which causes the proteins to denature and precipitate from solution. Although each of these solvents is capable of performing this function alone, the two together remove proteins from solution much more effectively. Nucleic acids are recovered in the aqueous phase.

  • Nucleic acids are released by means of strongly denaturing and reducing agents, including hydrolytic enzymes (e.g. protease, lysozyme, lyticase), from cells and tissues and subsequently extracted and purified with a mixture of chloroform and phenol. The nucleic acids are finally obtained from the aqueous phase by ethanol precipitation, by concentration by means of dialysis [63], or by capture on silica resins or magnetic particles.

  • Proteinase K/phenol method, in which a proteolytic enzyme such as proteinase K or a surfactant is added to disrupt the cell membrane or wall and the protein of a complex of interest is degraded to free nucleic acids; then phenol/chloroform is added and the mixture is centrifuged to have the nucleic acids transferred into the aqueous phase; the aqueous phase is recovered by separation and ethanol, isopropanol or the like is added to the recovered aqueous phase, thereby rendering the nucleic acids insoluble [63].

  • The AGPC method, in which a liquid mixture of guanidinium isothiocyanate and phenol is added to a sample of interest to disrupt the cell membrane and wall, so that the protein of the complex is denatured to become soluble; nucleic acids are then freed and chloroform is added to transfer the nucleic acids to the aqueous phase; the aqueous phase is recovered by separation and thereafter, ethanol, isopropanol or the like is added to the recovered aqueous phase, thereby rendering the nucleic acids insoluble [15]. This method often uses a proprietary formulation of this reagent called TRIzol (a chemical solution used in RNA/DNA/protein extraction; TRIzol is the brand name of the product from Invitrogen).

  • The guanidinium method, in which guanidinium hydrochloride or guanidinium thiocyanate is added to a sample of interest to disrupt the cell membrane and wall, so that the protein of the complex is denatured to become soluble and to remove dissolved impurities; nucleic acids are then freed and ethanol or isopropanol is added to render the free nucleic acids insoluble. Elution of any bound nucleic acids from the supporting material is with water or low-salt buffers such as 10 mmol/l Tris or TE (10 mmol/l Tris, 1 mmol/l EDTA). An advantage of this method is that chaotropic salts ensure the irreversible denaturing and thus inactivation of nucleases. Essential disadvantages of the method are that the concentration of the chaotropic salts has, to some extent, to be closely adjusted to the material used, and that the lysis of biological material such as fungal or plant tissue is sometimes very inefficient (such material may require initial protease treatment to dissolve its thicker cell walls). If enzymes (proteinase, RNase) are used in the purification methods, the concentration of chaotropic agents must be reduced below values which otherwise bring about inactivation of nucleases [15, 63].

  • The sodium iodide method, in which sodium iodide containing glycogen which has affinity for the nucleic acid to be extracted is added to a sample of interest, whereby the cell membrane and wall are disrupted and the protein of the complex is denatured, so that it becomes soluble; nucleic acids are then freed and isopropanol is added to render the free nucleic acids and glycogen insoluble [32].

Nucleic-acid-containing solutions can also be obtained by incubation of nucleic-acid-containing materials with lysis buffers, which contain either (i) chaotropic salts such as guanidine salts, (ii) alkaline compounds such as NaOH, (iii) neutral, anionic or cationic detergents such as sodium dodecyl sulphate (SDS), Triton X-100, TWEEN-20 or hexadecyl trimethyl ammonium bromide (CTAB) or (iv) enzymes such as proteinase K or lysozyme in bacteria, lyticase in yeasts, chitinase in fungi, or proteases in tissues. All of these approaches aim to lyse cells and release the nucleic acid, and may be followed by extraction using phenol and/or chloroform [63].

These methods were superseded by the more rapid and easier method devised by Boom et al. [10]. The method is based on the lysing and nuclease-inactivating properties of the chaotropic agent guanidinium thiocyanate together with the nucleic acid-binding properties of silica particles or diatoms in the presence of this agent. By using size-fractionated silica particles, nucleic acids (covalently closed circular, relaxed circular and linear double-stranded DNA; single-stranded DNA; and rRNA) could be purified from 12 different specimens in less than 1 h and were recovered in the initial reaction vessel.

These “classical” methods are especially time-consuming (sometimes taking up to 48 h), require a considerable amount of equipment, and relatively large quantities of biological material and, in addition, involve a considerable health risk (amongst other things due to the use of chloroform and phenol).

The newer methods are commercially marketed in the form of easy-to-use extraction kits (some in a broad total nucleic acid extraction format; see Table 2.1), and are based on the principle that nucleic acids bind to mineral supports in the presence of high ionic strength, especially chaotropic salts. Finely ground glass powder (e.g. Promega, MoBio), diatomaceous earth (Sigma), silica gels (Qiagen) or chemically modified materials such as silicon carbide can also be used and have proved successful as supporting materials. Ambion, Applied Biosystems, Epicentre Technologies, Invitrogen, MO BIO Laboratories, Inc., QIAGEN, Roche Diagnostics and Sigma Aldrich are some of the larger companies with manual extraction kits and methods, and there are many other smaller companies vying for business. However, many of the smaller companies are being bought out by the larger companies and this will most likely continue over the next few years. Additionally, many methods have been the subject of intellectual property and patent claims, and this has also limited further development of some of the methodology.

Table 2.1 “Larger” companies with manual DNA and RNA extraction kits

Some of these commercial kits can be used on different robotic extraction platforms, which makes these robotic systems more attractive as more rapid, better and cheaper kits become available. Refer to Table 2.2 for major systems available in Australia. Automated systems allow you to run from 1 to 96 samples at once with minimal hands-on time. DNA and/or RNA can be extracted in time frames from around 10 min up to 2 h. Some of the larger systems are also “open systems”, which allow tailoring of one's own methods and/or commercial kits on the machine. Other systems are closed, with no change in a specific procedure possible. The major suppliers are Roche, Qiagen, Abbott, Biomerieux, Thermo Scientific, Promega, Invitrogen, Ambion and Epicentre Biotechnologies, with continual advances from most – what exists today may change tomorrow, which makes specifics a little more challenging! However, an automated system is not for every laboratory, as these systems are not cheap to purchase and, depending on the target and PCR methods employed, such “quality” nucleic acid may not even be necessary.

Table 2.2 Automated systems for nucleic acid extraction

Methods based on silica do not involve the use of chaotropic salts. An advantage of such a commercially available method is that nucleic acids can be extracted even from materials with very small nucleic acid content with a universal protocol. A disadvantage is that the nucleic acid preparations may not meet high quality standards (for example, giving photometric absorbance ratios at 260 nm to 280 nm of less than 1.70) and this may increase the potential for inhibition within the PCR reaction. PCR inhibitors (heme protein, anionic surfactants, cationic surfactants, non-ionic surfactants and zwitterionic surfactants) can sometimes interfere with subsequent PCR amplification. These inhibitors may be present in the extraction solution, or be carried over during the extraction process from the biological sample or from some other source. The presence of PCR inhibitors in the extraction solution would result in little or no amplification of nucleic acids, and this would be deemed to constitute absence of effective extraction from the sample. That is where the inclusion in the PCR of some form of internal control is of importance and why standards push for their inclusion.
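The photometric quality check mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `purity_report` is hypothetical, the 1.70 cut-off is the figure quoted in the text, and the conversion factor of 50 µg/ml per A260 unit is the commonly used value for double-stranded DNA, which is an assumption not stated in this chapter (RNA is usually taken as about 40 µg/ml).

```python
def purity_report(a260: float, a280: float, dilution: float = 1.0,
                  factor_ug_per_ml: float = 50.0,
                  min_ratio: float = 1.70) -> dict:
    """Summarise a spectrophotometer reading for a nucleic acid preparation.

    factor_ug_per_ml is an assumed conversion factor (dsDNA: ~50 ug/ml per
    A260 unit); min_ratio is the 260/280 cut-off quoted in the text.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    ratio = a260 / a280
    return {
        "ratio_260_280": round(ratio, 2),
        # Concentration of the undiluted sample.
        "conc_ug_per_ml": a260 * factor_ug_per_ml * dilution,
        "passes": ratio >= min_ratio,
    }


# Example: a 1-in-10 dilution read at A260 = 0.50 and A280 = 0.27.
report = purity_report(a260=0.50, a280=0.27, dilution=10)
print(report)
```

A preparation that fails the ratio check is not necessarily unusable, which is why the text stresses the use of an internal control in the PCR itself.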

Therefore any method chosen to prepare high-purity total DNA and total RNA (including tRNA, mRNA, rRNA, mitochondrial RNA and hnRNA [heterogeneous nuclear RNA – a variety of RNAs found in the nucleus, including primary transcripts]), however universally applicable it is with regard to the nucleic-acid-containing source material and however quick and simple it is to handle, will require evaluation and optimisation for each combination of extraction method, PCR and specimen type.

It is therefore essential to start with what your target for PCR actually is. This is reflected in the basic premise that the PCR itself can only amplify DNA via the action of DNA polymerase enzymes. Many targets may be some form of the less stable RNA, be it messenger, transfer or viral RNA and it is this more labile feature of RNA that makes extraction, purification and handling of RNA a more stringent procedure. Ideally the process used should be quick, simple, reproducible and this has been improved with the many different commercial methods available. Put simply – nucleic acids must be extracted in such a manner that they can be subsequently amplified by PCR.

Coupled to these advances are the increasingly available commercial robotic extraction systems which increase the throughput and lessen the boredom of multiple smaller scale extractions. These systems may use magnetic particle-based extraction (the majority of high through-put systems) or silica resin based extraction systems. These systems may extract up to 96 samples per run but some use more plastic disposables than others and some may be used coupled with a liquid handling system for higher throughput of PCR testing. This then also becomes a budgetary decision.

A look at the methods within this text will show that the variety of nucleic acid extraction methods is vast. Many are commercial, many are performed on robotic platforms and many may be a simple extraction procedure without the extensive lysis and washing procedures demanded for high quality nucleic acid.

Other simple procedures for nucleic acid extraction consist of lysing microorganism membranes by a combination of three alternative modes of lysis: chemical lysis using a detergent, mechanical lysis by agitation in the presence of beads (good for cell wall disruption), and heat shock lysis by repeated freezing and incubation at very high temperature (around 95–100°C). All are time consuming.

The simplest procedure able to be used in some assays is the “boil for 10 min method”. This effectively lyses cells and may inactivate high risk microorganisms. It is still used in some laboratories and for specific assays that have been fully evaluated for use with such a procedure. However, other factors within the sample itself may inhibit the PCR method used. These may be cellular or histone proteins, lipids, cell wall components or heme proteins derived from blood, which have been shown to be inhibitory to some PCR methods. This inhibition is most often seen with the extraction of DNA from faecal samples. The extraction of the more labile RNA from stools also requires methods more suitable to extraction, stability and avoidance of inhibition. Some commercial kits have been designed with inhibitor removal buffers or tablets for these specific applications.

Three main rapid extraction procedures are available in the form of ready to use solutions. These can eliminate the time-consuming and labor-intensive deproteinisation, organic extraction, dialysis, and alcohol precipitation protocols required in traditional DNA purification procedures. Additionally, in many instances, they can replace robotic extraction procedures and save valuable time:

  1. Bio-Rad Laboratories has a 20 ml Chelex-based resin solution for PCR-ready DNA purification from blood, cultured cells, or bacteria. The procedure is rapid: incubate sample with InstaGene® matrix at 56°C for 15–30 min, then boil for 8 min and microcentrifuge for 2 min to pellet the resin. DNA in the supernatant is ready for PCR. InstaGene® matrix is made with a specially formulated 6% w/v Chelex resin and it makes DNA sample preparation fast, easy, and cost-effective. The Chelex matrix binds to PCR inhibitors and adsorbs cell lysis products rather than DNA, preventing DNA loss due to irreversible DNA binding and producing an improved substrate for PCR amplification.

    http://www.bio-rad.com/prd/en/US/adirect/biorad?cmd=BRCatgProductDetail&productID=111001

  2. Applied Biosystems has a 20 ml PrepMan® Ultra Preparation reagent that is applicable to a variety of different sample preparation applications. It has been used successfully to prepare DNA template from bacteria, yeast, filamentous fungi, both from a plate or from tissue smears, human cells (buccal swab), mammalian whole blood and from Gram-negative food-borne pathogens for use in PCR amplification reactions. Using a simple boil and spin protocol, PrepMan® Ultra Reagent efficiently inactivates PCR inhibitors and significantly reduces the need to repeat the template preparation step. PrepMan® Ultra Reagent is a novel formulation, developed entirely at Applied Biosystems, and is a homogeneous solution that does not contain Chelex or any other type of resin or matrix. It is based on ethylene glycol monobutyl ether with hydroxylated organoamine. The protocol is rapid and easy: resuspend the sample in 200 μl PrepMan® Ultra reagent or add the sample directly to 200 μl PrepMan® Ultra reagent, boil for 10 min, cool for 2 min, microcentrifuge for 2 min, transfer 5 μl of the supernatant to the assay.

    https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=602362&tab=Overview

  3. EPICENTRE Biotechnologies has a range of rapid extraction procedures depending on sample type and “target” of extraction – be it DNA or RNA. http://www.epibio.com/main.asp
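The two boil-and-spin protocols described in the list above can be written down as simple step lists, which makes their incubation times easy to compare. The sketch below is illustrative only: the `Step` class and the names `INSTAGENE` and `PREPMAN_ULTRA` are hypothetical, and the timings are taken from the descriptions in this section, not from the manufacturers' current instructions, which should always be consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    min_minutes: float
    max_minutes: float


# Timings as described in the text for each product.
INSTAGENE = [
    Step("incubate with matrix at 56 C", 15, 30),
    Step("boil", 8, 8),
    Step("microcentrifuge to pellet resin", 2, 2),
]
PREPMAN_ULTRA = [
    Step("boil in 200 ul reagent", 10, 10),
    Step("cool", 2, 2),
    Step("microcentrifuge", 2, 2),
]


def total_minutes(steps):
    """Return (shortest, longest) incubation time for a protocol, in minutes."""
    return (sum(s.min_minutes for s in steps),
            sum(s.max_minutes for s in steps))


for name, protocol in (("InstaGene", INSTAGENE),
                       ("PrepMan Ultra", PREPMAN_ULTRA)):
    low, high = total_minutes(protocol)
    print(f"{name}: {low}-{high} min excluding handling time")
```

Both totals exclude pipetting and tube handling, so real turnaround times will be somewhat longer.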

Purified RNA should be stored at –20°C or –70°C in RNase-free water. One must ensure that when RNA is purified using a chosen kit or method that no degradation will occur upon storage. Purified DNA should be stored at –20°C or –70°C under slightly basic conditions (e.g., Tris-HCl, pH 8.0) because acidic conditions can cause hydrolysis of DNA. It is preferable to store diluted solutions of nucleic acids in aliquots and thaw them once only. It is also recommended to store aliquots in siliconised or low absorption tubes to avoid adsorption of nucleic acids to the tube walls, which would reduce the concentration of nucleic acids in solution.

New nucleic acid stabilisation technologies allow for the storage of DNA and RNA at room temperature in a cost-effective, environmentally friendly manner. Some of the currently available products include Biomatrica, GenVault and the Qiagen QIAsafe DNA Tubes and 96-well Plates. These innovative technologies provide room temperature storage which saves on refrigeration costs and enables easy transportation. Sample recovery is as easy as “just add water”!

A recent study [83] evaluated two novel products for room temperature DNA storage: Biomatrica’s DNA SampleMatrix technology and GenVault’s GenTegra DNA technology. The study compares the integrity and quality of DNA stored using these products against DNA stored in a freezer by performing downstream testing with short range PCR, long range PCR, DNA sequencing, and SNP microarrays. In addition, the investigators tested Biomatrica’s RNAstable product for its ability to preserve RNA at room temperature for use in a quantitative reverse transcription PCR assay.

2 Conventional PCR

2.1 Introduction

The discovery of the polymerase chain reaction (PCR) in 1985 revolutionised the diagnosis of infectious diseases in clinical laboratories by allowing rapid, sensitive and specific detection and identification of pathogens directly from clinical specimens, without the need for culture. PCR-based assays enable the amplification of a few target molecules (theoretically a single cell) to detectable levels, from both viable and non-viable cells. These applications are gradually complementing or replacing culture-based, biochemical and immunological assays in routine diagnostic laboratories [84].

2.2 Components of PCR

A conventional PCR reaction mix consists of target DNA, two primers, heat-stable DNA polymerase, deoxynucleotide triphosphates (dNTPs including dATP, dCTP, dGTP and dTTP), and a buffer usually containing Mg2+. Primers are short (20–30 bases) oligonucleotides of known sequence that are complementary to the two 3′ ends of the target DNA [54]. The specificity of the primers determines the accuracy of the PCR assay, as poor quality nucleic acids or non-target background DNA can influence the specific annealing of the primers. This can result in non-specific amplification and possible misinterpretation of results [84].
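How a primer pair brackets a target can be illustrated with a short Python sketch. It assumes exact matching on a single known template strand, which real primer design tools do not; the function names and the toy sequences are hypothetical, and the primers are shortened to 14 bases for readability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the product bounded by the two primers, or None if a
    primer site is missing.

    template is the sense strand written 5'->3'. The forward primer matches
    it directly; the reverse primer anneals to the sense strand, so it is
    the reverse complement of the primer that appears in the template.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


# Toy template: 2 flanking bases, forward site, 10-base spacer,
# reverse site, 2 flanking bases.
template = "TT" + "ATGCGTACGTTAGC" + "AAAAAAAAAA" + "GGCTTACGATCCGA" + "TT"
forward_primer = "ATGCGTACGTTAGC"
reverse_primer = "TCGGATCGTAAGCC"  # reverse complement of GGCTTACGATCCGA

print(amplicon_length(template, forward_primer, reverse_primer))
```

The sketch reports no product when either site is absent, which mirrors the requirement that both primers must anneal for exponential amplification to occur.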

When the target nucleic acid is RNA, reverse transcriptase is included in the PCR reaction to convert RNA to cDNA. This procedure is known as reverse transcription PCR (RT-PCR).

A nested PCR assay is a type of conventional PCR that uses two pairs of primers in two separate, successive reactions. The initial reaction amplifies a target region of DNA with an outer primer pair. The resulting PCR product is used as template DNA for the second reaction, which employs a second set of primers that are located internally to those used in the first reaction. These assays have better sensitivity and specificity than single amplification assays and are useful for pathogen detection in clinical specimens. However, they are more prone to contamination from carryover of PCR product from the first reaction to the second [52].

2.3 PCR Amplification and Product Detection

PCR amplification is automated and performed on thermocyclers programmed to heat and cool to different temperatures for varying lengths of time.

Each PCR cycle involves three steps: (i) denaturation, (ii) annealing, and (iii) extension, and the cycles are repeated 20–40 times. The steps for RT-PCR are essentially the same, once the RNA has been transcribed to cDNA at 40–50°C. During denaturation, the DNA template is heated (94–96°C) to separate the two DNA strands. The temperature is then lowered (50–65°C) during the annealing step to allow the specific primers to hybridize to the 3′ ends of the separated DNA strands. The annealing temperature is dependent on the length and composition of the primers. Finally, during extension (72°C), the heat stable Taq DNA polymerase catalyses the elongation of the primers by incorporating the complementary dNTPs that bind to the target DNA. The extended primers form two new strands of target DNA for the next PCR cycle. In theory, the amount of target DNA should double after each PCR cycle [54].
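The theoretical doubling per cycle can be made concrete with a small calculation. The sketch below assumes the usual idealised model in which the copy number is multiplied by (1 + efficiency) each cycle; the `efficiency` parameter and the function name are illustrative assumptions, and real reactions plateau as reagents are consumed.

```python
def copies_after(n_cycles: int, start_copies: float = 1.0,
                 efficiency: float = 1.0) -> float:
    """Idealised copy number after n_cycles of PCR.

    efficiency = 1.0 means perfect doubling every cycle; lower values model
    a less efficient reaction. Plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** n_cycles


# A single target molecule after 30 perfect cycles: 2**30, about 10**9 copies.
print(f"{copies_after(30):.3g}")

# The same run at 90% efficiency yields markedly fewer copies.
print(f"{copies_after(30, efficiency=0.9):.3g}")
```

The comparison shows why small losses in per-cycle efficiency matter: they compound over 20–40 cycles.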

“Hot-start” PCR was developed to improve PCR amplification and specificity by reducing non-specific amplification during PCR set-up. The Hot-start Taq DNA polymerase enzyme is inactive at ambient temperature, preventing the extension of non-specifically annealed primers or the formation of primer dimer. The functional activity of the enzyme is restored during incubation at 95°C for 5–10 min [17].

Upon completion of a conventional PCR assay, the amplified products are usually analysed by agarose gel electrophoresis using DNA-binding fluorescent dyes (e.g. ethidium bromide) under UV illumination, with fragment length used as an indicator for identification [52].

2.4 Conventional PCR in the Diagnostic Laboratory

PCR-based assays are now accepted as the standard method for detecting many viruses and bacteria in diagnostic microbiology laboratories; however, their use is lagging for diagnosis of fungi and parasites due to the absence of commercial kits with quality controls. The development of real-time PCR technology and automated DNA extraction systems is expected to improve the reliability of conventional assays [11].

Generally, conventional PCR-based assays are only able to detect a single parameter, which can limit their scope unless a particular pathogen is suspected. Broad-range PCR assays, which target universal regions such as 16S-23S rRNA and heat-shock proteins, have been developed to allow simultaneous testing for more than one organism or to screen clinical specimens for pathogens. If the PCR assay yields an amplicon, the aetiologic agent must be identified by DNA sequencing [52]. Multiplex PCR assays, which incorporate multiple sets of primers in a single reaction to simultaneously detect numerous pathogens have also been developed, but they are better suited to real-time PCR technologies.

2.5 Limitations of Conventional PCR

The considerable increase in analytical sensitivity and specificity of PCR-based assays compared to conventional diagnostic tests is their major advantage; however, there are also limitations. Each individual PCR assay requires careful optimisation of reagents (Mg2+ and primers) and amplification conditions. Primer design is extremely important for effective PCR amplification, as cross-reaction with non-target DNA can result in non-specific products. The cost of performing molecular tests is high in comparison to traditional diagnostic tests and re-imbursement is often low, particularly for assays developed “in-house” [54]. Additionally, laboratories performing these assays need to invest considerably in dedicated “DNA-free” laboratory space and equipment. This is essential to minimise contamination of subsequent specimens by PCR amplicons that can lead to false positive results. This must be monitored by the inclusion of non-template or water controls in every PCR. The development of closed-tube, real-time PCR technologies and melting curve analysis has greatly reduced the risk of contamination and false positive results. DNA degradation or PCR inhibitors in clinical specimens can lead to false negative results and this must be monitored by the inclusion of internal positive controls.

2.6 Summary

There is no doubt that the development of PCR has revolutionised the diagnosis of infectious diseases in routine microbiology laboratories. However, many first generation conventional PCR assays are being replaced by real-time PCR platforms which offer increased sensitivity, specificity and rapidity, reduced contamination and greater potential for automation [52]. Despite their limitations, conventional PCR assays will continue to have a role in smaller, regional diagnostic laboratories that perhaps cannot afford the higher cost of reagents and instrumentation needed for real-time PCR assays.

3 Real-Time PCR

3.1 Introduction

Since its introduction, real-time PCR has made a major contribution to the diagnosis of infectious disease in most clinical laboratories. Its success has been due to the development of novel chemistries and instrumentation enabling detection of PCR products on a real-time basis within a closed system over a range of cycles. Also, real-time PCR instruments cycle the temperatures more rapidly than conventional thermocyclers, and, because of the increased sensitivity of the fluorescent detection system, offer a much broader dynamic range compared to conventional PCR. The inbuilt detection system offers product confirmation and quantification, with an electronic result output which lends itself to high through-put electronic reporting.

3.2 Real-Time PCR Technology

Generally, real-time PCR chemistries consist of fluorescent probes which release a fluorescent signal that increases in direct proportion to the amount of PCR product produced in the reaction [41, 42]. The higher the starting copy number of the nucleic acid target, the sooner a significant increase in fluorescence is observed.
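The relationship between starting copy number and the cycle at which signal appears can be sketched with the same idealised exponential model. Here the detection threshold is expressed as a copy number purely for illustration; instruments actually apply a fluorescence threshold, and the function name and default values are assumptions.

```python
import math


def threshold_cycle(start_copies: float, threshold_copies: float = 1e10,
                    efficiency: float = 1.0) -> float:
    """Cycle at which an idealised reaction reaches the detection threshold.

    Solves start_copies * (1 + efficiency) ** n = threshold_copies for n.
    threshold_copies is an illustrative stand-in for a fluorescence threshold.
    """
    if start_copies <= 0:
        raise ValueError("start_copies must be positive")
    return math.log(threshold_copies / start_copies) / math.log(1.0 + efficiency)


for copies in (1e2, 1e3, 1e4, 1e5):
    print(f"{copies:>8.0f} starting copies -> cycle {threshold_cycle(copies):.1f}")

# With perfect doubling, each 10-fold increase in template brings the
# threshold cycle forward by log2(10), roughly 3.3 cycles.
shift = threshold_cycle(1e3) - threshold_cycle(1e4)
```

This constant spacing between ten-fold dilutions is the basis of quantification from a standard curve.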

A number of different fluorescent chemistries are now widely used, including non-specific DNA intercalating dyes, dye-primer systems and target-specific oligoprobes. Each system has its own unique characteristics, but the strategy for each is similar; they must link a change in fluorescence to amplification of DNA.

The non-specific DNA intercalating dyes exhibit little or no fluorescence when in solution, but emit a strong fluorescent signal upon binding to double-stranded DNA. Oligoprobes depend on Fluorescence Resonance Energy Transfer (FRET) to generate the fluorescence signal via the coupling of a fluorogenic donor molecule and a quencher or reporter molecule to the same or different oligonucleotide substrates.

Non-specific dyes such as SYBR® green [65], YO-PRO 1 [31], SYTO9 [51] and more recently BOXTO (TATAA Biocenter, Sweden) are relatively inexpensive and do not require additional oligoprobe design. Also they are not affected by mutations in target sequence which may impair the binding of specific probes thereby influencing the final result [87].

SYBR® green is the most widely used chemistry, and provides the simplest and most economical format for detecting and quantifying PCR products.

It is present in the reaction mix at the start, and binds to the minor groove of double stranded DNA, emitting 1,000-fold greater fluorescence than when it is free in solution. Thus, as amplification product accumulates, fluorescence increases. The advantages of SYBR® green are that it is inexpensive, easy to use, and quite sensitive. The disadvantage is that it will bind to any double-stranded DNA in the reaction, including non-specific reaction products such as primer-dimer, which may give false-positive results or an overestimation of the target concentration. To help assess specificity, the dissociation curve of the amplified product can be analysed to determine the melting point. If two or more melting peaks are evident, it suggests that more than one amplified sequence was obtained, and the amplification was not specific for a single DNA target. For single PCR product reactions with well designed primers, SYBR® green can work extremely well, with spurious non-specific background only showing up in very late cycles. Therefore, in practice, the non-specific reporters are most suitable for highly optimised PCR assays.
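Dissociation curve analysis is usually performed by plotting the negative derivative of fluorescence with respect to temperature and looking for peaks. The following is a minimal sketch on synthetic data, assuming a simple sigmoid melting model and a fixed peak-height cut-off; the function names, the model and the cut-off are all illustrative assumptions and do not represent any instrument's software.

```python
import math


def sigmoid_melt(temp: float, tm: float, amplitude: float,
                 width: float = 0.8) -> float:
    """Synthetic fluorescence of one product that melts around tm."""
    return amplitude / (1.0 + math.exp((temp - tm) / width))


def melt_peaks(temps, fluor, min_height):
    """Return temperatures where -dF/dT has a local maximum >= min_height."""
    # Central differences give -dF/dT at every interior temperature.
    deriv = [-(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
             for i in range(1, len(temps) - 1)]
    interior = temps[1:-1]
    return [interior[i] for i in range(1, len(deriv) - 1)
            if deriv[i] > deriv[i - 1]
            and deriv[i] >= deriv[i + 1]
            and deriv[i] >= min_height]


temps = [70.0 + 0.5 * i for i in range(51)]  # 70.0 to 95.0 in 0.5 C steps

# One specific product melting at 85 C.
single = [sigmoid_melt(t, 85.0, 100.0) for t in temps]

# The same product plus a smaller, lower-melting product such as primer-dimer.
mixed = [sigmoid_melt(t, 85.0, 100.0) + sigmoid_melt(t, 78.0, 40.0)
         for t in temps]

print(melt_peaks(temps, single, min_height=1.0))
print(melt_peaks(temps, mixed, min_height=1.0))
```

A single peak is consistent with one amplified product, whereas the second trace shows the extra low-temperature peak that would prompt a check for non-specific amplification.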

There are many different dye-primer based signaling systems for real-time PCR, ranging from simple light upon extension (LUX) primers to the more complex scorpion primers [67]. The template specificity of the dye-primer system is the same as for the intercalating DNA dyes except for the scorpion primer, where the signal generated by the primer is dependent on a complementary match with sequence located within the PCR amplification product.

The use of oligoprobes to detect amplification product adds a further level of specificity to the reaction by immediate confirmation of the target sequence. There are a variety of oligoprobe-based assays in use today, including (i) hydrolysis (TaqMan®) probes [26], (ii) minor groove binding (MGB) probes [1], (iii) molecular beacons [49], (iv) hybridisation probes [91], (v) scorpion primer/probes, (vi) locked nucleic acid (LNA) probes [43], and (vii) peptide nucleic acid (PNA) light-up probes, and combinations thereof. Each of these has the capacity to use multiple reporter dyes with multiple quenchers for efficient FRET pairs.

Hydrolysis or TaqMan® probes (also called 5′-nuclease probes because the 5′-exonuclease activity of DNA polymerase cleaves the probe) were among the first to be used in real-time PCR and are arguably the most widely used fluorescent probe format. They are sequence-specific oligonucleotides that carry a fluorescent dye at the 5′ base, and a quenching dye on the 3′ base, and are designed to anneal to a complementary sequence on the amplification product. Whilst the probe is intact, the quencher and reporter are in close proximity, and the quencher absorbs the signal from the reporter through FRET. During amplification the 5′-nuclease activity of the DNA polymerase hydrolyses the probe [27], separating the fluorescent reporter dye and the quencher, allowing the reporter’s energy to be released as a fluorescent signal. The level of fluorescence increases in each cycle proportional to the rate of probe cleavage, and is indicative of a positive reaction. Examples of common quencher fluorophores include TAMRA, DABCYL, and BHQ, whereas many reporter dyes are available (e.g., FAM, VIC, NED, etc.). Hydrolysis probes have greater specificity because only sequence-specific amplification is measured.

Minor groove binding (MGB) probes are a modification of the hydrolysis oligoprobe chemistry [1]. This system uses a reporter dye at the 5′ terminus and a non-fluorescent quencher (NFQ) at the 3′ end. In addition, the 3′ end also carries a MGB molecule which further stabilises the oligoprobe-target duplex by folding into the minor groove of the double stranded DNA [1]. Unhybridised, the MGB probe assumes a random coil configuration which results in quenching of the fluorescent signal. On specific hybridization with the target, the molecule becomes linear before being cleaved by the DNA polymerase resulting in the emission of fluorescence. The advantage of MGB probes is that they may be very short (12–17 nt), and are ideal for targets with limited consensus sequence.

Like TaqMan probes, molecular beacons also contain a fluorescent dye at the 5′ end and a quencher molecule at the 3′ end and use FRET to detect a fluorescent signal. However, molecular beacons remain intact during the amplification reaction, and rehybridise to the target sequence for signal measurement during every cycle of the PCR. When free in the reaction mix, molecular beacons assume a stem-loop configuration, with the fluorescent and quencher molecules in close proximity, thereby preventing the probe from fluorescing. When hybridised to a target, the fluorescent dye and quencher are separated, quenching through FRET does not occur, and a fluorescent signal is released upon excitation with an appropriate light source. Molecular beacons, like TaqMan probes, can be used for multiplex assays by using spectrally separated fluorescent and quencher molecules on separate probes, one each for the target under investigation.

With scorpion probes, sequence-specific priming and PCR product detection are achieved using a single oligonucleotide. Like beacons, the molecule contains a fluorophore at the 5′ end and a quencher at the 3′ end. In the unhybridised state it maintains a stem-loop configuration bringing the two dyes into close proximity, and the fluorescence emitted by the fluorophore is absorbed by the quencher. The 3′ portion of the stem also contains sequence that is complementary to the extension product of the primer. This sequence is linked to the 5′ end of a specific primer via a non-amplifiable monomer. After extension of the scorpion primer, the specific probe sequence binds to its complement within the extended amplification product, thus opening up the hairpin loop. This separates the fluorescent reporter and quencher molecules, so that a signal of the appropriate wavelength is emitted.

Hybridisation probes (or HybProbes) are commonly used with the LightCycler instrument (Roche Diagnostics, Switzerland), and consist of two oligoprobes. One, the donor probe, is labelled with a fluorescent dye at the 3′ end and the second, the acceptor probe, is labelled at the 5′ end with a reporter dye which absorbs resonance energy from the donor probe. Fluorescence by the acceptor probe will only occur through FRET when both the donor probe and the acceptor probe have annealed to the amplification product in close proximity to each other. Increasing fluorescence is a measure of amplification product formation. Unlike TaqMan® probes, the process is non-destructive and reversible.

An added advantage of this system is the ability to perform melting curve analysis to confirm the identity of the amplification product. Melting curve analysis provides an extra element of specificity to the PCR, because sequence variation in probe target sites will result in a shift of melting temperature. This may also act as an important quality control feature to confirm the correct identity of the amplification product, and provides a simple and elegant method to genotype mutations, including single base mutations [7, 39].

3.3 Real-Time PCR in the Diagnostic Laboratory

Review of the current literature shows that real-time PCR has been widely applied in clinical laboratories for the detection of bacterial, viral and fungal pathogens. As a result, DNA and RNA are now widely accepted as important and universal diagnostic targets. The biggest impact of real-time PCR has been in the rapid diagnosis of life-threatening diseases such as meningococcal disease, SARS, avian influenza (H5N1) and herpes simplex encephalitis [86]. More generally, real-time PCR diagnostics also offer significant improvements over more traditional methods for the detection of a wide range of organisms, particularly organisms that may not be isolated by culture or those that require extended isolation processes. Comparisons with conventional culture-based methods have convincingly demonstrated the greater sensitivity of the molecular assays.

3.4 Instrumentation for Real-Time PCR

Real-time PCR technology requires appropriate instrumentation such as a thermal cycler with optics for the collection of fluorescence excitation and emission and a computer with appropriate data acquisition and analysis software. Because fluorescent chemistries require both a specific input of energy for excitation and detection of a particular emission wavelength, the instrumentation must be able to do both simultaneously and at the desired wavelengths. Three basic ways are used to supply the excitation energy for fluorophores: by (i) lamp, (ii) light-emitting diode (LED), or (iii) laser. Instruments that utilise lamps (tungsten halogen or quartz tungsten halogen) generally also include filters to restrict the emitted light to specific excitation wavelengths. Examples of instruments using lamps include the Applied Biosystems ABI 7500 and Stratagene Mx4000; LED systems include the Roche LightCycler and Qiagen Rotor-Gene, whilst the ABI Prism 7700 and 7900HT use a laser for excitation. Detectors to collect emission energies include charge-coupled device cameras, photomultiplier tubes, or other types of photodetectors. Only the desired wavelengths are collected, by use of narrow wavelength filters or channels. Usually, multiple discrete wavelengths can be measured at once, which allows for multiplexing of assays measuring different targets.

Another important feature for real-time PCR is the ability of the thermocycler to change temperatures rapidly, and to maintain a consistent temperature among all sample wells. Differences in temperature across the block could lead to different PCR amplification efficiencies and varying results. Consistent heating is achieved by using a heating block (Peltier based or resistive), heated air, or a combination of the two. Solid heating blocks generally change temperature much more slowly than heated air, resulting in longer cycling times.

Data generated during the real-time PCR requires appropriate data-acquisition and analysis software. Generally, PCR data are presented by graphical output of assay results including amplification and dissociation (melting point) curves. The amplification curve gives data regarding the kinetics of amplification of the target sequence, whereas the dissociation curve reveals the characteristics of the final amplified product.

In Australia, real-time PCR instruments are available from several manufacturers and differ in configuration and sample capacity as well as overall sensitivity, and may have platform-specific differences in how the software processes data. The price of real-time PCR instruments varies widely, currently about $65,000–$150,000, but is well within purchasing capacity of diagnostic facilities that have the need for high throughput quantitative or qualitative analysis.

3.5 Considerations in the Use of Real-Time PCR

The generation of accurate quantitative PCR results is dependent on the strict control and standardisation of many assay parameters. The main sources of error include the nucleic acid extraction process and the presence of PCR inhibitors in clinical specimens. The best way to control for these factors is by using a robust internal control strategy such as previously suggested, in which a quantification standard of known copy number is incorporated into each sample, and carried through sample extraction, reverse transcription, amplification, hybridisation and detection [67].

Also, sequence variation in the primer or probe binding sites may lead to false-negative results or can otherwise have a significant impact on quantitative PCR results. As few as two mismatches at the 3′ end of a single primer may affect the efficiency of the PCR reaction. This is particularly important in quantitative PCR and can result in underestimating the true microbial load by up to 3 logs [87]. The overall impact of this on the final result is dependent on the reaction conditions used, the composition of the primers, the annealing temperature and master mix composition.
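To give a feel for the scale of a 3 log underestimate, the arithmetic can be sketched as follows. This is a minimal illustration only: it assumes the idealised exponential model in which each cycle multiplies the product by (1 + E), and it treats the effect of a mismatch simply as a delay in the cycle at which signal is detected. The function names and example values are hypothetical and are not taken from the cited study.

```python
import math

def fold_underestimate(cycle_delay, efficiency=1.0):
    """Fold error in the estimated load if detection is delayed by
    cycle_delay cycles (idealised model; efficiency 1.0 means doubling)."""
    return (1.0 + efficiency) ** cycle_delay

def cycle_delay_for_logs(logs, efficiency=1.0):
    """Number of cycles of delay that corresponds to a given log10 error."""
    return logs / math.log10(1.0 + efficiency)

# Hypothetical example: how many cycles of delay equal a 3 log underestimate?
delay_for_three_logs = cycle_delay_for_logs(3)   # just under 10 cycles
error_for_ten_cycles = fold_underestimate(10)    # 2**10 = 1024-fold
```

Under these assumptions a delay of roughly ten cycles is enough to shift the estimated load by about three orders of magnitude.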

Similarly, sequence variation may impact upon fluorescent signal. In a real-time PCR assay for respiratory syncytial virus (RSV) using a MGB probe, notable differences were observed in assay results due to sequence divergence between the MGB probe and the target sequence. The amplitude of some linear amplification curves was greatly reduced and in some cases false-negative results were obtained. The authors of this study concluded that MGB probes should be used with caution if mutation in the target sequence is common such as occurs in RNA viruses [88]. Therefore, careful and extensive optimisation of the real-time PCR conditions must be performed to obtain meaningful results, and to ensure that the efficiency of the reaction does not vary due to sequence variation or minor differences between samples.

3.6 Summary

Real-time PCR methodologies for the detection of a wide range of organisms are firmly entrenched in many clinical laboratories and offer major advantages of improved sensitivity and rapidity over traditional methods. However, as the use of real-time PCR assays has evolved, there has been an obvious need for standardised reagents and quality assurance programmes in order to obtain reproducible and clinically significant results. Also, we need to take heed of the inherent limitations associated with the targeted nature of PCR. These are often difficult to control, particularly in virology, where the heterogeneous nature of the viral genome may lead to significant difficulties in assay design and may impact on assay performance. However, an awareness of these issues will ultimately result in a better understanding of this new technology, and enable us to fully explore its potential as a modern diagnostic tool.

4 Quantitative PCR

4.1 Introduction

Quantitative PCR (Q-PCR) is routinely performed by many microbiology laboratories with real-time PCR (RT-PCR) capabilities. RT-PCR requires instrumentation capable of PCR product (amplicon) detection in a cycle-by-cycle, or in a “real-time” fashion using fluorescent chemistry.

PCR assays are either qualitative or quantitative. Both types of assay detect nucleic acid targets (DNA or RNA) in a given sample; the target is amplified by the PCR process and detected by measurement of a fluorescent signal at every PCR cycle until completion of the assay. Qualitative RT-PCR assays are used for simple detection of a target: if a specific target is amplified and detected then the sample is considered PCR-positive for the target, and if the target has not been amplified then the sample is considered PCR-negative. Hence, qualitative PCR assays are designed to give either positive or negative results only. These assays are commonly used in diagnostics for detection or exclusion of pathogens. In contrast, Q-PCR assays are designed to determine the concentration or number of copies of a detected target present in the original sample. These assays are commonly used for monitoring response to therapy. Quantification is achieved in Q-PCR by using nucleic acid standards and exploiting the predictable kinetics of the PCR reaction.

4.2 PCR Kinetics and Q-PCR

The kinetics of a PCR reaction plotted graphically has a distinctive shape, with three distinct phases; (a) the background, or early phase; (b) the exponential growth phase; (c) the plateau phase (Fig. 2.1). In the early phase the oligonucleotides hybridise to the target sequence and the PCR reaction has commenced, but at an undetectable level. In the exponential growth phase the target is amplified in an exponential fashion and the fluorescence levels become detectable above the background. In the plateau phase the reactants have been consumed or have deteriorated and the PCR reaction is no longer operating efficiently. The exponential phase is the important stage of Q-PCR. During this phase accurate quantification of the target DNA is possible.

Fig. 2.1 Quantitative PCR standards and standard curve

To clarify, the exponential phase can be mathematically explained using the equation:

$$N_C = N_0 \times (E + 1)^C$$

where C is the number of cycles, E is amplification efficiency (also expressed as \(\textrm{\%\,E} = E \times 100\%\)), \(N_C\) is the number of amplicon molecules after C cycles, and \(N_0\) is the initial number of target molecules. In simple terms, each cycle produces an increase in \(N_C\) in proportion to amplification efficiency. Hence, 100% efficiency (E = 1) produces a doubling in the number of amplicon molecules at every cycle. Additionally, the quantity of \(N_C\) present after any specific number of cycles is dependent on \(N_0\). Rearrangement of the equation provides the mathematical relationship upon which Q-PCR is based, although in reality the amplification efficiency is less than 100%. The PCR efficiency can be calculated from the slope of the standard curve. Hence, once a standard curve has been established, unknown samples can be amplified by the same process and compared to the standards to determine the target concentration in the original sample. These standards can be either external or internal to the assay.
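The rearrangement referred to above can be sketched as follows, writing the amplicon number after C cycles as \(N_C\) and the initial target number as \(N_0\). The symbol \(N_T\), for the amplicon number at which fluorescence first becomes detectable, is notation introduced here for illustration only; \(C_p\) denotes the cycle at which that level is reached (the crossing point described in Section 4.3). Taking logarithms of the exponential-phase equation gives

$$\log_{10} N_C = \log_{10} N_0 + C \log_{10}(E + 1)$$

Setting \(N_C = N_T\) at \(C = C_p\) and solving for \(C_p\):

$$C_p = -\frac{1}{\log_{10}(E + 1)} \log_{10} N_0 + \frac{\log_{10} N_T}{\log_{10}(E + 1)}$$

Under this idealised model, a plot of \(C_p\) against \(\log_{10} N_0\) is a straight line whose slope depends only on the efficiency, so that

$$\textrm{slope} = -\frac{1}{\log_{10}(E + 1)}, \qquad E = 10^{-1/\textrm{slope}} - 1$$

For example, perfect doubling (E = 1) corresponds to a slope of about −3.32 cycles per 10-fold dilution.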

4.3 Absolute Quantification with External Standards

The most common and easiest way to produce quantitative results is to create a standard curve using external standards. External standards are usually 10-fold serial dilutions of the target. The standards are usually plasmids but can be whole organisms or nucleic acid. These standards are amplified and detected using the same assay conditions as for samples. The PCR cycle at which product fluorescence intensity first rises above the background is called the crossing point (Cp). At this point the exponential PCR phase begins. Theoretically, the rate of amplification is maximal with PCR products doubling every cycle; hence quantification is performed at this stage. Following completion of the assay the Cp for each standard is determined and a standard curve is prepared using the instrument software. A typical standard curve is a plot of Cp (Y-axis) versus the log of the initial template amount (X-axis), derived from an assay based on serial dilutions. The standard curve is a least squares fit line drawn through all dilutions (Fig. 2.1; bottom). Using several standards which cover the expected clinical range, a line of best fit can be determined using the Cp values. It is then a case of determining the unknown sample's Cp and reading off the resulting concentration. Absolute quantification with external standards is useful but it does not control for changes in amplification efficiency, which may vary from sample to sample. Hence, if PCR inhibition occurs due to inhibitors in the patient sample, this degree of inhibition will not be represented in the standard curve, and a lower quantitative value may therefore be produced. However, for samples demonstrating negligible inhibition, absolute quantification with external standards is relatively reliable and easy to set up.
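The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any instrument's software: the dilution series and Cp values are invented for the example, the function names are hypothetical, and the efficiency calculation assumes the idealised exponential model of Section 4.2.

```python
import math

def fit_standard_curve(copies, cps):
    """Least squares line Cp = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def efficiency_from_slope(slope):
    """Amplification efficiency E implied by the slope of the curve."""
    return 10 ** (-1.0 / slope) - 1.0

def quantify(cp, slope, intercept):
    """Read the initial copy number of an unknown off the standard curve."""
    return 10 ** ((cp - intercept) / slope)

# Hypothetical 10-fold dilution series of a plasmid standard (copies per reaction)
standard_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
standard_cps = [18.0, 21.4, 24.8, 28.2, 31.6]   # invented Cp values

slope, intercept = fit_standard_curve(standard_copies, standard_cps)
efficiency = efficiency_from_slope(slope)        # about 0.97, i.e. roughly 97%
unknown_copies = quantify(26.5, slope, intercept)
```

Note that the unknown is quantified with the slope fitted to the standards, so any sample-specific inhibition that delays its Cp is read as a lower copy number, which is the limitation described above.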

4.4 Absolute Quantification Using Internal Controls

Absolute quantification using internal controls is an advanced Q-PCR method that controls for nucleic acid extraction and PCR inhibition. This approach involves the use of exogenous control DNA. In this approach, a homologous DNA fragment of known concentration is engineered which has the same primer binding regions as the target, but with different probe-binding sequence and fluorescent marker. This DNA fragment can be added to an individual sample prior to nucleic acid extraction and PCR. The fragment is then co-extracted and amplified along with the target. Two probes with different fluorescent labels are used; one to detect the target and the other to detect the internal control. The advantage of this system is the amplification efficiency of the reaction will be the same for the target as well as the control. So if there is an inhibitor in the sample it will affect the target and the control equally and the result should be more accurate. The disadvantage of this system is the need to generate a second PCR fragment that may compete with the detection of the target if not carefully designed.
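One way to see why a shared inhibitor cancels out is to compare the target with the internal control directly. The sketch below is illustrative only and rests on assumptions that are not stated in the text: that target and control amplify with exactly the same efficiency, and that both probes cross the detection threshold at the same number of amplicon molecules. The function name and the example values are hypothetical.

```python
def target_copies_from_internal_control(cp_target, cp_control,
                                        control_copies, efficiency=1.0):
    """Estimate initial target copies relative to a spiked internal control.

    Assumes equal amplification efficiency for both templates and an equal
    amplicon number at the crossing point in both detection channels.
    """
    return control_copies * (1.0 + efficiency) ** (cp_control - cp_target)

# Hypothetical sample spiked with 1000 copies of internal control
uninhibited = target_copies_from_internal_control(25.0, 28.0, 1000)

# The same sample with an inhibitor that delays both signals by two cycles
inhibited = target_copies_from_internal_control(27.0, 30.0, 1000)
```

Because the delay applies to both crossing points, the difference between them, and hence the estimate, is unchanged in this idealised case.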

4.5 Detection Formats

Real-time PCR instruments currently available use fluorescent dyes to generate a signal. The most commonly used dyes are the following:

4.5.1 Intercalating Dyes

SYBR Green: see Section 2.3.2 for description of action. The disadvantages of using SYBR Green for Q-PCR are that it binds to all double-stranded DNA, including cellular DNA in the sample and non-specific DNA products generated during the reaction, and that high concentrations of SYBR Green can reduce amplification efficiency through toxicity to Taq polymerase. The advantage is that SYBR Green is simple to use and does not require the use of probes or multi-colour instrumentation.

4.5.2 Sequence-Specific Fluorescent Dyes

  (a) Fluorescent Resonance Energy Transfer (FRET): see Section 2.3.2 for description of action. The use of FRET probes has an advantage over hydrolysis probes as they can be used to determine the melting temperature (Tm) of a particular PCR fragment. This Tm is the temperature at which 50% of the probe has dissociated from the target. The Tm varies according to the base sequence of the target. For example, a target rich in G/C bases will have a higher Tm than a target with high A/T content. This is due to the three hydrogen bonds that join G and C residues, compared with two between A and T. This makes FRET probes particularly useful for rapid genotyping. Using Tm, the degree of specificity can be measured and a specific target can be confirmed. Hence FRET is a sequence-dependent method of PCR product detection. However, FRET probes can be difficult to design as the head-to-tail design may span 40 or more conserved nucleotide bases.

  (b) 5′–3′ Hydrolysis Probes: these assays are typically called TaqMan assays, see Section 1.2.3 for description of action. Unlike FRET probes, hydrolysis probes cannot be used for Tm analysis; however, they are easier to design as a span of only 18–25 nucleotide bases of the target is required.
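The dependence of Tm on base composition noted under (a) can be illustrated with the Wallace rule, a rough rule of thumb for short oligonucleotides that assigns about 2 °C to each A or T and about 4 °C to each G or C. This is a sketch only: the rule is not mentioned in the text, it ignores salt and probe concentration, and practical probe design relies on more detailed thermodynamic models. The two sequences are invented for the example.

```python
def wallace_tm(sequence):
    """Approximate Tm in degrees Celsius of a short oligonucleotide,
    using the Wallace rule: 2 * (A + T) + 4 * (G + C)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Two hypothetical 14-nt probe sequences of equal length
at_rich_tm = wallace_tm("AATTGATTACATTA")   # 12 A/T and 2 G/C bases
gc_rich_tm = wallace_tm("GGCCGACGGCTCGC")   # 2 A/T and 12 G/C bases
```

With these inputs the G/C-rich sequence has the higher estimated Tm, in line with the behaviour described above.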

4.6 Clinical Use

The clinical use of Q-PCR is broad, however Q-PCR is most commonly used to determine the concentration or load of a pathogen in a clinical sample. For example, quantitative PCR can be used for determining the baseline level of a particular infecting virus e.g. hepatitis C virus (HCV) in plasma. If patients are on therapy (e.g. Interferon), the HCV viral load can be measured to track the response to treatment. A falling titre indicates successful treatment whilst an increasing or static level may indicate treatment failure.
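Changes in viral load are usually easier to interpret on a log10 scale. The following sketch shows the arithmetic only: the viral load values are invented, the function names are hypothetical, and the 0.5 log10 tolerance used to separate a real change from a static result is an arbitrary placeholder for assay variability, not a clinical threshold.

```python
import math

def log10_change(baseline_load, current_load):
    """Change in viral load on a log10 scale (negative values mean a fall)."""
    return math.log10(current_load / baseline_load)

def describe_trend(change, tolerance=0.5):
    """Classify a log10 change; tolerance is a hypothetical allowance
    for assay variability and is not a clinical cut-off."""
    if change <= -tolerance:
        return "falling"
    if change >= tolerance:
        return "rising"
    return "static"

# Hypothetical HCV loads at baseline and on therapy
baseline = 2.0e6
on_therapy = 2.0e4
change = log10_change(baseline, on_therapy)   # a 2 log10 fall
trend = describe_trend(change)
```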

In other cases, organisms may be present in the host at low levels and may not cause significant disease. Qualitative PCR may be positive in these situations. However, by determining the organism load by quantitative PCR, a break-point level can be determined which indicates the organism is now at a titre which may cause disease. For example, many people have had previous exposure to cytomegalovirus (CMV). The virus may be present in the latent stage and qualitatively positive in white cells. Detection of active CMV viraemia, or reactivation, is important in immunocompromised patients. A rising CMV viral load can indicate reactivation which may lead to CMV disease. Response to therapy can also be measured.

4.7 Conclusion

There are other types and subtypes of molecular-based quantitative methods; however, Q-PCR methods using SYBR Green, FRET or hydrolysis probes are the more commonly used approaches. Most commercial Q-PCR methods use internal control standards and are automated, thereby increasing reliability and reproducibility and at the same time controlling for PCR inhibition.

5 Multiplex PCR in Diagnostic Microbiology

Abstract

Multiplex PCR assays have many attractive features in terms of economy and offer a practical means to provide molecular-based assays for the increasing range of known infectious agents. Apart from the rapidly expanding list of newly discovered viruses, PCRs for bacterial virulence factors and antibiotic resistance genes are likely to be in increasing demand. There are currently several methods which provide multiplex PCR capability and it is uncertain which will become the preferred technology.

5.1 Introduction

Conventional PCR assays generally detect a single target nucleic acid sequence using a set of oligonucleotide primers with or without a probe to confirm the identity of the amplified PCR product DNA. Multiplex PCR assays detect multiple targets within a single amplification reaction using corresponding multiple primer pairs. The use of multiplex assays in diagnostic microbiology has many obvious attractions. In theory, the multiplexing of assays should enable many infectious agent targets to be detected with economies in labour, consumables and with the requirement for only a small volume of sample. In fact, since the sample extraction is usually the most expensive component of a PCR assay, the reduction in the sample extract volume required to test for multiple targets probably provides the most attractive feature of multiplex assays. Once a PCR is at peak efficiency the only conceivable method to increase its sensitivity is to increase the effective amount of sample introduced into the assay. For multiple targets then, this often means performing multiple extractions which is expensive of time and consumables and may be limited in some cases by the volume of sample available.

Another expensive PCR reagent is the reverse transcriptase (RT) enzyme needed for RNA targets. This is several times more expensive than DNA polymerase, and many respiratory and gastrointestinal disease agents are RNA viruses, the list of which has grown considerably in recent times. To test for a comprehensive range of these agents is impractical for most laboratories if single assays are to be used. Another facet of potential multiplex use is in providing coverage for sequence variations in target agents.

As sequencing capability has become widespread in laboratories, and numerous entries made into public sequence databases, it is becoming evident that sequence variation in target agents is a major factor limiting the long-term reliability of molecular-based assays. This is especially true for many RNA viruses which evolve very rapidly. Multiplex assays then, have the potential to mitigate this problem by targeting multiple gene sequences of an agent, thus reducing the chances of random variations compromising the performance of an assay. In fact, it is probably true that in future this principle will need to be incorporated into many PCR assays for infectious agents to ensure their reliability. These comments apply particularly to real-time assays. Although having considerable advantages compared with PCRs having gel-based amplicon detection, real-time assays are more susceptible to sequence variation failure especially if the variation occurs in the probe target region.

5.2 Multiplex Technologies

There are several alternative multiplex PCR technologies now available. The oldest of these uses primers for different targets which produce amplicons of different sizes distinguishable by gel electrophoresis [61]. This method can be used as a single PCR or nested for extra sensitivity. It has the advantage that it is less susceptible to sequence variation than real-time methods but is more labour intensive and relies on subjective interpretation of gel electrophoresis results. A recent development is the use of capillary electrophoresis for this purpose, and several commercial equipment options are available which afford very accurate measurement of amplicon size.

Real-time technologies are now in general use and can be multiplexed using fluorophore-labelled probes which are detected at different wavelengths. However, in spite of real-time thermocyclers having up to six wavelength options, most workers find that it is difficult to devise multiplexes which perform satisfactorily with more than three different probe labels in an assay. It is of course possible to perform a multiplex assay with large numbers of primer sets and to then pass the product to real-time triplex assays. This introduces risk of laboratory cross-contamination, a known hazard of nested PCRs, but this can be reduced by limiting the initial multiplex PCR to 10–20 cycles, a principle used by Stanley and Szewczuk in their Tandem assays [69].

Luminex bead technology has the potential to detect up to 100 targets in an assay but practical considerations probably limit this to about 30. This method consists of performing a generic PCR or one with multiple primer sets in which the primers carry a biotin label. Products from the multiplex are hybridised with Luminex beads carrying probes specific for the multiple targets. The beads can be uniquely identified as the 100 bead types available each contain different proportions of fluorescent material and each probe is thus identified [64]. A fluorescent-labelled avidin is applied and washing steps remove background fluorescent material before the beads are processed in a flow cytometer.

A novel multiplex approach has been suggested recently which involves the use of Mass Tag PCR [19]. Numerous targets can be detected with DNA microarray technologies but the sensitivity of these is limited by the amount of target material available unless generic amplification is first performed [4]. Array technology is also expensive but will become useful for the detection of virulence factors and antibiotic resistance genes in bacterial cultures, where target quantity is unlimited, once these gene sequences have been more fully identified.

5.3 Practical Applications

An important consideration in the design of a multiplex PCR assay is whether only one of the agents included is usually present or whether there is a likelihood of multiple infections in the samples to be tested. In the latter case, competition between the multiple agents for reagents within the PCR becomes an issue. In general, reagent concentration needs to be increased in multiplex assays to allow for full efficiency. It is important to determine whether a large copy number of one agent will inhibit the reactivity of a small copy number of another [81]. For example, in a cerebrospinal fluid (CSF) sample one would only expect infection with one flavivirus causing encephalitis, but in genital samples multiple HPV infections are commonplace and may be mixtures of high and low risk types for the causation of cervical carcinoma in widely varying copy numbers.

In our laboratory we are endeavouring to put in place some of the principles described in this article in the in-house PCR assays that we use. Our initial simple use of multiplexing is in providing an inbuilt control of sample extraction efficiency and removal of RT and/or PCR inhibitors, in a semi-quantitative fashion. Standardised amounts of equine herpesvirus (EHV) and/or MS-2 RNA coliphage are incorporated in extraction lysis buffers and assays for these agents are carried out as multiplexed systems with different probe labels than for target agents. This ensures that unsatisfactory extraction or incomplete removal of inhibitors is detected and also checks that thermocycling has been performed properly. We have designed an assay for HSV-1, HSV-2 and varicella virus which uses three probe label types, allows for duplicate tests for the viruses and includes an EHV control.

Another assay modifies the principle of tandem PCR suggested by Stanley and Szewczuk [69]. In this assay for Chlamydia trachomatis and Neisseria gonorrhoeae, an initial PCR multiplex containing six primer sets is cycled for 20 cycles only in a conventional thermocycler. Diluted products are then transferred to real-time triplex assays for processing in a real-time thermocycler. This multiplex reduces competition for reagents if multiple infection is present and includes three targets for C. trachomatis, two for N. gonorrhoeae and an EHV control. We have a number of other multiplex assays in development and, although these are in their early stages, we are confident that this approach, using whichever multiplex technology becomes dominant, will provide the only realistic means of giving wide coverage of infectious agents.

6 Molecular Subtyping

6.1 Introduction

The fundamental taxonomic unit for bacteria remains the species. In higher organisms, a species is effectively defined as a collection of organisms without significant barriers to gene flow. This rule cannot be consistently applied to bacteria because of the considerable variation in propensities for horizontal gene transfer (HGT) among different bacterial taxa. Therefore, bacterial species are currently defined using a pragmatic mix of phylogenetic, gene flow and phenotypic data, with the weights given to these criteria, and the level of diversity within a species, varying from case to case. It is however universal that a bacterial species is not composed of a single clone. In other words, bacterial species contain diversity and can be divided into finer taxonomic units. These are generally named and recognised using less formal conventions than are used for the standard Latin scientific nomenclature. An exception to this is the category of subspecies, which is used in conventional formal taxonomy. However, in the great majority of cases, a taxonomic unit that is a subset of a species is known simply as a “type”, or a derivative of this word such as “serotype”, “genotype” or “subtype”. In the last 30 years, a plethora of bacterial typing technologies and methods have emerged, and bacterial typing has become commonplace in many laboratories.

Many of the more recent methods incorporate the PCR. Deciding what typing method to use is not straightforward. A major reason for this is that all methods have a different performance. A question frequently asked is “Does typing method (X) resolve all strains?”. This question is not meaningful, in part because there is no accepted definition of the term “strain”. Greater than 0.1% of bacterial cell divisions yield a point mutation somewhere in the genome, and other classes of genetic change occur at a much higher frequency than this. Accordingly, there is no such thing as a bacterial population of any significant size in which all the cells are genetically identical, and “clones” or lineages cannot be regarded as immutable entities. In essence, what typing methods do is indicate whether or not the most recent common ancestor of two or more isolates post-dates a particular time in the past. The higher the resolution of the typing method, the more recent that time point.

In general, the absolute value of that time point is not clearly understood. Of course, increasing sophistication of typing methods and associated data analysis and exchange facilities allows degrees of genetic relatedness to be determined, and the relationship between the typing method output and the actual population structure of the relevant species to be elucidated. Detailed descriptions of the complexities of understanding and comparing the performances of typing methods have been published by van Belkum et al. and Faria et al. [22, 80].
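The earlier point that no sizeable bacterial population is genetically uniform follows from simple arithmetic, sketched below. The calculation takes the figure of 0.1% of cell divisions yielding a point mutation as a lower bound and assumes, for illustration only, that divisions are independent and that a population grows from a single cell by binary fission. The function names and example sizes are hypothetical.

```python
def prob_lineage_unmutated(divisions, p_mutation=0.001):
    """Probability that a single line of descent passes through the given
    number of divisions without any point mutation."""
    return (1.0 - p_mutation) ** divisions

def expected_mutation_events(population_size, p_mutation=0.001):
    """Expected number of mutation-yielding divisions while one cell grows
    into population_size cells (which takes population_size - 1 divisions)."""
    return (population_size - 1) * p_mutation

# After 1000 generations, only about 37% of lineages remain unmutated
unmutated_after_1000 = prob_lineage_unmutated(1000)

# Growing a single cell to a million cells involves about 1000 such events
events_in_million_cells = expected_mutation_events(10**6)
```

Even at this lower-bound rate, a culture of a million cells is expected to contain on the order of a thousand independently arising point mutations.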

The field of bacterial typing is an enormous one, so this essay is of necessity a brief outline. Likewise the references provided are a very small fraction of the relevant literature. They are designed to assist the reader to understand the state of the art and also to find colleagues with expertise in these methods. Therefore, there is some bias to recent publications from Australian researchers.

6.2 Why Bacterial Typing Is Performed

Several bacterial typing methods provide sufficient resolution that only isolates with extremely recent common ancestries will have the same type. Such methods can be used to test hypotheses of direct epidemiological linkage. This is relevant in investigating e.g. outbreaks of food-borne disease, break-downs of infection control in health care facilities, the long distance dissemination of dangerous clones, and biological attacks. While such typing methods often do not directly indicate clinical properties, the association between types and such properties may have been previously determined. Therefore, the clinically relevant properties of an isolate such as the resistance and virulence phenotype can sometimes be inferred from the type by inductive reasoning. Typing methods that provide results that indicate the evolutionary history of the isolate can be used to determine and monitor the diversities and population structures of bacterial species. This is frequently performed as a largely academic exercise. However, the implications can be far from trivial. Mapping the patterns of dissemination onto phylogeny can provide profound insight into the history of a bacterial species. The practical applications of this include investigation of the impacts of new or changed vaccination protocols on circulating serotypes, and the time scales and the likely vectors of important dissemination events of problematic clones of bacterial pathogens.

6.3 Typing Methods Based on Phenotype

In general, early typing methods involved examination of the phenotype of the cell, while more recent typing methods involve genetic analysis. Probably the most widely used phenotypic approach is serotyping, which determines the reactivity of the cells with a standard bank of antisera. This is an indirect means of probing variability of cell surface molecules, principally carbohydrates and proteins. Serotyping is still seen as a valuable approach, and this is due in large part to simplicity of execution. Also, in species such as Streptococcus pneumoniae, for which there are vaccines, close correlation between serotype and vaccine susceptibility can be very useful when determining the impact of vaccination programs [18, 40].

Another widely used phenotype-based method with a long history is phage typing. This has been particularly applied to Salmonella [2] and involves the determination of the susceptibilities of the cells to a standard panel of phages. It is now known that the principal basis for variation in phage susceptibility is variation in the prophage content in the genome. In general, lysogens of a phage are immune to infection by the same phage [36]. Phage typing requires some specialist skills, and has generally been confined to reference labs, in particular with regard to Salmonella. Phage typing is now largely superseded by more direct genetic analyses.

Direct chemical analysis of bacteria has long shown promise as a bacterial identification and typing approach. There has been a recent resurgence of this with the development of dedicated and robust mass spectrometry devices. This approach is primarily used for identification to the species level, but it does have some ability to resolve within species [66].

6.4 Typing Methods Based on Electrophoresis of DNA Fragments Derived from the Whole Genome

Numerous nucleic acid-based bacterial typing methods have been developed in the last two decades. Many of these are based on the conversion of the genome into a series of DNA fragments of varying size that can be resolved by electrophoresis so as to generate a genetic fingerprint. A distinctive feature of this family of methods is that there is nothing inherent in the techniques that allows inference as to just what genetic change causes a change in the fingerprint. In other words, the fragments in the gel are anonymous, although they can usually be identified if the complete genome sequence is known.

A genome can be converted into fragments by either cleavage or synthesis. Probably the most conceptually straightforward method is cleavage of the genome with a very infrequently cutting restriction enzyme, followed by resolution of the resulting very large fragments by variable field agarose electrophoresis, in which the angle of the voltage gradient is altered rapidly and repeatedly. For reasons that are not fully understood, this results in the resolution of DNA fragments much larger than can be resolved by conventional agarose electrophoresis. The early versions of this method made use of a field that periodically reversed its direction. This was termed pulsed-field gel electrophoresis (PFGE). A variant of this approach that makes use of a hexagonal electrode and consequent multiple directions for the electric field has become commonplace. This is correctly known as “(contour) clamped homogeneous electric fields” (CHEF) gel electrophoresis, although the PFGE abbreviation remains ubiquitous [57]. Variation in PFGE fingerprints likely arises from large genome rearrangements, and gene gain or loss events that change the size of restriction fragments. Point mutations that create or destroy restriction sites may also occur, but these are likely to be very rare given the very small percentage of the genome that the restriction sites comprise. PFGE provides very high resolving power, and there is a convention that even if there are up to three band differences between the fingerprints of different isolates, very recent epidemiological linkage between these isolates cannot be ruled out [76]. It should be noted that this convention is not fully accepted, and numerous variants have been proposed. There is sufficient detail in the data that relative evolutionary distances can be estimated and phylogenetic trees constructed [92]. PFGE remains an extensively used method. For instance, it is arguably the gold standard for monitoring Staphylococcus aureus dissemination in Australia [55].
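
The band-difference convention can be illustrated with a minimal sketch. It assumes that each fingerprint has already been reduced to a list of band sizes in kb, and that two bands are regarded as the same band when their sizes agree within a fractional tolerance. The function names, the 5% tolerance, the greedy matching rule and the band sizes are all hypothetical choices made for illustration; they are not taken from any published interpretation scheme or gel analysis package.

```python
def band_differences(a, b, tol=0.05):
    """Count bands in either fingerprint that have no partner in the other.

    a, b: lists of band sizes in kb; tol: fractional size tolerance (assumed).
    """
    unmatched_b = sorted(b)
    differences = 0
    for size in sorted(a):
        # Greedy match: first still-unmatched band in b within the tolerance.
        partner = next((x for x in unmatched_b if abs(x - size) <= tol * size), None)
        if partner is None:
            differences += 1
        else:
            unmatched_b.remove(partner)
    return differences + len(unmatched_b)


def dice_similarity(a, b, tol=0.05):
    """Dice coefficient: 2 * shared bands / total bands in both lanes."""
    shared = (len(a) + len(b) - band_differences(a, b, tol)) / 2
    return 2 * shared / (len(a) + len(b))


# Invented band sizes (kb). In isolate_2 the 361 kb fragment has gained a
# restriction site and is replaced by 240 kb and 121 kb fragments.
isolate_1 = [668, 450, 361, 324, 262, 208, 175, 135, 117, 80]
isolate_2 = [668, 450, 324, 262, 240, 208, 175, 135, 121, 117, 80]

n_diff = band_differences(isolate_1, isolate_2)
similarity = dice_similarity(isolate_1, isolate_2)
print(n_diff, round(similarity, 3))
```

In this invented example a single genetic event (gain of one restriction site) produces three band differences: one band is lost and two new bands appear. This is the kind of pair that the convention would not exclude from recent epidemiological linkage.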

Methods that use DNA synthesis to convert a genome into fragments for electrophoresis are essentially all based on the PCR. The simplest approach is termed random amplification of polymorphic DNA (RAPD) analysis [58]. This makes use of one or two (generally one) 10mer PCR primers of random sequence. PCR products will form whenever by chance two primer molecules anneal to the template in opposite orientations with their 3′ ends facing towards each other, and at a distance that is not too large for PCR product synthesis. The resulting fragments are quite small and can easily be resolved by conventional agarose electrophoresis. A persistent problem with RAPD analysis is poor reproducibility, particularly between laboratories. This is probably due to the weak annealing between the very short 10mer primers and their targets, which may not even be 100% complementary with the primer, and the consequent extreme sensitivity to reaction conditions.
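
The condition for RAPD product formation can be sketched as a search over a known template sequence. The sketch below is an idealisation: it assumes a single primer, exact matches only (whereas real RAPD priming tolerates mismatches, which is one source of the reproducibility problem), and an arbitrary upper limit on product size. The primer, the template and the function names are invented for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(template, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, start = [], template.find(motif)
    while start != -1:
        hits.append(start)
        start = template.find(motif, start + 1)
    return hits


def rapd_products(template, primer, max_len=2000):
    """Sizes of products expected from one primer on the given top strand."""
    # Primer sequence found in the top strand: the primer anneals to the
    # bottom strand here and is extended rightwards.
    forward = find_all(template, primer)
    # Reverse complement found in the top strand: the primer anneals to the
    # top strand here and is extended leftwards.
    reverse = find_all(template, revcomp(primer))
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            # 3' ends face each other only when the forward site is upstream.
            if r > f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


primer = "AGGTCACTGA"  # invented 10mer
template = "TT" + primer + "ACGTACGTAC" * 3 + revcomp(primer) + "GG"
products = rapd_products(template, primer)
print(products)
```

Lowering `max_len` below the spacing of the two sites removes the product, which mirrors the requirement that the sites must not be too far apart for product synthesis.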

A variation of the RAPD approach that is designed to have better reproducibility is based upon sequences that are repeated throughout bacterial genomes. These are known variously as “REP”, “ERIC”, and “BOX” sequences [5, 12, 74, 89], depending upon the sequence and the bacterial species, and there is considerable strain to strain variation regarding just where these repeats are. PCR-based typing methods have been developed that are very similar to RAPD methods, but make use of primers that target these repeat sequences. These methods are less prone to reproducibility problems, probably because of stronger annealing between the primers and their genuine targets.

Amplified fragment length polymorphism (AFLP) analysis combines both genome cleavage and the synthesis of PCR products [23, 38]. The principle behind this method is the cleavage of the genome with two restriction enzymes so as to generate a large number of fragments, and then the amplification by PCR of a subset of these fragments. The number of fragments in the subset is optimised so as to provide a good compromise between being large enough to provide high discriminatory power and small enough for clear band separation. The fragments are amplified by first ligating adaptors to the sticky ends, and then performing PCR using primers that target the adaptor sequences. The selectivity of the amplification is obtained by using primers with 3′ extensions that will therefore only anneal to adaptors that by chance are adjacent to a particular base(s) in the amplified fragments, and also by labelling only one primer, so that fragments derived purely from the other primer are not visualised. In general, denaturing thin polyacrylamide slab gels (i.e. DNA sequencing gels) or capillary electrophoresis are used to separate the amplified fragments. AFLP analysis is highly reproducible and discriminatory. Like PFGE, there is sufficient detail in the fingerprints to allow estimation of relative evolutionary distances [46].
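
The effect of the selective 3′ extensions on the size of the amplified subset can be estimated with simple arithmetic. The sketch below assumes that the four bases are equally frequent and independent at the positions next to the adaptors, so that each selective base retains about one quarter of the fragments. It ignores the additional reduction that comes from labelling only one primer. The fragment count and the function name are hypothetical.

```python
def expected_amplified(n_fragments, selective_bases_1, selective_bases_2):
    """Rough expected number of fragments amplified by a selective primer pair.

    Assumes each selective base matches with probability 1/4 (equal,
    independent base frequencies), which real genomes only approximate.
    """
    return n_fragments / 4 ** (selective_bases_1 + selective_bases_2)


total = 8000  # invented number of doubly digested, adaptor-ligated fragments
for ext in [(0, 0), (1, 1), (1, 2)]:
    print(ext, expected_amplified(total, *ext))
```

Under these assumptions, moving from no selective bases to one on each primer reduces the pattern sixteen-fold, which is how the number of bands can be tuned towards the compromise described above.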

6.5 Typing Methods Based upon Known Polymorphic Genes or Sites

The second broad class of DNA-based bacterial typing methods is based on the interrogation of genes that are known to be variable, or specific variable sites within such genes. In general, with these methods it is possible to directly infer the genetic changes that give rise to different types, although there is some variation from method to method with the precision that this can be achieved.

The earliest described examples involve restriction fragment length polymorphism (RFLP) analysis of highly variable loci. This can be done either by cleaving the entire genome with a restriction enzyme and then visualising the fragments of interest by Southern hybridisation, or by amplifying the locus of interest by PCR, and then carrying out the restriction digestion after that. Most methods involve analysis of only one locus. Ribotyping is an automatable Southern hybridization–based method that targets the genes that encode ribosomal RNA, as well as spacer regions [9, 13]. A well known example of a PCR-based method is flaA RFLP analysis in Campylobacter jejuni and Campylobacter coli [53]. In recent years RFLP-based methods have largely been superseded. This is because they are inherently multi-step to perform, and not all sequence changes will change the location of the restriction sites, thus limiting resolving power. In addition, with these methods, direct inference of the precise genetic changes that lead to changes in banding patterns is not possible.

Probably the most conceptually straightforward approach to bacterial typing is sequencing one or more genes or gene fragments. There are several examples of single locus-based sequence typing methods [6, 20, 25, 45, 47]. Many of these target genes that are hypervariable, often because of immune selection directed against the surface-located gene product. These methods are often very informative and efficient. However, they are limited in that horizontal gene transfer (HGT) can mean that a single gene is an inadequate marker for the evolutionary position of an entire genome. Therefore, single locus sequence typing can be very effective for testing hypotheses of epidemiological linkage but less effective for studies of population structure. The response to this was the development just over a decade ago of multilocus sequence typing (MLST) [44]. This involves sequencing of standardized fragments of approximately 450 bp from multiple (almost always seven) housekeeping genes. Housekeeping genes encode cytoplasmic enzymes involved in core metabolism or other fundamental cellular processes. They therefore do not evolve by positive selection, so sequence differences accumulate in a clock-like fashion and do not reflect differing selective pressures in different strains or lineages. A very important aspect of MLST is on-line databases of variants at the loci (alleles), and the alleles that are found together in isolates (sequence types). These web sites also contain information regarding the isolates, and a variety of analytical tools. They are a powerful resource for studying bacterial population structures. There are now MLST schemes for essentially all the major bacterial pathogens, and the associated web sites can be reached via http://www.mlst.net/ and http://pubmlst.org/. MLST is still primarily a research tool, as carrying out seven sequence determinations remains a time and cost challenge for routine high throughput applications. Also, because slowly evolving housekeeping genes are targeted, the resolution can be insufficient for testing hypotheses of very recent epidemiological linkage.
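
The relationship between alleles and sequence types can be sketched as two lookup tables. The sketch below assumes a seven-locus scheme in which every distinct fragment sequence at a locus receives the next free allele number, and every distinct combination of seven allele numbers receives the next free sequence type number. The locus names, the very short "fragments" and the in-memory dictionaries are hypothetical stand-ins; real schemes use fragments of approximately 450 bp and centrally curated databases.

```python
LOCI = ("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG")  # hypothetical

allele_db = {locus: {} for locus in LOCI}  # per locus: sequence -> allele number
st_db = {}                                 # allelic profile -> sequence type


def allele_number(locus, sequence):
    """Return the allele number, assigning the next free one to a new sequence."""
    table = allele_db[locus]
    return table.setdefault(sequence, len(table) + 1)


def sequence_type(fragments):
    """Convert the seven fragment sequences of an isolate to (profile, ST)."""
    profile = tuple(allele_number(locus, fragments[locus]) for locus in LOCI)
    return profile, st_db.setdefault(profile, len(st_db) + 1)


# Toy isolates: b differs from a by a single base at one locus; c equals a.
isolate_a = {locus: "ACGTACGT" for locus in LOCI}
isolate_b = dict(isolate_a, geneC="ACGTACGA")
isolate_c = dict(isolate_a)

result_a = sequence_type(isolate_a)
result_b = sequence_type(isolate_b)
result_c = sequence_type(isolate_c)
print(result_a, result_b, result_c)
```

Note that a single base change and a large recombinational replacement at one locus both yield simply "a different allele number", which is the design choice that makes MLST profiles easy to compare between laboratories.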

An interesting development in very recent years is the appearance of methods in which the MLST loci are analysed by mass spectrometry rather than sequence analysis [21, 28]. In these cases, it is base composition rather than the sequence that is obtained. By definition this has less information, but this does not seem to significantly degrade the performance of the MLST.
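
The loss of information in moving from sequence to base composition can be shown with a few lines. This is an illustrative sketch with invented sequences, not a model of how any mass spectrometry platform processes its data.

```python
from collections import Counter


def base_composition(sequence):
    """Counts of A, C, G and T, which is all that base composition retains."""
    counts = Counter(sequence)
    return tuple(counts[base] for base in "ACGT")


allele_1 = "ACGTAC"
allele_2 = "ACTGAC"  # same bases as allele_1, with G and T swapped in position
allele_3 = "ACGTAT"  # one C replaced by T

print(base_composition(allele_1), base_composition(allele_2), base_composition(allele_3))
```

Here allele_1 and allele_2 differ in sequence but are indistinguishable by composition, while the substitution in allele_3 is detected.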

Another approach derived from MLST has been the use of bioinformatic methods to derive resolution-optimised sets of single nucleotide polymorphisms (SNPs) from MLST databases, or similar compendia of known sequence variation [29, 60, 72]. The descriptor of resolution can be the power to identify particular sequence types (STs) or groups of STs, or alternatively the power to discriminate all STs from all STs. In the latter case, this is assessed by calculating Simpson's Index of Diversity, which in this context is the probability that any two STs selected at random will be discriminated if the SNPs are interrogated. Published SNP-based bacterial typing methods in general make use of allele specific PCR, or competitive hybridization of TaqMan probes, in real-time PCR devices [8, 60]. This typing format is attractive because real-time PCR is inherently single-step closed-tube. However, there are many methods for interrogating SNPs and in principle any could be applied to SNP-based bacterial typing.
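
The use of Simpson's Index of Diversity as a descriptor of resolution can be made concrete with a short sketch. It assumes that each ST is represented by an aligned, concatenated sequence, and it computes the probability that two different STs drawn at random differ at one or more of the chosen SNP positions. The greedy selection routine is only one plausible way of building a resolution-optimised set and is not claimed to be the algorithm used in the cited work; the four STs and their sequences are invented.

```python
from collections import Counter


def simpsons_d(sts, positions):
    """Probability that two different STs are discriminated by the SNP set."""
    n = len(sts)
    # STs with identical bases at every chosen position fall into one group.
    groups = Counter(tuple(seq[p] for p in positions) for seq in sts.values())
    return 1 - sum(c * (c - 1) for c in groups.values()) / (n * (n - 1))


def greedy_snps(sts, n_snps):
    """Add, one at a time, the position that most increases the index."""
    length = len(next(iter(sts.values())))
    chosen = []
    for _ in range(n_snps):
        best = max((p for p in range(length) if p not in chosen),
                   key=lambda p: simpsons_d(sts, chosen + [p]))
        chosen.append(best)
    return chosen


# Invented aligned sequences for four STs.
sts = {1: "ACAT", 2: "ACGT", 3: "GCAT", 4: "GTGT"}

single = simpsons_d(sts, [0])
pair = greedy_snps(sts, 2)
print(round(single, 3), pair, simpsons_d(sts, pair))
```

In this toy data set, position 0 alone splits the STs into two pairs, and adding position 2 separates all four STs, so the index rises to 1.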

The housekeeping genes used in MLST evolve slowly, and this limits MLST resolution. A direct approach to circumventing this is to use loci that evolve more rapidly. Microsatellite loci in multicellular organisms are composed of, or contain, sequence repeats. The repeating units are very short, approximately 1–6 nucleotides. They evolve rapidly due to slipped strand mis-pairing during DNA replication. It is now known that repeat-containing loci are not hard to find in bacteria, and, as expected, they also evolve rapidly and so have high informative power for typing. In bacteria, they are known as variable number tandem repeat (VNTR) loci. There have been many recent publications concerning VNTR-based bacterial typing methods, and this is a small sample [30, 46, 56, 62, 68, 79, 82]. The repeating units in bacterial VNTR loci are more variable in length than microsatellites in higher organisms, but they appear to be usefully polymorphic whatever the repeat length. Variation in loci with long repeat lengths is probably due to homologous recombination rather than slipped strand mis-pairing. VNTR-based typing methods can be single locus or multi-locus. Recent trends are towards multilocus methods, and these are known generically as “multilocus VNTR analysis” (MLVA). VNTR loci can be interrogated by DNA sequencing, or, as is commonly done with microsatellite analysis, by length determination. Length-based MLVA methods are inherently efficient because the loci can be amplified in a multiplex PCR reaction using primers labelled with different fluorophores, and the products then resolved by capillary electrophoresis. However, with complex repeat loci, in which not all the repeating units are identical, sequence differences can be missed when only the length is determined.
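
Length-based interrogation of VNTR loci amounts to converting each measured amplicon length into a repeat number. The sketch below assumes that, for every locus, the combined length of the non-repeat flanking sequence within the amplicon and the length of the repeat unit are known, and that measured lengths are close enough to the true lengths for simple rounding to work. The locus names, flank lengths, repeat unit lengths and measured sizes are all invented.

```python
# Hypothetical scheme: locus -> (flanking length in bp, repeat unit length in bp)
SCHEME = {
    "vntr1": (120, 9),
    "vntr2": (95, 6),
    "vntr3": (210, 24),
}


def repeat_count(amplicon_len, flank_len, unit_len):
    """Nearest whole number of repeat units consistent with the amplicon length."""
    return round((amplicon_len - flank_len) / unit_len)


def mlva_profile(measured):
    """Tuple of repeat numbers, one per locus, in the order given by SCHEME."""
    return tuple(repeat_count(measured[locus], *SCHEME[locus]) for locus in SCHEME)


# Invented capillary electrophoresis sizes (bp) for one isolate.
measured = {"vntr1": 183.4, "vntr2": 131.2, "vntr3": 282.5}
profile = mlva_profile(measured)
print(profile)
```

Because only the length enters the calculation, two alleles with the same number of repeats but different repeat sequences receive the same value, which is the limitation noted above for complex repeat loci.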

MLVA methods are very promising, but have one inherent limitation in that VNTR loci can evolve in a reversible fashion i.e. repeat units can be lost then gained again or vice versa. Thus the same allelic states can result from different evolutionary histories i.e. homoplasy can occur. This can on occasion confound MLVA analysis. A strategy to circumvent this is to use a combination of markers that include both VNTRs and more slowly evolving markers that are less prone to homoplasy, such as SNPs that define phylogenetic lineages. This strategy has been termed progressive hierarchical resolving assays using nucleic acids (PHRANA) [33, 34]. The rationale is that the slowly evolving markers divide the species into lineages in which the average evolutionary distance between strains is sufficiently low that the probability of homoplasy in the VNTR markers is greatly reduced. In reality, there are innumerable possible combinations of rapidly and slowly evolving markers, and they will all provide a particular compromise between resolving power, lack of homoplasy, and ease and cost of execution. One emerging example is the use of sequences of multiple rapidly evolving surface antigen encoding genes to type Neisseria gonorrhoeae [85].

A completely different approach to bacterial typing is to make use of variations in gene content, rather than gene sequence. One of the major unexpected findings to emerge from the explosion of bacterial genome sequencing in the last decade has been the extent of variation in gene content between different isolates in the same species. The term “pan-genome” has been coined to describe the gene content of a species [77], and a pan-genome can contain many more genes than any cell. It therefore follows that typing on the basis of gene presence or absence can have considerable resolving power. This approach is sometimes termed binary typing, because the informative genes exhibit binary variation i.e. they exist in two states: present or not present. The fact that many determinants of virulence and resistance to antimicrobials are carried on mobile elements that are inherently likely to exhibit binary variation increases the potential informative power of this approach. Micro-array technology has been used to study genome-wide binary variation. Comparative genome hybridization (CGH) arrays are used for this task [75, 93]. They are generally equipped with probes deduced from all the open reading frames in multiple genome sequences within a species. Analysis by CGH array is too laborious to be regarded as a bacterial typing method, although smaller arrays with probes for e.g. putative virulence factors are increasingly being applied to the analysis of large numbers of isolates [50].
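
A binary type can be represented very simply as a string of presence and absence calls. The sketch below assumes a fixed, ordered panel of marker genes and a set of genes detected in each isolate (for example by PCR or array hybridisation). The marker names and isolates are hypothetical.

```python
MARKERS = ("geneA", "geneB", "geneC", "geneD", "geneE")  # hypothetical panel


def binary_type(detected):
    """'1' for each marker that is present and '0' for each that is absent."""
    return "".join("1" if marker in detected else "0" for marker in MARKERS)


def hamming(type_1, type_2):
    """Number of markers at which two binary types differ."""
    return sum(x != y for x, y in zip(type_1, type_2))


isolates = {
    "isolate_1": {"geneA", "geneC"},
    "isolate_2": {"geneA", "geneC", "geneE"},
    "isolate_3": {"geneB", "geneD"},
}
types = {name: binary_type(genes) for name, genes in isolates.items()}
print(types)
print(hamming(types["isolate_1"], types["isolate_2"]))
```

The number of differing markers gives a crude measure of how far apart two isolates are in gene content, although it says nothing about whether the differences arose from one mobile element or several independent events.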

Binary typing is also used to analyse individual hypervariable loci. This is usually done by conventional multiplex PCR, or real-time PCR. The typing of the SCCmec mobile element that defines MRSA is a good example of this [16]. SCCmec exhibits particularly complex binary variation. In general it is typed using markers or combinations of markers that are diagnostic for known types. However, an alternative approach to marker selection involving maximisation of Simpson's Index of Diversity has been reported [71]. This approach has also been applied to the derivation of resolution-optimised sets of binary markers from CGH data [59].

Essentially any method that detects the presence of specific genes can be applied to binary gene-based bacterial typing. As mentioned above, large scale analysis of binary variation is carried out using micro-arrays. Many extant methods for smaller scale analyses make extensive use of multiplex PCR and conventional agarose gel electrophoresis. Real-time PCR is increasingly being used. Interestingly, a nylon membrane-based reverse line blot technique that is essentially a precursor of array technology has been shown to remain effective and competitive as a medium-high throughput method [35].

The ultimate genetic fingerprint is a complete genome sequence. The emergence of next generation sequencing devices in the last 5 years has resulted in a large reduction in the cost of genome sequencing [3]. However the cost of this is still much higher than the cost of other typing methods, in terms of consumables, instrumentation and time, so this approach is not currently viable as a routine high-throughput typing method.

6.6 Projections of Future Developments

The explosive increase in comparative genome information will continue, and this will lead to increasingly detailed and sophisticated understanding of bacterial population structures. This in turn will make it increasingly difficult to justify using methods that do not clearly place an isolate within a population structure defined by comparative genomics. This probably means that typing methods that interrogate known polymorphic sites or loci will remain in the ascendancy, while those that generate anonymous banding patterns may lose favour. There will probably continue to be a range of techniques. It is easy to envisage a cut-down version of next-generation sequencing that derives sequences from random genome fragments. It could be that in an environment where there are thousands of complete genomes to serve as on-line comparators, sequencing random genes will be just as effective for rapidly obtaining a high resolution fingerprint as sequencing targeted genes. This approach could perhaps be applied directly to mixed cultures or clinical samples.

One method not discussed above is high resolution melting (HRM) analysis. This is rapidly emerging as a very robust and effective approach to resolving sequence variants [90]. The basis of its most common embodiment is the accurate monitoring of the reduction in fluorescence, as DNA stained with a double strand specific fluorescent dye melts in response to a controlled temperature increase. The attraction of HRM is that the amplification plus the HRM analysis constitutes a homogeneous, single step and closed tube procedure. Its potential for miniaturization is excellent because of a lack of requirement for moving parts and/or microfluidics. HRM analysis can be added to pre-existing PCR-based primary diagnosis or binary typing methods so as to provide additional information at essentially no cost. The consumables costs of stand-alone HRM analyses are typically less than $1.00, because the requirements are only unlabelled primers and a generic PCR master mix. HRM analysis has been shown to be able to discriminate multiple sequence variants [14, 24, 37, 48, 73], and to yield data that can be compared between laboratories [78], which raises the possibility of on-line comparison of HRM curves, in a manner analogous to sequence comparison using GenBank or MLST sites.
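
The way a melting curve is reduced to a melting temperature can be sketched as follows. The fluorescence data here are simulated with an idealised logistic curve, which is an assumption made purely so that the example is self-contained; real curves are noisier and are usually normalised before comparison. The melting temperature is taken as the temperature at which the negative derivative of fluorescence with respect to temperature is greatest. The function names, the temperature range and the two melting temperatures are hypothetical.

```python
import math


def simulated_melt_curve(tm, temps, width=0.5):
    """Idealised fluorescence: a logistic fall centred on tm (not instrument data)."""
    return [1 / (1 + math.exp((t - tm) / width)) for t in temps]


def melting_temperature(temps, fluorescence):
    """Temperature at the peak of -dF/dT, estimated by central differences."""
    neg_deriv = [
        -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]
    peak = max(range(len(neg_deriv)), key=neg_deriv.__getitem__)
    return temps[peak + 1]  # +1 because the derivative list starts at index 1


temps = [75 + 0.1 * i for i in range(151)]  # 75.0 to 90.0 degrees C in 0.1 steps
variant_a = simulated_melt_curve(82.0, temps)
variant_b = simulated_melt_curve(82.6, temps)

tm_a = melting_temperature(temps, variant_a)
tm_b = melting_temperature(temps, variant_b)
print(round(tm_a, 1), round(tm_b, 1))
```

Two variants are resolved when their curves differ by more than the run-to-run variation of the instrument. In practice the shape of the whole curve is compared, not only the single melting temperature extracted here.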

In summary, a reasonable prediction is that in the near future, bacterial typing will be performed by either some variant of next-generation sequencing, or by HRM analysis of selected markers, depending on the amount of information required. Analysis of mixed/clinical samples will become commonplace, and typing will be combined with diagnosis. Typing data will be interpreted with reference to massive amounts of comparative genomic data, and this will greatly facilitate the monitoring of dissemination at all scales of time and space, and the inference of clinically relevant properties.