DNA Barcodes pp 365-377

The Practical Evaluation of DNA Barcode Efficacy

  • John L. Spouge
  • Leonardo Mariño-Ramírez
Part of the Methods in Molecular Biology book series (MIMB, volume 858)


This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, discard any specimen that lacks a sequence in one or more of the marker databases. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman–Wunsch algorithm or a related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No measure of neighbor proximity (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, “the probability of correct identification” (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to bound the gain in species identification that improvements in PCR technology could deliver.
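
The workflow summarized above can be illustrated with a minimal Python sketch. It is not the chapter's implementation. Several pieces are assumptions made only for illustration: the function names, the toy specimens and sequences, the alignment scoring values (match +1, mismatch -1, gap -2), and the definition of a species PCI as the leave-one-out fraction of that species' specimens whose nearest neighbors all belong to the same species. Only the final averaging step (overall PCI as the mean of the species PCIs) is taken directly from the text. The sampling-error and PCR-failure calculations are not sketched here.

```python
from collections import defaultdict


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (linear gap penalty).

    The scoring values are illustrative assumptions, not the chapter's.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def keep_complete(databases):
    """Discard specimens that lack a sequence in any marker database."""
    shared = set.intersection(*(set(db) for db in databases.values()))
    return {m: {s: db[s] for s in sorted(shared)} for m, db in databases.items()}


def species_pci(seqs, species):
    """Leave-one-out nearest-neighbor success rate per species.

    Assumption: a query counts as correct only if every top-scoring
    neighbor belongs to the query's own species (ties across species fail).
    """
    outcomes = defaultdict(list)
    for q, q_seq in seqs.items():
        scores = {s: global_score(q_seq, seq) for s, seq in seqs.items() if s != q}
        best = max(scores.values())
        nearest = {species[s] for s, sc in scores.items() if sc == best}
        outcomes[species[q]].append(nearest == {species[q]})
    return {sp: sum(v) / len(v) for sp, v in outcomes.items()}


def overall_pci(per_species):
    """Average the species PCIs over all species in the data set."""
    return sum(per_species.values()) / len(per_species)


# Hypothetical toy data: specimen -> species, and marker -> specimen -> sequence.
species = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "B", "s6": "B"}
databases = {
    "marker1": {"s1": "AAAAAAAA", "s2": "AAAAAAAT", "s3": "CCCCCCCC",
                "s4": "CCCCCCCG", "s5": "AAAAAACC", "s6": "CCCCCCGG"},
    "marker2": {"s1": "GGGGTTTT", "s2": "GGGGTTTA", "s3": "TTTTGGGG",
                "s4": "TTTTGGGA", "s5": "TTTTGGGC"},  # s6 has no marker2 sequence
}

complete = keep_complete(databases)
for marker, seqs in complete.items():
    per_species = species_pci(seqs, species)
    print(marker, per_species, round(overall_pci(per_species), 3))
```

In this toy data set, specimen s6 is dropped from both markers because it has no marker2 sequence, so the two markers are compared on the same five specimens. For marker1, specimen s5 (species B) sits closest to the species A sequences and is misassigned, giving species PCIs of 1 for A and 2/3 for B. Averaging over species rather than over specimens keeps a heavily sampled species from dominating the overall figure.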

Key words

  • Barcode efficacy in species identification
  • Probability of correct identification
  • DNA barcode



This research was supported in part by the Intramural Research Program of the NIH, NLM, NCBI.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
