Abstract
This chapter describes a workflow for measuring a barcode’s accuracy when identifying species. First, assemble a database of specimens with their marker sequences and their species binomials. The species binomials provide a “taxonomic gold standard” for species identification and should be as accurate as possible, to avoid penalizing correct species assignment. Second, select a computer algorithm for assigning species to barcode sequences. Only one algorithm (BLAST+P) has improved notably on the simple strategy of assigning specimens to the species of the database sequence(s) nearest under p-distance. Global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or with multiple sequence alignment algorithms) align entire barcode sequences, using all available information, so they sometimes produce more accurate species identifications than local sequence alignments (e.g., with BLAST), particularly when BLAST produces barcode alignments of small subsequences within the sequences. Finally, consensus has settled on “the probability of correct identification” (PCI) as the appropriate measurement of species identification accuracy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. The chapter discusses some variant PCIs, their calculation and the estimation of their statistical sampling errors. It also discusses good practice in incorporating PCR failure and species with singleton representatives into data summaries. For software relevant to this chapter, see http://tinyurl.com/spouge-barcode.
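The nearest-neighbor strategy and the averaging step described above can be illustrated with a minimal Python sketch. It is not the chapter's software: the function names, the leave-one-out design, the treatment of ties as failures, and the requirement that sequences are already globally aligned to equal length are all assumptions made for illustration. Each specimen is queried against all the others, a species PCI is the fraction of that species' specimens assigned correctly, and the overall PCI is the unweighted average of the species PCIs.

```python
from collections import defaultdict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # assumption: no comparable sites means maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def overall_pci(specimens):
    """specimens: list of (species, aligned_sequence) pairs.

    Returns (overall PCI, {species: species PCI}).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for i, (species, seq) in enumerate(specimens):
        totals[species] += 1
        # Leave-one-out: every other specimen acts as the reference database.
        others = [(p_distance(seq, s), sp)
                  for j, (sp, s) in enumerate(specimens) if j != i]
        if not others:
            continue
        best = min(d for d, _ in others)
        nearest_species = {sp for d, sp in others if d == best}
        # Assumption: a tie between different species counts as a failure.
        if nearest_species == {species}:
            hits[species] += 1
    species_pci = {sp: hits[sp] / totals[sp] for sp in totals}
    # Overall PCI: unweighted average of the species PCIs.
    return sum(species_pci.values()) / len(species_pci), species_pci


# Hypothetical toy data: two species with two specimens each, one singleton.
specimens = [
    ("A", "ACGTACGT"),
    ("A", "ACGTACGA"),
    ("B", "TTGTACCC"),
    ("B", "TTGTACCA"),
    ("C", "GGGGGGGG"),
]
pci, per_species = overall_pci(specimens)
print(pci, per_species)
```

Under this leave-one-out design a singleton species can never be identified correctly, because removing the query leaves no conspecific in the database. That is one reason singletons need explicit handling when results are summarized.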
Acknowledgments
This research was supported in part by the Intramural Research Program of the NIH, NLM, NCBI.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Spouge, J.L. (2016). Measurement of a Barcode’s Accuracy in Identifying Species. In: Trivedi, S., Ansari, A., Ghosh, S., Rehman, H. (eds) DNA Barcoding in Marine Perspectives. Springer, Cham. https://doi.org/10.1007/978-3-319-41840-7_2
DOI: https://doi.org/10.1007/978-3-319-41840-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41838-4
Online ISBN: 978-3-319-41840-7
eBook Packages: Biomedical and Life Sciences (R0)