Volume 249, Issue 5, pp 1617–1625

Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood

  • Tuo He
  • Lichao Jiao
  • Alex C. Wiedenhoeft
  • Yafang Yin
Original Article


Main conclusion

Machine-learning approaches (MLAs) for DNA barcoding outperform distance- and tree-based methods in identification accuracy and cost-effectiveness for species-level identification of wood.

DNA barcoding is a promising tool to combat illegal logging and the associated trade, and reliable, efficient analytical methods are essential for its wide application in the wood trade and, more broadly, in the forensics of natural materials. In this study, 120 DNA sequences of four barcodes (ITS2, matK, ndhF-rpl32, and rbcL) generated in our previous study and 85 sequences downloaded from the National Center for Biotechnology Information (NCBI) were compiled into a reference data set for six commercial Pterocarpus woods. MLAs (BLOG, BP-neural network, SMO, and J48) were compared with distance-based (TaxonDNA) and tree-based (NJ tree) methods on identification accuracy and cost-effectiveness across these six species, and were also applied to discriminate the CITES-listed species Pterocarpus santalinus from the anatomically similar P. tinctorius for forensic identification. MLAs provided higher identification accuracy (30.8–100%) than distance-based (15.1–97.4%) and tree-based methods (11.1–87.5%), with SMO performing best among the machine-learning classifiers. The two-locus combination ITS2 + matK with the SMO classifier gave the highest resolution (100%) with the fewest barcodes for discriminating the six Pterocarpus species. P. santalinus was successfully discriminated from P. tinctorius using MLAs with a single barcode, ndhF-rpl32. This study shows that, for DNA barcoding of Pterocarpus wood, MLAs provide higher identification accuracy and cost-effectiveness for forensic application than the other analytical methods.
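To make the distance-based baseline concrete, the following is a minimal sketch of a "best match" style assignment: a query is given the species of its closest reference sequence, and is reported as ambiguous when equally close references belong to different species. It is an illustration only, not the TaxonDNA implementation or the authors' pipeline. The sequences are invented toy data, and the names `REFERENCES`, `p_distance`, and `best_match` are hypothetical.

```python
from typing import List, Tuple

# Toy aligned sequences, invented for illustration only (not real barcode data).
REFERENCES: List[Tuple[str, str]] = [
    ("P. santalinus", "ACGTACGTAC"),
    ("P. santalinus", "ACGTACGTAT"),
    ("P. tinctorius", "ACGAACGTCC"),
    ("P. tinctorius", "ACGAACGTCA"),
]


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: share of differing sites, skipping gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def best_match(query: str, references: List[Tuple[str, str]]) -> str:
    """Return the species of the closest reference, or 'ambiguous' on a tie
    between references of different species (assumed tie rule)."""
    distances = [(p_distance(query, seq), species) for species, seq in references]
    smallest = min(d for d, _ in distances)
    closest_species = {species for d, species in distances if d == smallest}
    if len(closest_species) == 1:
        return closest_species.pop()
    return "ambiguous"


if __name__ == "__main__":
    print(best_match("ACGTACGTAC", REFERENCES))
    print(best_match("ACGTACGTCC", REFERENCES))
```

The second query differs by one site from a reference of each species, so the sketch cannot assign it. Ties of this kind are one situation in which a classifier trained on diagnostic positions may resolve a query that a nearest-distance rule leaves unassigned.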


Keywords

DNA barcoding; Forensic wood identification; Identification accuracy; Machine learning approaches (MLAs); Pterocarpus; SMO classifier



Abbreviations

BLOG: Barcoding with logic
CITES: Convention on International Trade in Endangered Species of Wild Fauna and Flora
MLAs: Machine learning approaches
NCBI: National Center for Biotechnology Information
NJ: Neighbor Joining
SMO: Sequential Minimal Optimization



Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (Grant No. 31600451), the National High-level Talent for Special Support Program of China (Grant No. W02020331), and the China Scholarship Council (Grant No. 2017-3109). We thank Professor Xiaomei Jiang, Dr. Min Yu, Dr. Bo Liu, and Dr. Prabu Ravindran for their assistance and suggestions on this study, and Sarah Friedrich for her help with the figures.

Supplementary material

425_2019_3116_MOESM1_ESM.pdf (42 kb)
Fig. S1 The confusion matrix generated by the BP-neural network for all the single barcodes and their combinations (PDF 42 kb)
425_2019_3116_MOESM2_ESM.pdf (65 kb)
Fig. S2 The criteria for assessing the SMO classifier based on the four single barcodes and their combinations (PDF 65 kb)
425_2019_3116_MOESM3_ESM.pdf (323 kb)
Fig. S3 The decision trees constructed by the diagnostic position of DNA sequences based on the four barcodes and their combinations (PDF 323 kb)
425_2019_3116_MOESM4_ESM.pdf (20 kb)
Fig. S4 Identification success rates of four barcodes and their combinations based on “best match” and “best close match” functions of TaxonDNA (PDF 19 kb)
425_2019_3116_MOESM5_ESM.pdf (367 kb)
Fig. S5 Phylogenetic trees generated from the four barcodes and their combinations based on neighbor-joining analysis (PDF 367 kb)
425_2019_3116_MOESM6_ESM.xlsx (19 kb)
Table S1 DNA sequences generated from our previous study (Jiao et al. 2018) and downloaded from the NCBI GenBank (XLSX 18 kb)
425_2019_3116_MOESM7_ESM.xlsx (16 kb)
Table S2 The formulae generated by BLOG for discrimination of six Pterocarpus timber species based on the four barcodes and their combinations (XLSX 16 kb)


  1. Bertolazzi P, Felici G, Weitschek E (2009) Learning to classify species with barcodes. BMC Bioinform 10(14):S7
  2. Brancalion PHS, Almeida DRA, Vidal E, Molin PG, Sontag VE, Souza SEXF, Schulze M (2018) Fake legal logging in the Brazilian Amazon. Sci Adv 4(8):aat1192
  3. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106(31):12794–12797
  4. Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X, Luo K, Li Y, Li X, Jia X, Lin Y, Leon C (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5:e8613
  5. Collins RA, Cruickshank RH (2012) The seven deadly sins of DNA barcoding. Mol Ecol Resour 13(6):969–975
  6. Collins RA, Boykin LM, Cruickshank RH, Armstrong KF (2012) Barcoding’s next top model: an evaluation of nucleotide substitution models for specimen identification. Methods Ecol Evol 3(3):457–465
  7. Damm S, Schierwater B, Hadrys H (2010) An integrative approach to species discovery in odonates: from character-based DNA barcoding to ecology. Mol Ecol 19(18):3881–3893
  8. Delgado-Serrano L, Restrepo S, Bustos JR, Zambrano MM, Anzola JM (2016) Mycofier: a new machine learning-based classifier for fungal ITS sequences. BMC Res Notes 9(1):402
  9. Dormontt EE, Boner M, Braun B, Breulmann G, Degen B, Espinoza E, Gardner S, Guillery P, Hermanson JC, Koch G, Lee SL, Kanashiro M, Rimbawanto A, Thomas D, Wiedenhoeft AC, Yin Y, Zahnen J, Lowe AJ (2015) Forensic timber identification: it’s time to integrate disciplines to combat illegal logging. Biol Conserv 191:790–798
  10. Ekrem T, Willassen E, Stur E (2007) A comprehensive DNA sequence library is essential for identification with DNA barcodes. Mol Phylogenet Evol 43(2):530–542
  11. Gasson P (2011) How precise can wood identification be? Wood anatomy’s role in support of the legal timber trade, especially CITES. IAWA J 32(2):137–154
  12. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99
  13. Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Mol Ecol Resour 6(4):959–964
  14. Han J, Zhu Y, Chen X, Liao B, Yao H, Song J, Chen S, Meng F (2013) The short ITS2 sequence serves as an efficient taxonomic sequence tag in comparison with the full-length ITS. BioMed Res Intl 2013:741476
  15. Han Y, Duan D, Ma X, Jia Y, Liu Z, Zhao G, Li Z (2016) Efficient identification of the forest tree species in Aceraceae using DNA barcodes. Front Plant Sci 7:1707
  16. Hartvig I, Czako M, Kjaer ED, Nielsen LR, Theilade I (2015) The use of DNA barcoding in identification and conservation of rosewood (Dalbergia spp.). PLoS One 10:e0138231
  17. Hassold S, Lowry PP II, Bauert MR, Razafintsalama A, Ramamonjisoa L, Widmer A (2016) DNA barcoding of Malagasy rosewoods: towards a molecular identification of CITES-listed Dalbergia species. PLoS One 11:e0157881
  18. He T, Jiao L, Yu M, Guo J, Jiang X, Yin Y (2018) DNA barcoding authentication for the wood of eight endangered Dalbergia timber species using machine learning approaches. Holzforschung.
  19. Hebert PDN, Cywinska A, Ball SL, Dewaard JR (2003) Biological identifications through DNA barcodes. Proc R Soc B Biol Sci 270(1512):313–321
  20. Hendrich L, Morinière J, Haszprunar G, Hebert PDN, Hausmann A, Köhler F, Balke M (2015) A comprehensive DNA barcode database for Central European beetles with a focus on Germany: adding more than 3500 identified species to BOLD. Mol Ecol Resour 15(4):795–818
  21. IUCN Red List of Threatened Species (2017) Accessed 5 Feb 2018
  22. Jiao L, Yin Y, Cheng Y, Jiang X (2014) DNA barcoding for identification of the endangered species Aquilaria sinensis: comparison of data from heated or aged samples. Holzforschung 68(4):487–494
  23. Jiao L, Liu X, Jiang X, Yin Y (2015) Extraction and amplification of DNA from aged and archaeological Populus euphratica wood for species identification. Holzforschung 69(8):925–931
  24. Jiao L, Yu M, Wiedenhoeft AC, He T, Li J, Liu B, Jiang X, Yin Y (2018) DNA barcode authentication and library development for the wood of six commercial Pterocarpus species: the critical role of xylarium specimens. Sci Rep 8(1):1945
  25. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260
  26. Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102(23):8369–8374
  27. Lewis SL, Edwards DP, Galbraith D (2015) Increasing human dominance of tropical forests. Science 349(6250):827–832
  28. Li J, Cui Y, Jiang J, Yu J, Niu L, Deng J, Shen F, Zhang L, Yue B, Li J (2017) Applying DNA barcoding to conservation practice: a case study of endangered birds and large mammals in China. Biodivers Conserv 26(3):653–668
  29. Libbrecht MW, Noble WS (2015) Machine learning applications in genetics and genomics. Nat Rev Genet 16(6):321–332
  30. Little DP (2014) A DNA mini-barcode for land plants. Mol Ecol Resour 14(3):437–446
  31. Little DP, Stevenson DW (2007) A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms. Cladistics 23(1):1–21
  32. Lowe AJ, Dormontt EE, Bowie MJ, Degen B, Gardner S, Thomas D, Clarke C, Rimbawanto A, Wiedenhoeft AC, Yin Y, Sasaki N (2016) Opportunities for improved transparency in the timber trade through scientific verification. Bioscience 66(11):990–998
  33. Lowenstein JH, Amato G, Kolokotronis SO (2009) The real maccoyii: identifying tuna sushi with DNA barcodes-contrasting characteristic attributes and genetic distances. PLoS One 4:e7866
  34. MacLeod N, Benfield M, Culverhouse P (2010) Time to automate identification. Nature 467(7312):154–155
  35. McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82(1):290–297
  36. Meier R, Shiyang K, Vaidya G, Peter KLN (2006) DNA Barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst Biol 55(5):715–728
  37. More RP, Mane RC, Purohit HJ (2016) MatK-QR classifier: a patterns based approach for plant species identification. BioData Min 9(1):39
  38. Nalepa J, Kawulok M (2018) Selecting training sets for support vector machine: a review. Artif Intell Rev 6:1–44
  39. NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:7–19
  40. Ng KKS, Lee SL, Tnah LH, Nurul-Farhanah Z, Ng CH, Lee CT, Tani N, Diway B, Lai PS, Khoo E (2016) Forensic timber identification: a case study of a CITES listed species, Gonystylus bancanus (Thymelaeaceae). Forensic Sci Int Genet 23:197–209
  41. Pang X, Song J, Zhu Y, Xu H, Huang L, Chen S (2010) Applying plant DNA barcodes for Rosaceae species identification. Cladistics 27(2):165–170
  42. Parveen I, Singh HK, Malik S, Raghuvanshi S, Babbar SB (2017) Evaluating five different loci (rbcL, rpoB, rpoC1, matK, and ITS) for DNA barcoding of Indian orchids. Genome 60(8):665–671
  43. Patel N, Upadhyay S (2012) Study of various decision tree pruning methods with their empirical comparison in WEKA. Intl J Comput Appl 60(12):20–25
  44. Rach J, DeSalle R, Sarkar IN, Schierwater B, Hadrys H (2008) Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proc R Soc B 275(1632):237–247
  45. Robinson JE, Sinovas P (2018) Challenges of analyzing the global trade in CITES-listed wildlife. Conserv Biol 32(5):1203–1206
  46. Ross HA, Murugan S, Li WL (2008) Testing the reliability of genetic methods of species identification via simulation. Syst Biol 57(2):216–230
  47. Saatchi SS, Harris NL, Brown S, Lefsky M, Mitchard ETA, Salas W, Zutta BR, Buermann W, Lewis SL, Hagen S, Petrova S, White L, Silman M, Morel A (2011) Benchmark map of forest carbon stocks in tropical regions across three continents. Proc Natl Acad Sci USA 108(24):9899–9904
  48. Sarkar IN, Planet PL, DeSalle R (2008) CAOS software for use in character-based DNA barcoding. Mol Ecol Resour 8(6):1256–1259
  49. Saslis-Lagoudakis CH, Klitgaard BB, Forest F, Francis L, Savolainen V, Williamson EM, Hawkins JA (2011) The use of phylogeny to interpret cross-cultural patterns in plant use and guide medicinal plant discovery: an example from Pterocarpus (Leguminosae). PLoS One 6:e22275
  50. Srivathsan A, Meier R (2012) On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics 28(2):190–194
  51. Tanabe AS, Toju H (2013) Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi and land plants. PLoS One 8:e76910
  52. Velzen RV, Weitschek E, Felici G, Bakker FT (2012) DNA barcoding of recently diverged species: relative performance of matching methods. PLoS One 7:e30490
  53. Weitschek E, Velzen R, Felici G, Bertolazzi P (2013) BLOG 2.0: a software system for character-based species classification with DNA barcode sequences. What it does, how to use it? Mol Ecol Resour 13(6):1043–1046
  54. Weitschek E, Fiscon G, Felici G (2014) Supervised DNA barcodes species classification: analysis, comparisons and results. BioData Min 7:4
  55. Wiedenhoeft AC (2014) Curating xylaria. In: Salick J, Konchar K, Nesbitt M (eds) Curating biocultural collections. A handbook. Kew Publishing, London, pp 127–134
  56. Xu C, Dong W, Shi S, Cheng T, Li C, Liu Y, Wu P, Wu H, Gao P, Zhou S (2015a) Accelerating plant DNA barcode reference library construction using herbarium specimens: improved experimental techniques. Mol Ecol Resour 15(6):1366–1374
  57. Xu S, Li D, Li J, Xiang X, Jin W, Huang W, Jin X, Huang L (2015b) Evaluation of the DNA barcodes in Dendrobium (Orchidaceae) from mainland Asia. PLoS One 10:e0115168
  58. Yan L, Liu J, Möller M, Zhang L, Zhang X, Li D, Gao L (2015) DNA barcoding of Rhododendron (Ericaceae), the largest Chinese plant genus in biodiversity hotspots of the Himalaya-Hengduan Mountains. Mol Ecol Resour 15(4):932–944
  59. Yao H, Song J, Chang L, Luo K, Han J, Li Y, Pang X, Xu H, Zhu Y, Xiao P, Chen S (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One 5:e13102
  60. Yassin A, Markow TA, Narechania A, O’Grady PM, DeSalle R (2010) The genus Drosophila as a model for testing tree- and character-based methods of species identification using DNA barcoding. Mol Phylogenet Evol 57(2):509–517
  61. Yu M, Jiao L, Guo J, Wiedenhoeft AC, He T, Jiang X, Yin Y (2017) DNA barcoding of vouchered xylarium wood specimens of nine endangered Dalbergia species. Planta 246(6):1165–1176
  62. Yu N, Wei Y, Zhang X, Zhu N, Wang Y, Zhu Y, Zhang H, Li F, Yang L, Sun J, Sun A (2018) Barcode ITS2: a useful tool for identifying Trachelospermum jasminoides and a good monitor for medicine market. Sci Rep 7:5037
  63. Zeng C, Hollingsworth PM, Yang J, He Z, Zhang Z, Li D, Yang J (2018) Genome skimming herbarium specimens for DNA barcoding and phylogenomics. Plant Methods 14:43
  64. Zhang AB, Sikes DS, Muster C, Li SQ (2008) Inferring species membership using DNA sequences with back-propagation neural network. Syst Biol 57(2):202–215
  65. Zhang A, Muster C, Liang H, Zhu C, Crozier R, Wan P, Feng J (2012) A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Mol Ecol 21(8):1848–1863
  66. Zhang AB, Hao MD, Yang CQ, Shi ZY (2017) BarcodingR: an integrated R package for species identification using DNA barcodes. Methods Ecol Evol 8(5):627–637
  67. Zou S, Li Q, Kong L, Yu H, Zheng X (2011) Comparing the usefulness of distance, monophyly and character-based DNA barcoding methods in species identification: a case study of Neogastropoda. PLoS One 6:e26619

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, China
  2. Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, China
  3. Forest Products Laboratory, Center for Wood Anatomy Research, USDA Forest Service, Madison, USA
  4. Department of Botany, University of Wisconsin, Madison, USA
  5. Department of Forestry and Natural Resources, Purdue University, West Lafayette, USA
  6. Ciências Biológicas (Botânica), Universidade Estadual Paulista, Botucatu, Brazil
