Phylogenomics and barcoding of Panax: toward the identification of ginseng species
The economic value of ginseng in the global medicinal plant trade is estimated to be in excess of US$2.1 billion. At the same time, the evolutionary placement of ginseng (Panax ginseng) and the complex evolutionary history of the genus are poorly understood despite several molecular phylogenetic studies. In this study, we use a full plastome phylogenomic framework to resolve relationships in Panax and to identify molecular markers for species discrimination.
We used high-throughput sequencing of MBD2-Fc fractionated Panax DNA to supplement publicly available plastid genomes to create a phylogeny based on fully assembled and annotated plastid genomes from 60 accessions of 8 species. The plastome phylogeny based on a 163 kbp matrix resolves the sister relationship of Panax ginseng with P. quinquefolius. The closely related species P. vietnamensis is supported as sister of P. japonicus. The plastome matrix also shows that the markers trnC-rps16, trnS-trnG, and trnE-trnM could be used for unambiguous molecular identification of all the represented species in the genus.
MBD2 depletion reduces the cost of plastome sequencing, which makes it a cost-effective alternative to Sanger sequencing-based DNA barcoding for molecular identification. The plastome phylogeny provides a robust framework that can be used to study the evolution of morphological characters and biosynthesis pathways of ginsenosides for phylogenetic bioprospecting. Molecular identification of ginseng species is essential for authenticating ginseng in international trade, and it provides an incentive for manufacturers to create authentic products with verified ingredients.
Keywords: Barcoding, Genome, Ginseng, Marker, mPTP, NGS, Panax, Phylogenomics, Plastid
Abbreviations
2D-IR: Two-Dimensional Correlation Infrared Spectroscopy
ABGD: Automatic Barcode Gap Discovery
AP-PCR: Arbitrarily Primed Polymerase Chain Reaction
ASV: Average Support Value
AWTY: Are We There Yet
BOLD: Barcode of Life Database
FT-IR: Fourier Transformed-Infrared Spectroscopy
GMYC: Generalized Mixed Yule Coalescent
iBOL: International Barcode of Life
LCA: Lowest Common Ancestor
MARMS: Multiplex Amplification Refractory Mutation System-PCR
MASA: Mutant Allele Specific Amplification
MBD2-Fc: Methylated CpG-specific Binding protein (MBD2), fused to the Fc fragment of human IgG
MCMC: Markov Chain Monte Carlo
mPTP: Multi-rate Poisson Tree Processes
nrDNA: Nuclear Ribosomal DNA
OSF: Open Science Framework
PCR: Polymerase Chain Reaction
PCR-RFLP: PCR-Restriction Fragment Length Polymorphism
PTP: Poisson Tree Processes
RAPD: Random Amplified Polymorphic DNA
RESL: Refined Single Linkage
Ginseng has been used in traditional medicine in China for thousands of years, but it was not until the early 18th century that long-term, intense harvest nearly extirpated Panax ginseng C.A.Mey. from the wild. Demand for ginseng roots in the 18th century also fuelled a subsequent boom in wild-harvesting of American ginseng (P. quinquefolius L.) that decimated wild populations in North America. Today wild P. ginseng occurs in only a few localities in Russia and China, with the largest distribution in the southern part of the Sikhote-Alin mountain range. P. ginseng is Red-Listed in Russia, and roots and parts thereof from Russian populations are CITES Appendix II/NC listed. Many other Asian ginseng species are also endangered, but preliminary data on wild-harvesting and conservation are available only for P. assamicus R.N.Banerjee (synonym of P. bipinnatifidus var. angustifolius (Burkill) J.Wen), P. japonicus (T.Nees) C.A.Mey. and P. pseudoginseng Wall. [8, 9].
Elucidating the evolutionary relationships among species in the genus is essential to understand the evolution of this Holarctic disjunct genus, and also the evolution of derived secondary metabolite pathways. In addition, a phylogenetic framework can be used to develop accurate molecular identification of Panax and to enable identification of ginseng material in trade, both crude drugs and derived products. This is essential for conservation efforts and protection of the remaining wild populations of P. ginseng and related Panax species, since all may be under pressure from illegal harvesting and international trade. Furthermore, identification of Panax species and authentication of derived products is of great commercial importance, as authentic ginseng is costly and the incentive for substitution is significant.
The phylogeny of Panax has been studied using several molecular markers, but lack of variation in the most commonly used markers highlights an important limitation of the method. The nuclear ribosomal ITS yields insufficient resolution for accurate species assignment, and even when used in combination, the markers matK, trnD, psbK-psbI, rbcL and ycf1 have limited accuracy in identification of Panax species [12, 13]. The mutation rate of the studied markers does not allow fine-scale resolution, and is insufficient for identification of all Panax species and cultivars. The question of what species are in trade remains unanswered. Aside from phylogenetic approaches, a multitude of molecular and chemical analysis approaches have been developed and applied, including Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) and Mutant Allele Specific Amplification (MASA), Random Amplified Polymorphic DNA (RAPD) and High Performance Liquid Chromatography, Fourier Transformed-Infrared Spectroscopy (FT-IR), Two-Dimensional Correlation Infrared Spectroscopy (2D-IR), Multiplex Amplification Refractory Mutation System-PCR (MARMS) [18, 19], Microchip Electrophoresis Laser-Induced Fluorescence Detection, and microsatellite markers. Most methods have focused on either positive identification of P. ginseng, or distinguishing P. ginseng from P. quinquefolius, but most have limited resolution in detecting infraspecific or interspecific substitution, especially with poorly known congeneric species.
Suitability of molecular markers is often measured in interspecific distance, using distance methods to estimate the number of variable sites or pairwise distances between sequences. Most current methods are based on the Refined Single Linkage (RESL) algorithm implemented in the Barcode of Life Database (BOLD) or on clustering of distance matrices (Crop, OBITools, UCLUST, and Vsearch), and ideally set a threshold to distinguish between intraspecific and interspecific variation, sometimes referred to as the “barcoding gap”. Several programs and software packages determine and visualize barcoding gaps, including Automatic Barcode Gap Discovery (ABGD) and Spider. These distance-based methods are fast and suitable for large datasets, but they are not always biologically meaningful, especially when the species groups have complex evolutionary histories, including incomplete lineage sorting and hybridization [30, 31]. As an alternative, tree-based methods offer several advantages over distance-based methods. First, they do not rely on a specified threshold (no % variation cutoff, no barcoding gap), and second, they accommodate evolutionary processes, making them particularly suitable for species delimitation and identification. Several studies have shown that these methods are also more sensitive and more powerful for accurate species discrimination. Recently proposed methods include the Generalized Mixed Yule Coalescent (GMYC), Bayesian species identification using the multispecies coalescent (MSC) model, and Poisson Tree Processes (PTP, mPTP) [25, 32]. Despite constant methodological improvements, there is no silver bullet for species delimitation, and concerns have been raised that species delimitation approaches are sensitive to the structure of the data tested. Species delimitation methods assess speciation and coalescent processes, but also the data structure of the selected markers.
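The distance-based "barcoding gap" described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in BOLD, ABGD or Spider; the function names and the toy sequences are hypothetical, and uncorrected p-distance is assumed as the distance measure.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of differing sites among positions
    where both aligned sequences carry an unambiguous base."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcoding_gap(seqs: dict) -> tuple:
    """Return (max intraspecific, min interspecific) p-distance.

    seqs maps an accession id to (species label, aligned sequence).
    A gap exists when the first value is smaller than the second.
    """
    intra, inter = [], []
    for (_, (sp1, s1)), (_, (sp2, s2)) in combinations(seqs.items(), 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=float("nan"))


# Toy alignment (invented sequences, for illustration only)
toy = {
    "a1": ("sp_A", "ACGTACGTAC"),
    "a2": ("sp_A", "ACGTACGTAT"),
    "b1": ("sp_B", "ACGAACCTAT"),
    "b2": ("sp_B", "ACGAACCTAT"),
}
max_intra, min_inter = barcoding_gap(toy)
print(f"max intraspecific = {max_intra:.2f}, min interspecific = {min_inter:.2f}")
```

When the largest intraspecific distance approaches or exceeds the smallest interspecific distance, no single threshold separates the species, which is the situation in which tree-based delimitation becomes attractive.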
From a marker development perspective, tree-based methods provide an opportunity to increase the quality of the selection process for barcoding markers. Here we use the mPTP approach to test whether speciation processes are supported by the barcoding markers, and accordingly choose the best markers for delimitation of Panax species. The mPTP method has the advantage of being computationally efficient, while at the same time accommodating population-specific and sampling characteristics better than PTP and GMYC.
Evolution and phylogenetics of Panax
Previous phylogenetic studies of the Araliaceae family have identified four monophyletic groups (the Asian Palmate group, the Polyscias-Pseudopanax group, the Aralia-Panax group, and the greater Raukaua group) [36, 37]. However, deep nodes are not well supported to date [36, 37], and a broad sampling within Aralioideae is necessary to obtain an accurate placement of the Aralia-Panax group. Monophyly of the genus Panax (Araliaceae) is well supported by morphological synapomorphies, such as palmately compound leaves, a whorled leaf arrangement, a single terminal inflorescence, valvate petals in floral buds, and a bi- or tricarpellate ovary, as well as by several molecular phylogenies [12, 38]. A number of species emerged from the complex of subspecies of P. pseudoginseng in the 1970s, and taxonomic studies have resulted in the description of various new species [38, 39, 40]. Currently 13 species of ginseng are recognized with broad consensus [38, 41], but publication of new taxa at species, subspecies and variety level is common [42, 43].
Previous molecular phylogenies support P. stipuleanatus H.T.Tsai & K.M.Feng and P. trifolius L. as the sister group of all other ginseng species. Nevertheless, the placement of several other species remains unclear (e.g., P. bipinnatifidus, P. ginseng, P. japonicus, P. quinquefolius, P. vietnamensis Ha & Grushv., P. wangianus S.C.Sun, P. zingiberensis C.Y.Wu & Feng). Species delimitation within the genus is problematic due to species of tetraploid origin (e.g., P. bipinnatifidus, P. ginseng, P. japonicus, and P. quinquefolius), recent speciation events, high intraspecific morphological variation (e.g., P. pseudoginseng Wall.) and ancient genome duplication events [41, 45].
Phylogenetic studies have explored evolutionary relationships in Araliaceae with standard phylogenetic markers, such as the nuclear ribosomal ITS [11, 36, 38, 41, 44, 45, 46] and several plastid markers [11, 12, 13, 41]. More recently, seven nuclear genes (PGN7, W8, W28, Z7, Z14, Z15, Z16) were tested with moderate results. The topologies obtained conflicted with each other and were inconsistent with previous evolutionary inferences for the genus, which is likely a result of multiple copies of nuclear genes and ancient whole genome duplication events. Whole genome data have also been used to design microsatellites for species identification, but these have found limited application [21, 48, 49, 50, 51, 52]. Extensive population genetic studies have been done only on P. quinquefolius [53, 54, 55, 56, 57, 58, 59] and P. ginseng [60, 61] due to their major economic importance.
Developments in high-throughput sequencing have provided new approaches for genome sequencing: increasing outputs and decreasing costs have made this a cost-effective alternative to Sanger-based amplicon sequencing [62, 63]. Full plastid genome sequencing, i.e. plastome sequencing, has been proposed as an augmented approach to DNA barcoding [64, 65], and is a straightforward method that recovers all standard barcodes plus the full plastome. The limited costs of shotgun sequencing and the availability of a number of Araliaceae reference plastomes facilitate the study of relationships in the family. Plastome phylogenies have helped disentangle evolutionary relationships in a number of plant clades, including Poales, magnoliids, Pinus, Amborella, Equisetum, and Camellia. Single-copy nuclear genes have corroborated the robustness of plastome phylogenies [72, 73, 74, 75]; however, plastome phylogenies reflect only maternal inheritance, and as such will not always be representative species trees. Advantages of plastome data for phylogenetic studies are the low mutation rate of plastid sequences, the abundance of plastid DNA in most material, and the low cost of generating whole plastid genomes with high-throughput sequencing.
In total DNA, plastid DNA typically constitutes only ~0.01–13%, depending on the size of the nuclear genome, tissue and season [77, 78, 79]. Shotgun sequencing studies might therefore have relatively low efficacy in plastid genome recovery. Ginseng species have a large genome size of 5–10 Gb [80, 81], and one can expect a proportion of plastid DNA of 1–5% in the gDNA, which makes shotgun sequencing relatively ineffective in obtaining full plastome data. Several methods have been developed for enriching plastid content prior to sequencing (for a discussion see Du et al.). We apply a new plastid enrichment method to improve shotgun sequencing efficacy, which exploits the low methylation of the plastid genome compared to the nuclear genome. The method uses the methyl-CpG-binding domain (MBD2) to partition fragments of genomic DNA into a methylation-poor fraction (e.g. enriched for plastid) and a methylation-rich fraction (e.g. depleted in plastid). This method has the advantage that it uses a small quantity of dry material (below 40 mg) and is suitable for non-model organisms.
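To make the effect of a low plastid proportion concrete, the following Python sketch estimates how many shotgun reads would be needed to reach a given mean plastome coverage. The plastome size (~156 kbp), the mean read length (400 bp) and the target coverage (50x) are illustrative assumptions rather than values prescribed by the method, and the function name is hypothetical.

```python
import math


def reads_needed(plastome_bp: int, mean_read_bp: float,
                 plastid_fraction: float, target_coverage: float) -> int:
    """Total shotgun reads required so that the plastid-derived reads alone
    reach the target mean coverage of the plastome.

    coverage = (total_reads * plastid_fraction * mean_read_bp) / plastome_bp
    """
    if not 0 < plastid_fraction <= 1:
        raise ValueError("plastid_fraction must be in (0, 1]")
    return math.ceil(target_coverage * plastome_bp
                     / (mean_read_bp * plastid_fraction))


# Assumed values: 156 kbp plastome, 400 bp reads, 50x target coverage
for fraction in (0.01, 0.05, 0.30):
    n = reads_needed(156_000, 400, fraction, 50)
    print(f"plastid fraction {fraction:.0%}: about {n:,} reads")
```

Under these assumptions, the required sequencing effort scales inversely with the plastid fraction, which is why enrichment allows more samples to be multiplexed per run.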
This study has four main aims: (1) to construct a well-supported phylogeny of the genus Panax, while testing if the full plastome data yield sufficient variation to support and resolve phylogenetic relations in Panax, and specifically the position of the economically important P. ginseng; (2) to test if MBD2 can be used to fractionate Panax DNA into eukaryotic nuclear (methyl-CpG-rich) vs. organellar (methyl-CpG-poor) elements, and subsequently sequence the MBD2-depleted DNA to optimize plastome read yield; (3) to determine if the plastid genome can be used for molecular identification of traded species; and (4) to make the case that a resolved plastome phylogeny is needed to design short markers for Panax species identification from processed ginseng products.
Table 1 Summary information for the four assembled plastome genomes (table body not recovered; surviving labels: Number of reads, P. vietnamensis (1), P. vietnamensis (2))
Library preparation and sequencing
We extracted total DNA from two individuals of each sample collected in Vietnam, using a Qiagen DNeasy plant extraction kit with the provided protocol. The total DNA was analysed prior to library preparation to assess DNA quantity, fragmentation and fragment length distribution on a Fragment Analyzer (Advanced Analytical Technologies, Inc., Ankeny, USA) using the High Sensitivity genomic DNA Reagent Kit (50–40,000 bp) (Additional file 3: Figure S1). We selected one individual per extracted sample based on the yield and fragment size of the total DNA. The selected samples had average fragment sizes in excess of 10 kbp and a minimum DNA concentration of 4.77 ng/μl (Additional file 3: Figure S1).
We used a NEBNext Microbiome DNA Enrichment Kit (New England Biolabs, Ipswich, Massachusetts, USA) that uses IgG1 fused to the human methyl-CpG-binding domain (together “MBD2-Fc”) to pull down a methyl-CpG-enriched fraction from a bead-associated element, leaving a methyl-depleted fraction in the supernatant. About 400 ng of template DNA extract was used per sample, and the manufacturer's recommendations were followed with the following exceptions. The non-methylated DNA fractions were purified using 0.9X AMpure XP beads (Beckman Coulter, Brea, CA, USA) and eluted in 40 μl 1X TE buffer. To capture the methylated DNA, we followed the manufacturer’s protocol. For quality control, the size, purity and molar concentration (nmol/l) of both the methylated and the non-methylated fractions were measured using a Fragment Analyzer (Advanced Analytical Technologies Inc., USA) with a DNF-488-33 HS dsDNA Reagent Kit. The DNA was subsequently sheared to ~400 bp fragments using a M220 Focused Ultrasonicator (Covaris Inc., Woburn, MA, USA) with microTUBES-50 (Covaris Inc.). We used the NEBNext Fast DNA Library Prep Set for Ion Torrent (NEB) for end repair and adapter ligation of the sheared DNA. The samples were indexed using the IonXpress Barcode Adapter kit (ThermoFisher, Waltham, MA, USA). For each of the four samples, both fractions (methyl-CpG-enriched and methyl-CpG-depleted) were indexed and sequenced. After adapter ligation, the four methyl-CpG-enriched fractions were pooled in one library and the four methyl-CpG-depleted fractions were pooled in another library. The adapter-ligated libraries were size selected (450–540 bp) using a BluePippin (Sage Science, Beverly, MA, USA), and subsequently amplified using the NEBNext Fast DNA Library Prep Set for Ion Torrent kit with 12 PCR cycles. The amplified libraries were purified twice using 0.7X AMpure XP beads.
The purified amplified libraries were loaded on the sequencing chips using an Ion Chef (LT) and sequenced on an Ion Torrent Personal Genome Machine (LT) using Ion 318 v2 chips (LT) and the Ion PGM Sequencing 400 kit (LT).
Bioinformatic analyses and assembly
Sequencing reads were demultiplexed into FASTQ files using Flexbar version 3.0.3. Trimmomatic version 0.36 was used for adapter trimming and quality filtering of reads using a sliding window of 15 bp and an average Phred threshold of 20. Bases at the read ends with a Phred score below 20 were removed, and only reads longer than 100 bp were retained. MITObim version 1.7 was used for assembly of the single-end Ion Torrent reads using iterative mapping with in silico baiting against the following reference plastomes: P. vietnamensis (KP036470) and P. stipuleanatus (KX247147).
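The quality-filtering logic described above can be sketched in Python as follows. This is a simplified illustration of end trimming, sliding-window trimming and length filtering, not Trimmomatic's actual implementation; the function name, the order of the steps, and the choice to retain reads of exactly the minimum length are assumptions.

```python
from typing import Optional, Sequence, Tuple


def trim_read(quals: Sequence[int], window: int = 15, min_avg: int = 20,
              min_len: int = 100) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice of the read to keep, or None if the
    trimmed read is shorter than min_len.

    quals holds one Phred score per base.
    """
    start, end = 0, len(quals)
    # Remove low-quality bases from both ends of the read
    while start < end and quals[start] < min_avg:
        start += 1
    while end > start and quals[end - 1] < min_avg:
        end -= 1
    # Scan 5' to 3' and cut at the first window whose mean quality is too low
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # Assumption: reads of at least min_len bases are retained
    return (start, end) if end - start >= min_len else None


# Toy read: a high-quality stretch, a low-quality dip, then a short tail
example = [30] * 110 + [10] * 15 + [30] * 20
print(trim_read(example))
```

The defaults mirror the 15 bp window, Phred 20 threshold and 100 bp minimum length given in the text.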
Inverted repeats and ambiguous portions of the assembly were resequenced using Sanger sequencing. Specific primers were designed and used for DNA amplification of the regions of interest. PCR was performed on a Mastercycler® Pro (Eppendorf, USA) in a 20 μl final volume containing 2.5 μM of each primer, 1 mM of each dNTP, 10X DreamTaq Buffer, 0.75 U DreamTaq DNA polymerase (ThermoFisher Scientific, USA) and deionized water. The PCR cycling conditions included a sample denaturation step at 94 °C for 2 min followed by 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 50–55 °C for 30 s and primer extension at 72 °C for 1 min, followed by a final extension step at 72 °C for 5 min. PCR products were then purified using the GeneJET PCR Purification Kit (ThermoFisher Scientific, USA). Sanger sequencing was performed on an ABI 3500 Genetic Analyzer system using the BigDye Terminator v3.1 Cycle Sequencing Kit. Cycle sequencing was performed on a Veriti Thermal Cycler (Applied Biosystems, USA) using 3.2 μM of each primer, 200 ng purified PCR product, 5X BigDye Sequencing Buffer, 2.5X Ready Reaction Premix and deionized water in a 20 μl final volume. The thermocycling conditions included 1 min at 96 °C followed by 25 cycles of denaturation at 96 °C for 1 min, primer annealing at 50 °C for 5 s and primer extension at 60 °C for 4 min, followed by a holding step at 4 °C. Extension products were purified using ethanol/EDTA precipitation with 5 μl of 125 mM EDTA and 60 μl of absolute ethanol. Purified products were denatured at 95 °C for 5 min using 10 μl Hi-Di Formamide. DNA electrophoresis was performed in an 80 cm × 50 μm capillary with POP-4 polymer (Applied Biosystems, USA).
In order to test the efficacy of the NEBNext Microbiome DNA Enrichment Kit, the proportion of reads belonging to the plastome was estimated for both the methylated and the non-methylated fraction. The P. ginseng whole genome sequencing SRR19873 experiment was used to estimate the starting proportion of plastome reads, by mapping the reads against the plastid genome of P. ginseng (NC_006290) using Bowtie 2. Reads were assigned to taxa and organelles using a tailored database of Panax plastome data, representing the same data as that downloaded from public repositories for the phylogenetic analyses. For the mitochondrial data, all angiosperm mitochondrial genomes available on NCBI were used, and for the microbiome all remaining reads were blasted against the full NCBI database. Taxonomic identifications were retrieved using the lowest common ancestor (LCA) algorithm in Megan version 5.11.3, with a minimum read length of 150 bp and at least 10 reads for each taxon identified with an e-value of 1e-20 or less. The proportion of plastid DNA in the gDNA was estimated using Bowtie 2 by mapping the proportion of reads belonging to the plastid genome for P. ginseng (following SRR experiment SRR1181600).
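The idea behind lowest common ancestor assignment can be sketched in a few lines of Python: a read with hits to several taxa is placed at the deepest taxonomic node shared by all of them. This is only the core concept and not Megan's implementation, which additionally applies score and support thresholds; the function name and the abbreviated toy lineages are hypothetical.

```python
from typing import List, Optional, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by all root-to-tip lineages,
    or None if there are no lineages or they share no taxon."""
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel until the lineages diverge
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca = ranks[0]
        else:
            break
    return lca


# Abbreviated toy lineages for the hits of a single read
hits: List[List[str]] = [
    ["Viridiplantae", "Araliaceae", "Panax", "Panax ginseng"],
    ["Viridiplantae", "Araliaceae", "Panax", "Panax quinquefolius"],
]
print(lowest_common_ancestor(hits))
```

A read that matches two Panax species equally well is therefore reported at genus level rather than being forced onto one species.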
The plastid genomes were annotated using Geneious version 6.1, and annotations of exons and introns were manually checked by alignment with their respective genes in the same annotated species genome. Representative maps of the chloroplast genomes were created using OGDraw (Organellar Genome Draw).
The matrix for phylogenomic analyses consisted of complete aligned plastid genomes. The global alignment was done using MAFFT version 7.3 with local re-alignment using MUSCLE version 3.8.31, and manual adjustments where necessary. Aligned DNA sequences have been deposited in the Open Science Framework (OSF) directory (https://osf.io/ryuz6). The final matrix has a total length of 163,499 bp for a total of 61 individuals with no missing data. Single nucleotide polymorphisms (SNPs) were visualized using Circos version 0.69. Relationships from the nucleotide matrix were inferred using Maximum Likelihood (ML) and Bayesian inference. First, an unpartitioned phylogenetic analysis was performed to estimate a single nucleotide substitution model and branch length parameters for all characters. Next, the data were partitioned into coding regions, introns and intergenic spacers, and a best-fit partitioning scheme for the combined dataset was determined using PartitionFinder version 2.1.1 with the Bayesian Information Criterion (Additional file 4: Table S3). Branch lengths were linked across partitions.
The dataset was analyzed using RAxML version 8.2.10 and MrBayes version 3.2.6. RAxML and Bayesian searches used the partition model determined by PartitionFinder. For the ML analyses, tree searches and bootstrapping were conducted simultaneously with 1000 bootstrap replicates. Bayesian analyses were started from a random starting tree and were run for a total of ten million generations, sampling every 1000 generations. Four Markov chain Monte Carlo (MCMC) runs were conducted with eight chains per run. We used AWTY to assess the convergence of the analyses. Conflicting data within ML and Bayesian analyses were visualized and explored using the consensusNet function of the R package phangorn.
Barcoding - mPTP
Ion Torrent sequencing
After filtering out low-quality reads, 1.9 out of 3.3 million and 3.3 out of 4.9 million reads were retained for the pooled MBD2-depleted and MBD2-enriched fractions, respectively. The chloroplast assemblies covered the entire circular plastid genome for all four accessions for the MBD2-depleted fraction (Additional file 6: Figure S2, Additional file 7: Figure S3, Additional file 8: Figure S4, Additional file 9: Figure S5; Table 1). The Sanger-generated plastid sequences confirmed the genome assemblies in 18 regions, and also confirmed sequences of the inverted repeat. Complete lengths of the four plastid genomes ranged from 156,036 bp to 156,302 bp (Table 1). All four plastid genomes had the same genome structure and gene arrangement as the already assembled Panax plastid genomes.
The enrichment and depletion of methylated DNA by pulling down a methyl-CpG-enriched fraction and leaving a methyl-depleted fraction drastically increased the proportion of organellar DNA within the depleted fraction. P. ginseng SRR experimental data had 5.63% plastid genome reads. In the methylation-depleted fraction, we found a variation of plastome reads ranging from 6 to 33%. In the methylation-enriched fraction, less than 1% of the reads are from the plastome. The enrichment also increased microbiome contamination in the depleted fraction from 0.8 to 4%. Overall, one of the P. vietnamensis samples was the least successful sample in the enrichment and yielded fewer and shorter reads.
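The size of the enrichment can be expressed as a fold change relative to the unenriched baseline. The short Python sketch below uses only the percentages reported above (5.63% baseline; 6% and 33% in the methylation-depleted fraction); the function name is hypothetical, and the calculation assumes that the public P. ginseng dataset is a fair baseline for the samples sequenced here.

```python
def fold_enrichment(fraction_after: float, fraction_before: float) -> float:
    """Ratio of the plastome read fraction after enrichment to the baseline."""
    if fraction_before <= 0:
        raise ValueError("baseline fraction must be positive")
    return fraction_after / fraction_before


baseline = 0.0563           # plastome reads in the unenriched P. ginseng data
for after in (0.06, 0.33):  # range observed in the methylation-depleted fraction
    print(f"{after:.0%} plastome reads: "
          f"{fold_enrichment(after, baseline):.2f}-fold of baseline")
```

Under that assumption, the depleted fraction ranges from roughly baseline level in the least successful sample to nearly six times the baseline in the best one.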
The aligned plastid genomes were consistent in length throughout the dataset. Based on the alignment, average plastome pairwise identity is 83% for the Araliaceae family and 99.2% for the Panax clade. The percentage of identical sites is 83.9% and 96.8%, respectively. The global plastome alignment has a matrix length of 163,499 bp. Coding regions, introns and intergenic spacers represented 259 original partition schemes, and the best-fit partitioning scheme from PartitionFinder divided the data into 73 partitions (Additional file 4: Table S3).
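The two alignment statistics reported above can be computed along the following lines. This Python sketch uses one common set of definitions (pairwise identity averaged over all sequence pairs, skipping columns that are gaps in both sequences; identical sites as alignment columns with a single character state); the exact definitions used by the software that produced the reported values are not stated here, so treat these as assumptions. The toy alignment is invented.

```python
from itertools import combinations
from typing import Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Share of identical columns between two aligned sequences,
    ignoring columns where both sequences have a gap."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x == y for x, y in cols) / len(cols)


def mean_pairwise_identity(aln: Sequence[str]) -> float:
    """Average pairwise identity over all pairs of sequences."""
    vals = [pairwise_identity(a, b) for a, b in combinations(aln, 2)]
    return sum(vals) / len(vals)


def identical_sites(aln: Sequence[str]) -> float:
    """Share of alignment columns in which every sequence has the same state."""
    return sum(len(set(col)) == 1 for col in zip(*aln)) / len(aln[0])


# Toy alignment (invented), three sequences of six columns
toy_aln = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(f"mean pairwise identity: {mean_pairwise_identity(toy_aln):.3f}")
print(f"identical sites:        {identical_sites(toy_aln):.3f}")
```

The two measures differ because a single divergent accession lowers the share of identical sites across the whole alignment while barely changing the average over all pairs.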
The basal node separates two clades. The first clade includes two genera, Aralia and Panax. The second clade includes Schefflera, Fatsia, Eleutherococcus, Kalopanax, Metapanax, Brassaiopsis, and Dendropanax. All species included in the study are monophyletic and have maximum support in both Bayesian and ML analyses.
The Araliaceae clade
The Araliaceae clade showed maximum support in the phylogeny except for the Fatsia clade, where the support is 99.6%. Schefflera is sister to the rest of the clade, followed by Dendropanax, then a clade with Brassaiopsis/Eleutherococcus and finally a clade with Fatsia/Kalopanax/Metapanax. A comparison of the partitioned and non-partitioned analyses shows no differences in topology and support in the Aralia-Panax clade, but does in the remaining Araliaceae clade.
The Aralia-Panax clade
The genus Panax is monophyletic, and Aralia, represented by two species, A. elata and A. undulata, is the sister group to the genus Panax. Panax stipuleanatus and P. bipinnatifidus form a distinct clade sister to a clade consisting of P. notoginseng and its sister group of P. vietnamensis and P. japonicus, which as a whole is sister to P. quinquefolius and P. ginseng.
In the mPTP analysis for the full plastid dataset, the Average Support Value (ASV) assesses the congruence of support values with the ML delimitation. The analyses return an ASV of 97.9%, suggesting high confidence in the given species delimitation scheme. Species delimitation recognized 21 distinct entities out of 20 species (Additional file 15: Figure S6). Over-representation and intraspecific variation of the P. ginseng samples resulted in oversplitting of this clade into two discrete entities. The P. stipuleanatus / P. bipinnatifidus clade shows weaker data structure, and the analyses do not strongly support the group as two independent mPTP entities (PP = 0.68). P. quinquefolius has also been divided into two subgroups, but the posterior probability of the subdivision is low (PP = 0.4).
The result of mPTP analyses for all previously used and the newly proposed markers are described in Fig. 5 and the supported nodes for the speciation events have been added to the phylogenetic tree (Additional file 10: Figure S7, Additional file 11: Figure S8, Additional file 12: Figure S9, Additional file 13: Figure S10, Additional file 14: Figure S11). Out of the 15 analysed markers only four can be used to discriminate most species. Figure 5 also shows that regions with the highest density of parsimony informative sites are not necessarily the most efficient for species discrimination, and both skewed aggregated mutations as well as homoplasy can obscure phylogenetic patterns.
Evolution of Araliaceae and ginsengs
The evolution of the Asian palmate group (Fig. 3) is concordant with previously published articles that show Schefflera at the base of the group. The paraphyletic genus Dendropanax was usually the most divergent in the group, but is now basal to the rest of the group. This position might be due to low sampling within the Asian palmate group. Results for Brassaiopsis, Eleutherococcus, Fatsia, Kalopanax and Metapanax correspond with previously published phylogenies. Early radiations with interlineage hybridizations and genome doubling have been reported in the group, and this could explain the short internal branches. Further phylogenomic and biogeographical studies should be conducted to better understand the radiation of the Araliaceae.
In the Aralia-Panax group, Aralia is sister to Panax, and we find that P. stipuleanatus forms a well-supported clade with P. bipinnatifidus, whereas previous studies have often reported that P. bipinnatifidus groups with P. omeiensis, P. wangianus, P. zingiberensis and P. major [11, 12, 38, 41], all four of which are, however, missing here. Due to the difficulty in obtaining material of P. vietnamensis, only three studies have included P. vietnamensis in a phylogeny [13, 96, 112]. The study by Lee et al. using the plastid marker trnC–trnD does not resolve the position of P. vietnamensis in the phylogeny, but does identify a distinct clade consisting of P. notoginseng, P. japonicus and P. vietnamensis, which is also supported by our data. Komatsu et al. recover a clade consisting of P. vietnamensis along with P. japonicus and P. pseudoginseng subsp. himalaicus, a synonym of P. bipinnatifidus. Placing P. japonicus in this clade contradicts previous studies that have found a clade consisting of P. quinquefolius, P. ginseng and P. japonicus [12, 38, 41, 112]. The plastome phylogeny supports a sister relationship of P. ginseng and P. quinquefolius, the two economically most important species of ginseng. Although this full plastome phylogeny differs significantly from previously published molecular phylogenies, the new evolutionary pattern is strongly supported by bootstrap values and posterior probabilities.
Incongruence between markers from different origin
Full-length plastid genome data are a major improvement for the Panax phylogeny, and the larger dataset has a strong influence on the phylogenetic hypothesis. However, discrepancies between full-length plastid genome phylogenies and nrDNA phylogenies are common in plants. nrDNA has been widely used for phylogenetic studies of Panax [11, 38, 41, 46], but the limitations of this approach have been extensively reviewed. Drawbacks of nrDNA include difficulties in alignment, and its limited use for phylogenetic inference between closely related and/or recently diverged taxa. It is also a challenge to determine the orthology and paralogy of nrDNA sequences in the case of hybridization events or incomplete lineage sorting [114, 115, 116]. Bailey et al. emphasise that despite the valuable phylogenetic information from nrDNA, it might not be the optimal choice to assess species trees, especially in the case of allopolyploids or tetraploids. Since this is also the case in Panax, we argue that nrDNA may be inappropriate for reconstructing the evolutionary history of this genus.
Phylogenetic congruence as well as incongruence of nuclear genomic and plastid marker data is well documented [117, 118, 119]. In the case of Panax, two of the previously used nuclear markers (Z14, Z8) support the clade of P. ginseng and P. quinquefolius. However, our topology is incongruent for the remaining clades. Incongruences between the maternally inherited plastid genome and the biparentally inherited nuclear genes can be expected in genera with allopolyploid hybrids, like Panax. Plastid phylogenies are not always representative of the species tree and might conflict with hypotheses of parsimonious morphological evolution [116, 120, 121]. Incongruences between plastome and nuclear gene trees have been reported in wide-ranging groups of plants, such as Asclepias, Helianthus and Silene.
The novel methylation-based enrichment method increased the concentration of plastid DNA by 30%, which is in the range found by a previous pilot study. It is a suitable method for enriching the organellar genome before sequencing. The methylated fraction shows extremely low amounts of organellar DNA, meaning that we removed more than 99% of the non-methylated DNA from the total DNA. The P. vietnamensis sample originally had more degraded DNA and as a result shows a less successful enrichment. Using MBD2 to increase the concentration of organellar DNA in the total DNA allows multiplexing of a larger number of samples. This method is appropriate for building plastid reference genome databases for barcoding projects. In the case of degraded samples, we recommend removal of shorter DNA fragments before the enrichment.
Selecting markers for molecular Panax identification
In DNA barcoding and in plant product identification and authentication projects, it is common to work with degraded DNA substrates, for which it might be difficult to use methylation enrichment or the full plastid genome as a barcoding strategy. However, alternatives such as target enrichment and amplicon sequencing are possible [64, 123, 124, 125]. Here we have identified four variable regions that possess sufficient variation and genetic structure to discriminate most ginseng species. The identification of ginseng species is relatively complex because of recent evolution and hybridization events. P. ginseng and P. quinquefolius have recently diverged plastid genomes, as have P. bipinnatifidus and P. stipuleanatus . Species delimitation using mPTP shows that, for such species complexes, traditional barcoding markers do not have enough structure for delimiting species. However, if carefully selected, some regions highlight specific structural patterns that enable the discrimination of species. The trnC-rps16 region seems to be particularly promising, as it has enough variation to discriminate most species (Additional file 15: Figure S6). If plastid markers are to be used for barcoding, it is more relevant to use a combination of markers because mPTP analyses are better suited for multi-marker analyses . A concatenated matrix with two, three or four markers combined improves the efficacy in segregating all the Panax species, including those in closely related complexes. Our results suggest that a combination of the markers trnC-rps16, trnE-trnM and psbM-trnD (Fig. 5) enables confident identification of the main traded species P. ginseng, P. quinquefolius and P. vietnamensis. For further development, a complete sampling of all Panax species with multiple accessions per taxon should be made to confirm the observed variation in the selected markers.
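The benefit of concatenating markers can be illustrated with a minimal sketch. It is not the mPTP procedure used in this study: it only checks whether accessions of different species carry identical sequences, first for each marker alone and then for the concatenated matrix. The short sequences, the accession names and the species labels are invented placeholders, and the function names are assumptions made for the example.

```python
from itertools import combinations


def concatenate(markers):
    """Join per-marker aligned sequences into one sequence per accession."""
    # Keep only accessions that are present in every marker alignment.
    accessions = sorted(set.intersection(*(set(m) for m in markers.values())))
    return {
        acc: "".join(markers[name][acc] for name in sorted(markers))
        for acc in accessions
    }


def p_distance(a, b):
    """Proportion of differing sites, skipping sites with a gap or N."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def undiscriminated_pairs(seqs, species_of):
    """Species pairs with at least one identical between-species sequence pair."""
    clashes = set()
    for a, b in combinations(sorted(seqs), 2):
        if species_of[a] != species_of[b] and p_distance(seqs[a], seqs[b]) == 0.0:
            clashes.add(tuple(sorted((species_of[a], species_of[b]))))
    return clashes


# Invented toy alignments; not real Panax sequences.
species_of = {"a1": "species_A", "a2": "species_A",
              "b1": "species_B", "c1": "species_C"}
markers = {
    "trnC-rps16": {"a1": "ACGTAC", "a2": "ACGTAC", "b1": "ACGTAC", "c1": "ACGAAC"},
    "trnE-trnM": {"a1": "GGCTA", "a2": "GGCTA", "b1": "GGTTA", "c1": "GGCTA"},
}

for name, alignment in markers.items():
    print(name, undiscriminated_pairs(alignment, species_of))
print("concatenated", undiscriminated_pairs(concatenate(markers), species_of))
```

In this toy data each marker alone leaves one species pair unresolved, whereas the concatenated matrix separates all three placeholder species. Real data would additionally require a proper multiple alignment and a tree-based delimitation step.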
In order to design accurate markers to monitor the trade of medicinal species, it is necessary to understand the evolution of the targeted group. Many studies are based on the generic barcodes suggested by iBOL (International Barcode of Life), rbcL and matK, without strong evidence for the evolutionary hypotheses of the targeted group and, a fortiori, with only a limited idea of the discriminatory power of the markers used. When a barcoding study targets a specific plant group or genus and the barcode markers fail to yield a supported phylogeny, one should aim to construct robust phylogenies with new markers to achieve species discrimination. If the phylogenetic hypothesis is not robust, or if the data are weak in structure, as is often the case with the standard barcoding markers rbcL and matK, the resulting identifications might be misleading because of inaccurate species delimitation hypotheses .
The addition of genomic data to the phylogeny of Panax radically changes what is known about the evolution of the genus. The implications in terms of phylogeography are still unclear due to missing taxa, and the addition of population data and additional species should improve our insight into the evolutionary history of the genus. The development of species delimitation methods changes perspectives in molecular identification and DNA barcoding by incorporating evolutionary hypotheses at the species level. The newly proposed molecular markers allow for accurate identification of Panax species and enable authentication of ginseng and derived products and monitoring of the ginseng trade, while ultimately aiding conservation of wild ginseng.
The authors wish to thank Jarl Andreas Anmarkrud for his assistance with the enrichment, and the members of the de Boer group for their helpful discussions and feedback during manuscript preparation. This work was performed on the Abel Cluster, owned by the University of Oslo and the Norwegian metacentre for High Performance Computing (NOTUR), and operated by the Department for Research Computing at USIT, the University of Oslo IT-department (http://www.hpc.uio.no/). We would like to acknowledge the support of the Vietnam Academy of Science and Technology (VAST).
This project was supported by Vietnam Academy of Science and Technology (grant No. VAST02.01/16–17) and the European Union’s Seventh Framework Programme for research, technological development and demonstration under the Grant agreement no. 606895 to the FP7-MCA-ITN MedPlant, “Phylogenetic Exploration of Medicinal Plant Diversity”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The raw sequence data from the P. bipinnatifidus, P. stipuleanatus, and P. vietnamensis samples have been submitted to GenBank under the following accessions: SRR5725242, SRR5725240, SRR5725505, SRR5725492, SRR5738925, SRR5738922, SRR5738920, SRR5738920, SRR5738927. The DNA matrix used for the phylogenomic analyses is available on the Open Science Framework (https://doi.org/10.17605/OSF.IO/Z7RWE). The plastome sequences of P. bipinnatifidus, P. stipuleanatus, and P. vietnamensis (2) have been submitted to NCBI GenBank (MF377620, MF377621, MF377622, MF377623).
The project was conceived and designed by HdB, HLTT, NVH, and VM. NNL performed the laboratory work. VM performed the data analysis. AK, VM and HdB drafted the manuscript. All other authors contributed to the analysis of the data and to the text of the manuscript. All authors have read and approved the final version of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1. Robbins CS. American ginseng: the root of North America's medicinal herb trade: Traffic North America; 1998.
- 2. Millspaugh CF. American medicinal plants: an illustrated and descriptive guide to plants indigenous to and naturalized in the United States which are used in medicine: Dover Publications; 1892.
- 3. Kimmens AC. Tales of the ginseng: Morrow; 1975.
- 5. programme SotcoitiesowffUNe: CITES (convention on international trade in endangered species) handbook: convention on species of wild fauna and flora, July 2016: CITES Secretariat de la Convention sur le commerce international des espèces de faune et de flore sauvages menacées d'extinction; 2016.
- 6. Basnet D, Dey K. Studies on seed germination of an Indian ginseng (Panax assamicus Ban. spec. nov.) for successful cultivation and conservation. Indian J For. 2008.
- 8. Joshi G, Tiwari K, Tiwari R, Uniyal M. Conservation and large scale cultivation strategy of Indian ginseng- Panax pseudoginseng wall. Indian Forester. 1991;117(2):131–4.
- 9. Jain A. Vulnerable and threatened plants of economic value: Panax pseudo-ginseng wall. (The Himalayan Ginseng) MFP News. 1994;4:21.
- 24. Boyer F, et al. Obitools: a unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources. 2016;16(1):176–82.
- 32. Kapli P, et al. Multi-rate poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo. Bioinformatics. 2017;33(11):1630–638.
- 33. Fujisawa T, Barraclough TG. Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: a revised method and evaluation on simulated data sets. Syst Biol. 2013;62(5):707–24.
- 34. Yang Z, Rannala B. Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses. Mol Ecol. 2017;26(11):3028–36.
- 37. Plunkett GM, Wen J, Lowry II PP. Infrafamilial classifications and characters in Araliaceae: insights from the phylogenetic analysis of nuclear (ITS) and plastid (trnL-trnF) sequence data. Pl Syst Evol. 2004;245(1):1–39.
- 39. Hara H. On the Asiatic species of the genus Panax. J Jpn Bot. 1970.
- 40. Zhou J, Huang W, Wu M, Yang C, Feng K, Wu Z. Triterpenoids from Panax Linn. and their relationship with taxonomy and geographical distribution. Acta Phytotaxonomica Sin. 1975;13(2):29–45.
- 46. Wen J, Plunkett GM, Mitchell AD, Wagstaff SJ. The evolution of Araliaceae: a phylogenetic analysis based on ITS sequences of nuclear ribosomal DNA. Syst Bot. 2001;26(1):144–67.
- 53. Joly S, et al. Genetic structure of the American ginseng (Panax quinquefolius L.) in Eastern Canada using reduced-representation high-throughput sequencing. Botany. 2016;95(4):429–34.
- 59. Boehm C, Harrison H, Jung G, Nienhuis J. Organization of American and Asian ginseng germplasm using randomly amplified polymorphic DNA (RAPD) markers. J Am Soc Hortic Sci. 1999;124(3):252–6.
- 64. Coissac E, et al. From barcodes to genomes: extending the concept of DNA barcoding. Mol Ecol. 2016;25(7):1423–28.
- 66. Jansen RK, Ruhlman TA. Plastid genomes of seed plants. Genomics of chloroplasts and mitochondria: Dordrecht: Springer; 2012. p. 103–26.
- 70. Karol KG, Arumuganathan K, Boore JL, Duffy AM, Everett KD, Hall JD, Hansen SK, Kuehl JV, Mandoli DF, Mishler BD. Complete plastome sequences of Equisetum arvense and Isoetes flaccida: implications for phylogeny and plastid genome evolution of early land plant lineages. BMC Evol Biol. 2010;10(1):321.
- 73. Schmickl R, et al. Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae). Mol Ecol Resour. 2016;16(5):1124–35.
- 79. Twyford AD, Ness RW. Strategies for complete plastid genome sequencing. Mol Ecol Resour. 2017;17(5):858–68.
- 80. Obae GS. Nuclear DNA, content and genome size of American ginseng. J Med Plant Res. 2012;6.
- 81. Pan YZ, Zhang YC, Gong X, Li FS. Estimation of genome size of four Panax species by flow cytometry. Plant Diversity Res. 2014;36.
- 92. Chen Q, Feng X, Li M, Yang B, Gao C, Zhang L, Tian J. The complete chloroplast genome sequence of Fatsia japonica (Apiales: Araliaceae) and the phylogenetic analysis. Mitochondrial DNA Part A. 2016;27(4):3050–1.
- 98. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–120.
- 99. Hahn C, Bachmann L, Chevreux B. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic acids research. 2013;41(13):e129–e129.
- 110. Quinlan AR. BEDTools: the Swiss-army tool for genome feature analysis. Curr Protoc Bioinformatics. 2014;11–12.
- 114. Bailey C. Characterization of angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol Phylogenet Evol. 2003;29.
- 116. Soltis DE, Kuzoff RK. Discordance between nuclear and chloroplast phylogenies in the Heuchera group (Saxifragaceae). Evolution. 1995;49(4):727–42.
- 119. Novikova PY, et al. Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specific polymorphism. Nat Genet. 2016;48(9):1077.
- 121. Wendel JF, Doyle JJ. Phylogenetic incongruence: window into genome history and molecular evolution. Molecular systematics of plants II. Boston: Springer; 1998. p. 265–96.
- 125. Veldman S, et al. High-throughput sequencing of African chikanda cake highlights conservation challenges in orchids. Biodivers Conserv. 2017;26(9):2029–46.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.