Comparative analysis of the complete chloroplast genome sequences of six species of Pulsatilla Miller, Ranunculaceae
Baitouweng is a traditional Chinese medicine with a long history of diverse applications. Although it is referred to as a single medicine, Baitouweng actually comprises many closely related species. It is therefore critically important to identify the species used in these medicinal applications. The chloroplast (cp) genomes of these species can reveal their phylogenetic relationships and may also inform the development of molecular markers.
Genomic DNA was extracted from six species of Pulsatilla and sequenced on an Illumina HiSeq 4000. Reads were assembled into contigs with SOAPdenovo 2.04, aligned to the reference genome using BLAST, and then manually corrected. Genomes were annotated with the online tool DOGMA. General characteristics of the cp genomes of the six species were analyzed and compared with those of closely related species. Additionally, maximum likelihood phylogenetic trees were constructed from single nucleotide polymorphisms (SNPs) and from 51 protein-coding gene sequences shared by the cp genomes of all 31 species.
The cp genome sizes of P. chinensis (Bge.) Regel, P. chinensis (Bge.) Regel var. kissii (Mandl) S. H. Li et Y. H. Huang, P. cernua (Thunb.) Bercht. et Opiz f. plumbea J. X. Ji et Y. T. Zhao, P. dahurica (Fisch.) Spreng, P. turczaninovii Kryl. et Serg, and P. cernua (Thunb.) Bercht. et Opiz were 163,851 bp, 163,756 bp, 162,481 bp, 162,450 bp, 162,795 bp, and 162,924 bp, respectively. Each genome comprised two inverted repeat regions, a small single-copy region, and a large single-copy region. In every species, 134 genes were annotated, including 90 protein-coding genes, 36 tRNA genes, and eight rRNA genes. Simple sequence repeat analysis found hexanucleotide repeats only in P. dahurica. A total of 26, 39, 32, 37, 32 and 43 large repeat sequences were identified in the genic regions of the six Pulsatilla species, respectively. Nucleotide diversity analysis showed that the rpl36 gene and the ccsA-ndhD region had the highest Pi values. In addition, two phylogenetic trees based on the cp genomes were constructed, both of which placed all Pulsatilla species in a single branch within Ranunculaceae.
We identified and analyzed the features of the cp genomes of six Pulsatilla species, with implications for species identification and phylogenetic analysis.
Keywords: Pulsatilla chinensis; Pulsatilla Miller; Chloroplast genome; Phylogeny
Abbreviations
LSC: large single copy
SSC: small single copy
Baitouweng is the dried root of Pulsatilla chinensis (Ranunculaceae). It is a traditional Chinese medicine used to relieve fever and treat dysentery. About 43 Pulsatilla species have been identified in Europe and Asia, 11 of them in China. Triterpenoid saponins are thought to be among the main active components of Baitouweng. Baitouweng is included in the Chinese Pharmacopoeia as a genuine medicinal material, but not all species sold under this name are included. Our previous survey of the Chinese medicine market found many counterfeits, partly because closely related species are mistaken for one another. This seriously affects the quality of the medicinal material and its clinical efficacy. Our team has previously studied DNA barcodes of Pulsatilla. However, few studies have examined the phylogenetic positions and species diversity of these species, a gap that analysis of their chloroplast (cp) genomes could help fill.
The chloroplast is an important plant organelle that provides energy through photosynthesis and plays an important role in carbon fixation. It also contains its own genome, a circular double-stranded DNA molecule that is usually maternally inherited [5, 6, 7, 8]. The typical cp genome consists of four relatively conserved parts: two inverted repeats (IRs), a large single copy (LSC) region, and a small single copy (SSC) region. In general, the cp genome contains about 120 kb of unique sequence. Besides the rRNA and tRNA genes, the cp genome contains about 100 protein-coding genes. In recent years, molecular identification has been widely used to distinguish authentic Chinese medicines from counterfeits [10, 11]. With the rapid development of next-generation sequencing, genomes can now be obtained faster and more cheaply than with traditional Sanger sequencing. Compared with nuclear DNA, cp DNA is small, present in multiple copies, and structurally simple, which facilitates cp genome analysis. Simple sequence repeats (SSRs) have high mutation rates and occur in multiple copies, and SSR markers have been widely used in genetic diversity and evolutionary research [14, 15].
In this study, the cp genomes of P. chinensis, P. chinensis var. kissii, P. cernua f. plumbea, P. dahurica, P. turczaninovii, and P. cernua were sequenced to analyze their structures and explore differences at the molecular level. We also compared them with the cp genomes of Aconitum carmichaelii (NC_030761.1) and Coptis chinensis (NC_036485.1) to determine whether characteristic variation exists. In addition, we analyzed SSRs, large repeat sequences, IR boundaries, and nucleotide diversity to identify differences among the genomes. Finally, phylogenetic analysis was carried out to determine the evolutionary relationships and phylogenetic positions of the six Pulsatilla species.
DNA extraction and sequencing
Fresh leaves of six Pulsatilla species were collected from the Liaoning Provincial Preservation Nursery of Key Species of Chinese Medicinal Plants on the Dalian Campus of Liaoning University of Traditional Chinese Medicine (N 39°06′, E 121°87′, Dalian, Liaoning Province, China). The species were introduced from the following locations: P. chinensis from Dalian, Liaoning Province; P. chinensis var. kissii from Anshan, Liaoning Province; P. cernua f. plumbea from Jiaohe, Jilin Province; P. dahurica from Yichun, Heilongjiang Province; P. turczaninovii from Tongliao, Inner Mongolia Autonomous Region; and P. cernua from Dandong, Liaoning Province. All species were introduced by Xu Liang. Professor Kang Tingguo of Liaoning University of Traditional Chinese Medicine identified the voucher specimens (P. chinensis 10162180425513LY, P. chinensis var. kissii 10162180429514LY, P. cernua f. plumbea 10162180503515LY, P. dahurica 10162180503516LY, P. turczaninovii 10162180504517LY, P. cernua 10162180504518LY) and deposited them in the Herbarium of Liaoning University of Traditional Chinese Medicine. Approximately 5 g of fresh leaves was harvested for cp DNA isolation using a modified cetyl trimethylammonium bromide (CTAB) method. After DNA isolation, 1 μg of purified DNA was fragmented and used to construct short-insert libraries (insert size 430 bp) according to the manufacturer’s instructions (Illumina), which were then sequenced on the Illumina HiSeq 4000 platform.
Genome assembly and annotation
Before assembly, raw reads were filtered to remove reads with adaptor contamination, low quality (Q < 20), or a high percentage of uncalled bases (> 10%). The cp genome was reconstructed using a combination of de novo and reference-guided assembly. First, the filtered reads were assembled into contigs using SOAPdenovo 2.04. Contigs were then aligned to the reference genome using BLAST, and aligned contigs (≥ 80% similarity and query coverage) were ordered according to the reference genome. Finally, the clean reads were mapped to the assembled draft cp genome for base correction, and most gaps were filled by local assembly. The cp genes were annotated with the online tool DOGMA using default parameters to predict protein-coding genes, transfer RNA (tRNA) genes, and ribosomal RNA (rRNA) genes. The whole cp genome was also searched with BLAST against five databases, using an E-value cutoff of < 1e−5 and a minimum alignment length percentage of > 40%. The databases were KEGG (Kyoto Encyclopedia of Genes and Genomes) [22, 23, 24], COG (Clusters of Orthologous Groups) [25, 26], NR (Non-Redundant Protein Database), Swiss-Prot, and GO (Gene Ontology). The sequencing data and gene annotations were submitted to GenBank under accession numbers MK860682 (P. chinensis), MK860683 (P. chinensis var. kissii), MK860684 (P. cernua f. plumbea), MK860685 (P. dahurica), MK860686 (P. turczaninovii), and MK860687 (P. cernua).
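The three read-filtering rules above can be illustrated with a short Python sketch. It is not the authors' pipeline and rests on stated assumptions: quality strings use Phred+33 encoding, "low quality (Q < 20)" is read as a mean base quality below 20, and adaptor contamination is tested crudely as the presence of the first 10 adaptor bases in the read.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def n_fraction(seq: str) -> float:
    """Fraction of uncalled bases (N) in a read."""
    return seq.upper().count("N") / len(seq)


def has_adapter(seq: str, adapter: str, min_overlap: int = 10) -> bool:
    """Crude contamination test: the read contains the first min_overlap adaptor bases."""
    return adapter[:min_overlap].upper() in seq.upper()


def passes_filters(rec: Record, adapter: str,
                   min_mean_q: float = 20.0, max_n: float = 0.10) -> bool:
    """Keep a read only if it has no adaptor, mean Q >= 20 and at most 10% N."""
    _, seq, qual = rec
    if not seq:
        return False
    return (not has_adapter(seq, adapter)
            and mean_phred(qual) >= min_mean_q
            and n_fraction(seq) <= max_n)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual
```

In use, the reads from an open FASTQ file would be passed through read_fastq and kept only if passes_filters returns True; the adaptor sequence is not reported in this study, so a value such as the common Illumina TruSeq prefix AGATCGGAAGAGC is purely an assumption.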
Circular maps of the cp genomes were then drawn using Organellar Genome Draw (OGDRAW) (Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg, Potsdam, Germany) (http://ogdraw.mpimp-golm.mpg.de/index.Shtml).
Comparative analysis of the cp genomes
SSRs were identified with MicroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/), treating tandem repeats of 1–6 nucleotides as microsatellites. The minimum numbers of repeats were set to 10, 6, 5, 5, 5, and 5 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively, and the maximum number of bases interrupting two SSRs in a compound microsatellite was set to 100. The results were then compared with those for A. carmichaelii (NC_030761.1) and C. chinensis (NC_036485.1), with an emphasis on perfect repeats. The web-based REPuter (http://bibiserv.techfak.uni-bielefeld.de/reputer/) was used to identify long repeats, including forward, reverse, complement, and palindromic repeats, with a minimum repeat length of 30 bp and an edit distance of 3 bp. DnaSP v5.10 was used to calculate the average number of nucleotide differences among the six genomes.
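A minimal Python sketch of perfect-SSR detection using the MISA minimum repeat numbers listed above (10, 6, 5, 5, 5 and 5 for motif lengths 1 to 6). It is a simplified illustration, not MISA itself: it reports only perfect SSRs, does not merge neighbouring SSRs into compound microsatellites (the 100-base interruption rule), and, as an assumption, resolves overlaps by keeping the SSR with the shorter motif.

```python
import re
from typing import List, Tuple

# Minimum repeat numbers per motif length, as set for MISA in the Methods.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

SSR = Tuple[str, int, int, int]  # (motif, repeat count, start, end), 0-based, end exclusive


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif[:d] * (k // d) != motif for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str) -> List[SSR]:
    """Scan motif lengths 1-6 and report perfect tandem repeats meeting MIN_REPEATS."""
    seq = seq.upper()
    covered = [False] * len(seq)  # positions already assigned to an SSR
    hits: List[SSR] = []
    for k in range(1, 7):
        # ([ACGT]{k}) captures the motif; \1{n-1,} requires n-1 further exact copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{MIN_REPEATS[k] - 1},}}")
        for m in pattern.finditer(seq):
            motif, start, end = m.group(1), m.start(), m.end()
            if not is_primitive(motif) or any(covered[start:end]):
                continue  # e.g. (AA)6 would duplicate an A12 mononucleotide SSR
            covered[start:end] = [True] * (end - start)
            hits.append((motif, (end - start) // k, start, end))
    return sorted(hits, key=lambda h: h[2])
```

Scanning the shortest motifs first is a design choice that stops a run such as A12 from also being reported as (AA)6; MISA's own bookkeeping may differ in edge cases.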
To determine the phylogenetic position of Pulsatilla relative to other genera in Ranunculaceae, phylogenetic trees were constructed from aligned cp genome sequences of 31 species, 25 of which were obtained from GenBank. The 31 species comprised 29 species of Ranunculaceae and two outgroups: Arabidopsis thaliana (NC_003076.8, Cruciferae) and Panax ginseng (NC_006290.1, Araliaceae). Single nucleotide polymorphisms (SNPs) and 51 shared protein-coding genes of the cp genomes of all 31 species were analyzed. Maximum likelihood (ML) trees were constructed with PhyML v3.0 under the GTR+I+G model, and bootstrap values were calculated from 1000 replicates.
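The SNP data set and the bootstrap support described above rest on two simple operations on a multiple alignment: keeping only the variable columns, and resampling columns with replacement. The Python sketch below illustrates both under stated assumptions (columns containing gaps or ambiguous bases are discarded; taxon names in any example are hypothetical). PhyML performs its own bootstrap resampling, so this is not how the trees in this study were computed.

```python
import random
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def snp_columns(aln: Alignment) -> List[int]:
    """Indices of columns that vary among unambiguous bases (A, C, G, T).

    Columns containing a gap or any other character are skipped (an assumption)."""
    seqs = [s.upper() for s in aln.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    cols = []
    for c in range(length):
        states = {s[c] for s in seqs}
        if states <= set("ACGT") and len(states) > 1:
            cols.append(c)
    return cols


def snp_matrix(aln: Alignment) -> Alignment:
    """Reduce the alignment to its SNP columns."""
    cols = snp_columns(aln)
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def bootstrap_replicate(aln: Alignment, rng: random.Random) -> Alignment:
    """One nonparametric bootstrap pseudo-alignment: columns drawn with replacement."""
    length = len(next(iter(aln.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in picks) for name, seq in aln.items()}
```

In a full analysis, a tree would be inferred from each of the 1000 pseudo-alignments, and the bootstrap value of a clade is the percentage of those trees in which it appears.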
Results and discussion
Table 1 Comparison of general characteristics of the cp genomes of the eight Ranunculaceae species: GC content (%), LSC, SSC and IR lengths (bp), gene number in IR regions, and numbers of protein-coding, rRNA and tRNA genes
Each Pulsatilla cp genome contained 134 genes, the same number as in Amborella trichopoda, a member of the earliest-diverging lineage of flowering plants. The numbers of tRNA (36), rRNA (8) and protein-coding (90) genes were identical among the six species. In all six species, 14 tRNA genes and all eight rRNA genes were located in the IR regions. The LSC regions of the six Pulsatilla cp genomes, assembled from short-read sequencing data, were very similar in size, indicating that the cp genomes are highly conserved among the six Pulsatilla species (Table 1).
List of the genes in the cp genomes of six species of Pulsatilla
Subunits of ATP synthase (6)
atpA, atpB, atpE, atpF*, atpH, atpI
Subunits of NADH dehydrogenase (12)
ndhA*, ndhB* (x2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Subunits of cytochrome (6)
petA, petB*, petD*, petG, petL, petN
Subunits of photosystem I (5)
psaA, psaB, psaC, psaI, psaJ
Subunits of photosystem II (15)
psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Subunit of rubisco (1)
rbcL
Subunit of acetyl-CoA carboxylase (1)
accD
c-type cytochrome synthesis gene (1)
ccsA
Envelope membrane protein (1)
cemA
Large subunit of ribosome (13)
rpl2* (x2), rpl14 (x2), rpl16* (x2), rpl20, rpl22 (x2), rpl23 (x2), rpl33, rpl36
DNA-dependent RNA polymerase (4)
rpoA, rpoB, rpoC1*, rpoC2
Small subunit of ribosome (18)
rps2, rps3 (x2), rps4, rps7 (x2), rps8 (x2), rps11, rps12* (x3), rps14, rps15, rps16*, rps18, rps19 (x2)
rRNA genes (8)
rrn4.5 (x2), rrn5 (x2), rrn16 (x2), rrn23 (x2)
tRNA genes (36)
trnA-UGC (x2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU (x2), trnI-GAU (x2), trnK-UUU, trnL-CAA (x2), trnL-UAA, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU (x2), trnP-UGG, trnQ-UUG, trnR-ACG (x2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnV-GAC (x2), trnV-UAC, trnW-CCA, trnY-GUA
Conserved open reading frames (5)
ycf1, ycf2 (x2), ycf3**, ycf4
* Gene containing one intron; ** gene containing two introns.
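The gene numbers in the table can be checked by expanding its (x2) notation. The Python sketch below tallies distinct genes, total gene copies, and copies belonging to genes listed more than once; reading the duplicated genes as the two IR copies is an assumption, but it agrees with the IR gene numbers stated in the text (14 tRNA and 8 rRNA genes).

```python
import re
from typing import Tuple

# tRNA row of the gene table, copied verbatim.
TRNA_ROW = (
    "trnA-UGC (x2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, "
    "trnH-GUG, trnI-CAU (x2), trnI-GAU (x2), trnK-UUU, trnL-CAA (x2), trnL-UAA, "
    "trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU (x2), trnP-UGG, trnQ-UUG, "
    "trnR-ACG (x2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, "
    "trnV-GAC (x2), trnV-UAC, trnW-CCA, trnY-GUA"
)


def tally(row: str) -> Tuple[int, int, int]:
    """Return (distinct genes, total copies, copies of genes listed more than once)."""
    distinct = total = duplicated = 0
    for entry in (e.strip() for e in row.split(",")):
        if not entry:
            continue
        m = re.search(r"\(x(\d+)\)", entry)  # copy number, e.g. "(x2)"
        copies = int(m.group(1)) if m else 1
        distinct += 1
        total += copies
        if copies > 1:
            duplicated += copies
    return distinct, total, duplicated
```

For the tRNA row, tally(TRNA_ROW) gives (29, 36, 14): 29 distinct tRNA genes, 36 copies in total, and 14 copies from the seven duplicated genes. The rRNA row gives (4, 8, 8).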
Introns play an important role in regulating gene expression. Recent studies have found that many introns can enhance the expression of exogenous genes at specific times and locations, with implications for the improvement of agronomic traits. The protein-coding genes of all six Pulsatilla species contained the same number of introns: rps16, rpoC1, atpF, petB, petD, rpl16, rpl2, ndhB, rps12, and ndhA each contain one intron, whereas ycf3 and clpP each contain two introns (Additional file 6: Table S1).
Repeat sequence analysis
Microsatellite markers, also known as SSR markers, are PCR-based DNA molecular markers. Because SSRs behave as neutral markers, with highly variable repeat numbers and relatively conserved flanking sequences, they have been widely used in genotyping. SSR markers are also easy to design, highly reproducible, and codominantly inherited, which makes them a preferred choice for evaluating the genetic diversity of crop species [43, 44, 45].
SSRs identified in the cp genomes of the six Pulsatilla species
Using P. chinensis as a representative species, we analyzed the distribution of perfect SSRs (p-type SSRs) (Additional file 8: Table S3). These repeats were mainly distributed in conserved non-coding sequences (CNS), intergenic spacers, and introns, although some were found in the coding regions of genes such as matK, psbC, rpoB, rpoC2, rps2, atpA, ndhJ, atpB, and accD. The p-type SSRs of the other five Pulsatilla species were distributed similarly to those of P. chinensis (Additional file 9: Table S4, Additional file 10: Table S5, Additional file 11: Table S6, Additional file 12: Table S7 and Additional file 13: Table S8).
Large repeat analysis
Many repeats lie in gene deserts, although whole-genome sequencing has shown that they can also occur in functional regions. Repeats of 30 bp or more are typically considered large repeats. In total, 49 (P. chinensis), 49 (P. chinensis var. kissii), 38 (P. cernua f. plumbea), 38 (P. dahurica), 49 (P. turczaninovii) and 49 (P. cernua) pairs of large repeats with sequence identity above 90% were found in the six Pulsatilla cp genomes. The repeats ranged from 30 to 59 bp in P. chinensis and P. chinensis var. kissii, and from 30 to 52 bp in P. cernua f. plumbea, P. dahurica, P. turczaninovii and P. cernua. Of these, 26, 39, 32, 37, 32 and 43 large repeats, respectively, were located in the genic regions of the six Pulsatilla species (Additional file 14: Table S9, Additional file 15: Table S10, Additional file 16: Table S11, Additional file 17: Table S12, Additional file 18: Table S13 and Additional file 19: Table S14).
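How forward and palindromic repeats of at least 30 bp can be located is sketched below in Python. This is a simplified stand-in for REPuter, not its algorithm: it finds only exact repeats (the analysis above allowed an edit distance of 3), covers only forward (F) and palindromic (P, reverse-complement) repeats, and seeds matches with shared 30-mers that are then merged into maximal runs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
Repeat = Tuple[str, int, int, int]  # (type F/P, start of copy 1, start of copy 2, length)


def revcomp(s: str) -> str:
    """Reverse complement of an ACGT string."""
    return s.translate(COMPLEMENT)[::-1]


def _merge(seeds: List[Tuple[int, int]], step: int, k: int) -> List[Tuple[int, int, int]]:
    """Merge overlapping k-mer seed pairs into maximal exact repeats.

    Forward repeats extend as (i+1, j+1) (step=+1); palindromic repeats as
    (i+1, j-1) (step=-1), because the second copy lies on the other strand."""
    seed_set = set(seeds)
    merged = []
    for i, j in seeds:
        if (i - 1, j - step) in seed_set:
            continue  # not the first seed of its run
        t = 0
        while (i + t + 1, j + step * (t + 1)) in seed_set:
            t += 1
        merged.append((i, j if step == 1 else j - t, k + t))
    return merged


def find_large_repeats(seq: str, k: int = 30) -> List[Repeat]:
    """Exact forward (F) and palindromic (P) repeats of length >= k."""
    seq = seq.upper()
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    fwd, pal = [], []
    for kmer, positions in index.items():
        for a, i in enumerate(positions):
            fwd.extend((i, j) for j in positions[a + 1:])
        for j in index.get(revcomp(kmer), []):  # .get avoids adding keys mid-loop
            pal.extend((i, j) for i in positions if i < j)
    repeats = [("F", i, j, n) for i, j, n in _merge(fwd, 1, k)]
    repeats += [("P", i, j, n) for i, j, n in _merge(pal, -1, k)]
    return sorted(repeats, key=lambda r: (r[1], r[2]))
```

Indexing every 30-mer keeps the search linear in genome length for typical cp genomes, although low-complexity stretches such as long poly(A) runs can generate many overlapping seed pairs.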
Analysis of the LSC, SSC, and IR border regions
Nucleotide diversity analysis
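The nucleotide diversity (Pi) reported for regions such as rpl36 and ccsA-ndhD is the average number of nucleotide differences per site between pairs of sequences (Nei's π). A minimal Python sketch for one aligned region follows; excluding every site that has a gap or ambiguous base in any sequence is an assumption, and DnaSP's own handling of gaps and windows may differ.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise differences per site over all pairs of aligned sequences.

    Sites with a gap or non-ACGT character in any sequence are excluded."""
    seqs = [s.upper() for s in seqs]
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    sites = [i for i in range(len(seqs[0]))
             if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return diffs / (len(pairs) * len(sites))
```

Applied to the six aligned sequences of a given gene or intergenic region, this yields one Pi value per region, so regions can be ranked by variability as candidates for marker design.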
Pulsatilla chinensis and P. chinensis var. kissii clustered into one branch, implying that they are very closely related. P. chinensis var. kissii is also treated as a variety of P. chinensis in both Flora Reipublicae Popularis Sinicae and Herbaceous Flora of Northeast China [2, 58], which is consistent with our clustering results. P. cernua f. plumbea and P. dahurica clustered into one branch. Although P. cernua f. plumbea is considered a form of P. cernua, our results placed the two relatively far apart within the Pulsatilla branch. P. turczaninovii was relatively far from the other Pulsatilla species, indicating a more distant relationship; consistent with this, P. turczaninovii is recorded as a separate species in some literature [2, 58, 60]. Among these six species, only P. chinensis is included in the Chinese Pharmacopoeia as an original plant, but the other Pulsatilla species also clustered with P. chinensis in our tree. Whether they can substitute for P. chinensis requires further verification.
In our phylogenetic tree, Anemoclema glaucifolium and Clematis terniflora clustered in one clade, indicating that they are closely related. Together, they formed a larger clade with the six Pulsatilla species, suggesting that Pulsatilla is more closely related to both than was previously thought. Ranunculus repens, R. macranthus, R. occidentalis and R. reptans grouped in one clade, within which R. repens and R. macranthus formed a smaller branch. Trollius chinensis and Thalictrum coreanum were also closely related. Coptis quinquesecta and C. chinensis grouped in one clade. In addition, Kingdonia uniflora formed a separate branch, indicating that it is distantly related to the other Ranunculaceae species. The Aconitum species clustered into one branch, with A. pseudolaeve, A. longecassidatum, A. angustius, A. finetianum and A. sinomontanum forming one small clade. Previous studies have shown that A. longecassidatum and A. pseudolaeve have highly conserved cp genome structures, which is consistent with our results. A. monanthum, A. ciliare, A. carmichaelii, A. kusnezoffii, A. chiisanense, A. austrokoreense and A. coreanum formed another small clade, which is also consistent with previous research. The outgroups Arabidopsis thaliana and Panax ginseng were placed at the base of the tree and clustered into one branch. The SNP-based phylogeny indicated that cp genomes are relatively conserved within Ranunculaceae and that the six Pulsatilla species are closely related.
Analysis of the cp genome sequences of the six Pulsatilla species showed that they have very similar genome sizes, whereas their genome sizes differ from those of A. carmichaelii and C. chinensis in Ranunculaceae. The sizes of some annotated genes also differed among the six Pulsatilla species, particularly rpl16 and rps12; these differences may be useful for marker development and phylogenetic analysis. In addition, P. dahurica contained six types of SSRs, whereas the other five Pulsatilla species had only five types (no hexanucleotide repeats were identified). Thirty-eight to 49 pairs of large repeats were found in the six Pulsatilla cp genomes, which are also valuable for marker development and phylogenetic analysis. The size of the IR regions was conserved among Pulsatilla species but differed from that of A. carmichaelii and C. chinensis. We also analyzed the nucleotide diversity of 108 genes and 105 non-coding regions; rpl36 and ccsA-ndhD were the most variable and are potentially suitable for marker design. Phylogenetic trees based on SNPs of the whole cp genome and on 51 shared cp protein-coding genes determined the position of the six Pulsatilla species within Ranunculaceae. These two trees should provide a reference for studying the evolutionary history of Ranunculaceae.
We thank the Shanghai BIOZERON Biotechnology Co., Ltd. for processing the raw sequencing data.
Conceptualization, TZ; Methodology, TZ; Software, JW; Validation, LX and TK; Formal analysis, DZ; Investigation, SL; Resources, GB; Data curation, YX; Writing—original draft, TZ; Writing—review and editing, TZ; Visualization, YY; Supervision, ZZ; Project administration, LX; Funding acquisition, LX and TK. All authors read and approved the final manuscript.
This research was funded by the National Natural Science Foundation of China (General Program, Grant Numbers 81874338, 81773852), the Major Expenditure Increase and Reduction Project at the Central Level “Capacity Building for Sustainable Utilization of Precious Traditional Chinese Medicine Resources” (Grant Number 2060302) and the Liaoning Province Education Department (Liaoning Higher School Outstanding Young Scholar Growth Plan, Grant Number LJQ2014101).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 1. Chinese Pharmacopoeia Commission. Chinese Pharmacopoeia. 342nd ed. Beijing: China Medical Science and Technology Press; 2015. p. 104.
- 2. Editorial Board of Flora Reipublicae Popularis Sinicae. Flora Reipublicae Popularis Sinicae. Beijing: Science Press; 1980. p. 62.
- 4. Liang YM, Chen SY, Xu L, Wang B, Kang TG. Identification of plants and herbs of Pulsatilla genus based on ITS2 barcode. J Chin Med Mater. 2017;40:1547–51.
- 9. Dyer TA. Chloroplast genome: its nature and role in development. Top Photosynth. 1984;5:23–69.
- 10. Xin TY, Yao H, Luo K, Xiang L, Ma XC, Han JP, et al. Stability and accuracy of the identification of Notopterygii Rhizoma et Radix using the ITS/ITS2 barcodes. Acta Pharm Sin. 2012;47:1098–105.
- 11. Luo K, Ma P, Yao H, Xin TY, Hu Y, Zheng SH, et al. Identification of Gentianae Macrophyllae Radix using the ITS2 barcodes. Acta Pharm Sin. 2012;47:1710–7.
- 13. Huang Y, Li CL, Ma C, Wu NH. Chloroplast DNA and its application to plant systematic studies. Chin Bull Bot. 1994;11:11–25.
- 27. Magrane M, UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford). 2011;2011:bar009.
- 40. Hansen DR, Dastidar SG, Cai ZQ, Penaflor C, Kuehl JV, Boore JL, et al. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol Phylogenet Evol. 2007;45:547–63.
- 46. Xing YP, Xu L, Chen SY, Liang YM, Wang JH, Liu CS, et al. Comparative analysis of complete chloroplast genomes sequences of Arctium lappa and A. tomentosum. Biol Plantarum. 2019;63:565–74.
- 56. Liang YM, Xu L, Chen SY, Wang JH, Wang B, Kang TG. Classification of Pulsatilla Adans and molecular identification of DNA barcodes based on ITS2 sequence in Liaoning Province. Chin J ETMF. 2018;24:36–42.
- 57. Zhang TT, Liang YM, Xu L, Yang YY, Xing YP, Liu T, et al. Study on DNA molecular identification of mix samples of five species of Baitouweng medicinal materials based on high-throughput sequencing technology. Acta Pharm Sin. 2018;53:1918–23.
- 58. Liaoning Forestry Soil Research Institute. Herbaceous flora of Northeast China, vol. 3. Beijing: Science Press; 1975. p. 162.
- 59. Jin JX, Zhao YT. A new form of genus Pulsatilla from northeast China. Bull Bot Res. 1989;9:69–70.
- 60. Editorial Board of Flora of China. Flora of China. Beijing: Science Press; 2001. p. 332.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.