Complete chloroplast genomes of all six Hosta species occurring in Korea: molecular structures, comparative, and phylogenetic analyses
The genus Hosta is a group of economically appreciated perennial herbs consisting of approximately 25 species that is endemic to eastern Asia. Due to considerable morphological variability, the genus has been well recognized as a group with taxonomic problems. The chloroplast is a cytoplasmic organelle with its own genome, which is the genome most commonly used for phylogenetic and genetic diversity analyses of land plants. To understand the genomic architecture of Hosta chloroplasts and examine the level of nucleotide and size variation, we newly sequenced the chloroplast genomes of four species (H. clausa, H. jonesii, H. minor, and H. venusta) and analyzed six Hosta species (these four plus H. capitata and H. yingeri) distributed throughout South Korea.
The average size of the complete chloroplast genomes for the Hosta taxa was 156,642 bp, with a maximum size difference of ~ 300 bp. The overall gene content and organization across the six Hosta species were nearly identical, with a few exceptions. There was a single tRNA gene deletion in H. jonesii, and four genes were pseudogenized in three taxa (H. capitata, H. minor, and H. jonesii). We did not find major structural variation, but there were minor expansions and contractions of the IR regions in three species (H. capitata, H. minor, and H. venusta). Sequence variation was higher in non-coding regions than in coding regions. Four genic and intergenic regions, including two coding genes (psbA and ndhD), exhibited the largest sequence divergence, showing potential as phylogenetic markers. We found compositional codon usage bias toward A/T at the third position. The Hosta plastomes had a number of dispersed and tandem repeats (simple sequence repeats) comparable to those identified in other angiosperm taxa. The phylogeny of 20 Agavoideae (Asparagaceae) taxa, including the six Hosta species, inferred from complete plastome data showed well-resolved monophyletic clades for closely related taxa with high node support.
Our study provides detailed information on the chloroplast genomes of the Hosta taxa. We identified nucleotide diversity hotspots and characterized types of repeats, which can be used for developing molecular markers applicable in various research areas.
Keywords: Hosta, Chloroplast genome, Repeats, Codon usage, Sequence divergence, Phylogeny
Akaike information criteria
Large single-copy region
Effective number of codons
Next generation sequencing
Whole CP genomes
Relative synonymous codon usage values
Small single-copy region
Simple sequence repeats
The genus Hosta Tratt. (Asparagaceae) is a group of economically important perennial herbs distributed exclusively in eastern Asia [1, 2, 3]. Because the plants have showy flowers and foliage, many Hosta species and cultivars (~ 2500) are heavily exploited for gardening throughout the temperate regions. Hosta plants are commonly called plantain lilies (bibichu in Korean) and have grown in popularity in gardens because they are easy to cultivate, tolerating shade and high soil moisture [5, 6]. In addition to their horticultural importance, Hosta species have medicinal value. Recent studies revealed that the species are rich in saponins and Amaryllidaceae alkaloids, which inhibit tumor-related and inflammatory activities [7, 8]. Hosta plants have also been used in folk medicine in China and Japan to treat multiple symptoms, including inflammatory diseases such as urethritis and pharyngolaryngitis.
The genus Hosta has been placed in the family Asparagaceae since it was moved from Liliaceae in the 1930s based on cytological characteristics (2n = 60). There are approximately 22–25 species in the genus [1, 4], although the number of species (43 in Schmid) and the relationships among the taxa have been problematic due to extensive morphological variability. The taxonomic challenges in Hosta are also attributed to the confusion arising from the abundance of cultivars (> 2500 reported) [2, 4]. The taxonomic difficulties are further complicated by the dearth of diagnostic characters and by the lack of comparative investigations of taxonomic keys between dried herbarium specimens and living plants from natural populations across varying environments. In Korea, approximately 14 Hosta taxa (11 species, 2 varieties, 1 cultivar) have been reported thus far; however, the number of species recognized varies from 5 to 11 depending on the authority.
The organization of CP genomes is conserved throughout higher plants at the structural and genic levels [11, 12]. In nearly all land plants, the CP genome consists of a single circular DNA molecule with a quadripartite structure, i.e. a large single-copy region (LSC) and a small single-copy region (SSC) separated by a pair of inverted repeats (IRs). Although the extent of variation is not very large across flowering plants, chloroplast genome sizes differ between species, ranging from 107 kb (Cathaya argyrophylla) to 280 kb (Pelargonium) [11, 12]. Chloroplast genomes contain approximately 120 to 130 genes that contribute to photosynthesis, transcription and translation. CP genomes are usually transmitted from one parent (presumably without recombination), mostly the mother in angiosperms. Because CP genome sequences are conserved among taxa, they often provide robust markers for phylogenetic analysis and divergence time estimation, particularly at higher taxonomic levels.
Over a dozen regions within the CP genome, e.g. ndhF, matK, and trnS-trnG, have been widely amplified for species identification, barcoding and phylogenetic inference [15, 16]. There is, however, no universal region of the CP genome that works best for all plant taxa. Moreover, despite the wide utility of CP markers in taxonomic studies, relationships among the most closely related taxa often remain unresolved with those markers because of their limited variation. With the advent of next generation sequencing (NGS) technology, sequencing whole CP genomes (plastomes) for multiple taxa has become feasible at a low cost. Complete plastome sequences have recently been applied to reconstruct phylogenies of problematic taxa and have successfully resolved enigmatic relationships [14, 17, 18]. Currently, four Hosta plastomes have been sequenced and two of those are publicly available in NCBI Organelle Genome Resources (http://www.ncbi.nlm.nih.gov/genomes) [3, 19, 20]. In this study, we investigated the plastomes of all six Korean Hosta species summarized by Chung and Kim. We newly sequenced and assembled the whole plastomes of four species (H. clausa, H. jonesii, H. minor, and H. venusta). The plastomes of H. yingeri (MF990205.1) and H. capitata (MH581151) were downloaded and added to the comparative analysis. The aims of our study were: 1) to determine the complete structure of the plastomes of the four Korean Hosta species; 2) to compare sequence variation and molecular evolution among the six Korean Hosta species; 3) to infer the phylogenetic relationships among the six Korean Hosta species and reconstruct their phylogeny within the subfamily Agavoideae.
Chloroplast genome assembly
Sample information and summary of chloroplast genome characteristics for four Hosta species in Korea. The species acronyms are as follows: CLA, H. clausa; MIN, H. minor; VEN, H. venusta; JON, H. jonesii
Mt. Daeam, Gangwon-do
Mt. Gaejwa, Busan-si
NCBI accession No.
Reads after trimming
Total length (bp)
LSC length (bp)
SSC length (bp)
IRa length (bp)
IRb length (bp)
Total GC content (%)
Total number of genes
Chloroplast genome annotation
List of genes within chloroplast genomes of six Hosta species in Korea. ×2 refers to genes duplicated in the IR regions
Group of genes
Names of genes
Transcription & Translation
Ribosomal protein, LSU
rpl33, rpl20, rpl36, rpl14, rpl16 (× 2)a, rpl22, rpl2(× 2)a, rpl23(× 2), rpl32
Ribosomal protein, SSU
rps16, rps2, rps14, rps4, rps18, rps12(× 2)a, rps11, rps8, rps3, rps19(× 2), rps7(× 2), rps15
rpoC2, rpoC1a, rpoB, rpoA
rrn16(×2), rrn23(× 2), rrn4.5(× 2), rrn5(× 2)
trnL-UAAa, trnF-GAA, trnV-UACa, trnM-CAU, trnW-CCA, trnP-UGG, trnH-GUG(×2), trnI-CAU(× 2), trnL-CAA(× 2), trnV-GAC(× 2), trnI-GAU(× 2)a, trnA-UGC(× 2)a, trnR-ACG(× 2), trnN-GUU(× 2), trnL-UAG, trnR-UCU, trnD-GUC, trnC-GCA, trnQ-UUG, trnE-UUC, trnG-UCC, trnK-UUUa, trnfM-CAU, trnS-GCU, trnS-UGA, trnS-GGA, trnT-GGU, trnT-UGA, trnY-GUA, trnG-GCCa, trnT-UGU
psaB, psaA, psaI, psaJ, psaC
psbA, psbK, psbI, psbM, psbD, psbC, psbZ, psbJ, psbL, psbF, psbE, psbB, psbT, psbN, psbH, petN
ndhJ, ndhK, ndhC, ndhB(×2)a, ndhF, ndhD, ndhE, ndhG, ndhI, ndhAa, ndhH
Cytochrome b6/f complex
petN, petA, petL, petG, petBa, petDa
atpA, atpFa, atpH, atpI, atpE, atpB
Rubisco large subunit
ATP-dependent protease subunit P
Chloroplast envelope membrane protein
Subunit Acetyl- CoA-Carboxylate
Photosystem I assembly& stability
infAψ (MIN/CAP), ycf15ψ (MIN/CAP), rps16ψ (JON), rps11ψ (JON)
Comparative chloroplast genome structure and polymorphism
Codon usage pattern
According to the codon usage analysis, all 64 codons were present across the six Korean Hosta species, encoding 20 amino acids (AAs). The total number of codons found for protein-coding genes was 26,505 in all six Korean Hosta species. The effective number of codons was as follows: 3158 (H. clausa); 4002 (H. capitata); 4006 (H. minor); 5007 (H. venusta); 5018 (H. yingeri) and 4004 (H. jonesii). The most abundant AA among the 20 AAs was leucine (number of codons encoding leucine = 2735, 10.3%), followed by isoleucine (number of codons encoding isoleucine = 2287, 8.6%). Alanine was the least frequent AA in the Korean Hosta, encoded by only 309 codons (1.2%). Codon usage based on relative synonymous codon usage (RSCU) values did not vary among the six Korean Hosta species, except for some decreases found in three AAs of H. venusta and H. yingeri (Additional file 1: Figure S2). Of the six Hosta species, H. venusta and H. yingeri had 47 codons used more frequently than expected at equilibrium (RSCU > 1), while the remaining four Hosta species showed codon usage bias (RSCU > 1) in 59 codons. All six Hosta species had 59 codons used less frequently than expected at equilibrium (RSCU < 1). Codons with A and U in the third position made up ~ 30% and ~ 24% of all codons, respectively. The codons AUG (the start codon, encoding methionine) and UGG (encoding tryptophan), each the only codon for its amino acid, showed no bias (RSCU = 1) in all Korean Hosta taxa.
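The RSCU statistic reported above can be illustrated with a short sketch. This is not the CodonW implementation used in the study; it is a minimal Python example that assumes in-frame coding sequences and the standard amino acid assignments (which the bacterial/plastid code shares). The function name `rscu` and the toy input are hypothetical. For each amino acid, RSCU is the observed count of a codon divided by the count expected if all synonymous codons were used equally.

```python
from collections import Counter
from itertools import product

# Standard codon-to-amino-acid assignments, in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over in-frame coding sequences.

    RSCU = observed codon count * (number of synonymous codons)
           / (total count of all codons for that amino acid).
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy input (hypothetical): Met, Trp, Leu(TTA), Leu(TTA), Leu(TTG)
example = rscu(["ATGTGGTTATTATTG"])
print(example["ATG"], example["TGG"], example["TTA"], example["TTG"])
```

Because methionine and tryptophan each have a single codon, their RSCU is 1 by construction, which is why AUG and UGG show no bias in any taxon.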
Tandem repeat and SSR
Distribution of simple sequence repeats (SSRs) in six Hosta species in Korea. c denotes a compound SSR, which comprises two or more SSRs adjacent to each other. An SSR was counted as polymorphic when it was polymorphic in at least one species
Number of SSRs (No. of polymorphic SSRs)
Species in the genus Hosta are economically well-recognized plants endemic to eastern Asia (Korea, China and Japan) whose taxonomy is disputed because of high morphological variability [1, 2, 3]. In the present study, we newly sequenced whole CP genomes for four Korean Hosta taxa and conducted comparative analyses of all six Korean Hosta CP genomes to understand the architecture of CP genomes in the taxa. We characterized gene organization along with codon usage patterns and found structural and size variations across the six Hosta taxa, which might be applicable to phylogenetic and population genetic studies.
Angiosperm plastomes show very little variation in size, structure and gene content [11, 12]. The Hosta plastomes that we analyzed revealed the typical quadripartite structure and fell within the expected size range (~ 157 kbp) for angiosperms. Approximately 129 genes, 18 of which harbor introns, are present across angiosperm plastomes, and the gene content is also conserved [11, 21]. The gene annotation results in our study were consistent with these properties of angiosperm plastomes. The number of genes found in the CP genomes of the six Korean Hosta species was ~ 130, and 18 genes contained introns. The intron number is highly conserved throughout eudicots and most monocots. Our study found the same number of intron-containing genes, 18, suggesting that the intron content in Hosta is similar to that of most flowering plant clades. Although significant gene loss (> 30 genes) is observed in a small group of taxa (64 taxa), only a handful of gene losses are detected in most plant groups. It is believed that the most common gene loss in angiosperms, that of infA, might have resulted from transfer of the gene to the nucleus. We found infA within two Hosta plastomes (H. minor and H. capitata); however, the gene was pseudogenized by an internal stop codon.
Apart from a few exceptions, e.g. tobacco (171kbp) and geranium (217kbp), plastome size variation is limited in angiosperms [11, 18]. Large size changes are almost exclusively accompanied by an elongation or deletion of inverted repeat regions, whereas most sequence variation is attributable to rather small length mutations, mainly occurring in noncoding regions [11, 23]. In a recent comparative analysis of CP genomes across all land plants, monocots revealed relatively high variation in size with an average plastome size of 14kbp. The Hosta plastomes we analyzed showed rather limited size variation (size difference < 85 bp), with one exception found in H. capitata. In the mVISTA result, there was a 278 bp sequence deletion in H. capitata in the intergenic region around the trnK-UUU gene (Fig. 4). Our amplification of the region indicates that the deletion is a unique feature of H. capitata (Additional file 1: Figure S1). Large length variations ranging from 50 to 1200 bp are not common in angiosperm plastomes. The position of this large sequence deletion (around the border of LSC and IRb) coincides with the positions observed in other angiosperms. Although the causal mechanism for this large mutation is still elusive, it might offer valuable information on the evolution of plastome architecture, as most of these variations are found in phylogenetic hotspots.
Besides the large length variation, we found sequence polymorphism in both genic and non-genic regions. Consistent with the diversity patterns found in most angiosperms [24, 25, 26, 27], sequence divergence in non-coding regions (0.0011) was higher than that in coding regions (0.0006). The overall nucleotide variability in Hosta plastomes was lower than that found in other taxa (average pi = 0.009 in three Papaver; average pi = 0.003 in three Cardiocrinum) [25, 27]. Despite the low sequence variation, we identified four hyper-variable sites located in the SSC region (Fig. 3). We further examined the level of sequence polymorphism to determine whether these sites are good candidates for shallow-level taxonomic studies, i.e. studies of inter- and intra-specific variation in the Hosta group. Notably, the results revealed very limited polymorphism at both the inter- and intra-specific levels. However, polymorphism was markedly high for H. clausa in the ndhD gene. The number of variable sites between the two H. clausa samples from two different collection sites was 18, which is surprisingly high considering the limited number of variable sites (0–2) observed in the other genes and species (Additional file 1: Table S3 and Table S4). The highly inflated polymorphism may be due in part to long-term population isolation, or the two samples might represent different species or genetically distinct lineages. However, since our data set has a limited sample size, this explanation must be taken with great caution. Future work could investigate the diversity pattern of the ndhD gene with a larger sample size to determine the evolutionary history of the gene in light of species and population diversification.
It is hypothesized that the structural integrity of the whole plastome is closely linked to the IR structure, and that changes in plastome structure are often associated with IR expansions and contractions. We investigated the six Korean Hosta plastome structures and compared the sizes and borders of the three components, LSC, SSC and IRs. Overall, our data suggest that variation is unevenly distributed across the plastome components, with the least variation found in the IRs (Figs. 3 & 4). The limited variation in the IRs is largely consistent with the results of recent studies [25, 26]. However, we found IR expansions (H. capitata) and contractions (H. minor and H. venusta; Fig. 4). As the extent of the expansions and contractions is small (< 20 bp), the IR structure changes do not seem to significantly influence the integrity of the whole plastome.
Codon assignments for each of the 20 amino acids are the same across nearly all living organisms, yet the preference for individual codons differs greatly among taxa. Genome composition and selection towards increased translation efficiency are the two major factors affecting codon usage patterns [30, 31]. In the CP genome, the compositional bias associated with A/U rich positions is the primary cause of codon usage bias [32, 33]. The six Hosta CP genomes are low in GC content. In the six Korean Hosta taxa, we found a slight bias toward A/U: ~ 55% of all codons had A/U at the third position. The proportion of A/U at the third position was, however, much higher among the biased codons: of the codons with RSCU > 1 (more frequently used codons), over 76% had A/U at the third position.
On average, we found ~ 55 SSRs per plastome across the six Hosta taxa, which is toward the lower end of the range reported in other angiosperm taxa (SSR numbers = 105 in Betula; 130 in Paris; 50 in Chenopodium; 250 in Aconitum; 48 in Fagopyrum) [24, 34, 35, 36, 37]. We found inter-specific polymorphism in about 30 to 40% of the total SSRs (Table 3). Of the six Hosta taxa, H. jonesii harbored the highest number of SSRs that are polymorphic among species (Table 3). Simple sequence repeats, also called microsatellites, are the tandem repeats most commonly used in population genetic studies because of their abundance, codominant mode of inheritance, and hyper-polymorphic nature. The level of polymorphism among individuals may not be as high as the inter-specific polymorphism. However, the polymorphism we found with only a few species suggests that the SSRs we identified might be applicable to various population genetic studies of the Hosta taxa.
Aside from the two copies of the inverted repeats, approximately 50 small repeats were dispersed throughout coding and non-coding regions of the six Hosta taxa. The repeat numbers are not significantly higher than, but comparable to, those found in other angiosperms (dispersed repeat number in Papaver spp. = 49; 21 in Paris spp.; 36 in Passiflora; 37 in Aconitum) [24, 27, 36]. Repeats are highly correlated with plastome rearrangement in various angiosperm taxa and can be a signature of recombination. Repeats can provide recognition signals during the recombination process, as repeated sequences have the potential to form secondary structures. It has been believed that recombination rarely occurs in flowering plant plastomes because of the predominance of uniparental inheritance. However, evidence of intermolecular homologous recombination in flowering plants has been mounting [41, 42]. There is no record of plastome recombination in Asparagaceae; however, plastome studies examining recombination in the family are completely lacking thus far. Given the somewhat higher number of repeats observed in our Hosta data, inter- and intra-specific plastome recombination cannot be ruled out.
The genus Hosta has become notorious for taxonomic confusion arising from morphological similarities among taxa, high variability of taxonomic characters and the copious forms of cultivars [2, 4]. Taxonomic studies of Hosta taxa have been conducted mostly on pollen, flower and leaf morphology and a few molecular markers [9, 10], which may in part complicate the problems. Whole CP genome sequences have shown considerable value for reconstructing phylogenetic relationships among complex taxa at various taxonomic levels [14, 18, 26]. We utilized the complete CP genome sequences of 21 taxa, 20 from subfamily Agavoideae (Asparagaceae) plus an outgroup, to infer phylogenetic relationships among the six Korean Hosta taxa and related taxa. The plastome sequence of Asparagus officinalis (Asparagaceae) was assigned as the outgroup. There was no difference in tree topology between the ML and NJ phylogenies, and most clades received robust support, suggesting high confidence in the relationships among the clades and taxa (Fig. 5). The overall phylogenetic relationships among the 21 taxa computed from the complete plastome sequences (Fig. 5) were congruent with those shown in recent phylogenetic studies of the family Asparagaceae [3, 43]. However, there was a slight conflict in the relationships among the Korean Hosta taxa between our plastome-based phylogeny and the phylogeny computed from 16 CP DNA restriction site mutations. The latter placed H. yingeri in a clade with H. capitata, whereas our plastome data support a clade of H. yingeri with H. jonesii. According to Chung et al., H. yingeri shows more morphological similarity to H. jonesii than to H. capitata, sharing the same smooth scape and spike-like inflorescence type. The high morphological similarity between H. yingeri and H. jonesii suggests that the complete plastome phylogeny might have better resolution for those three species.
These results suggest that whole CP genome sequences provide a powerful tool for resolving species-level phylogenies.
In conclusion, our study revealed the structural characteristics, distribution of sequence variation and repeats, gene content and organization of the complete CP genomes of the six Korean Hosta species. Although structural variation is limited among the six Hosta plastomes, there were small IR region expansions and contractions in three taxa. We identified highly polymorphic regions of nucleotide variation that are potential molecular markers for phylogenetic studies. SSRs found in our plastome data might also provide intra-specific polymorphic markers that can be used for population genetic studies. The increased number of dispersed repeats raises further evolutionary questions. Inter- and intra-specific recombination events in the past are one plausible explanation for the increased number. Future studies might use the information on plastome architecture provided in this study to explore the characteristics of repeat elements.
Sampling, DNA isolation and sequencing
We collected fresh young leaf samples from four Hosta plants at the four localities listed in Table 1. The plants were identified based on the key morphological characters provided in Chung and Kim and in Jo and Kim. Upon sampling, the leaf samples were quickly dried with silica gel in a zip lock plastic bag and stored at room temperature until further use. We obtained all required permits for the protected areas from National Park Services and local governments. We prepared voucher specimens for all four samples used and deposited them in the National Institute of Biological Resources under the accession numbers listed in Table 1.
Total genomic DNA was extracted from each of the four Hosta plants using a DNeasy Plant Mini Kit (Qiagen Co., Hilden, Germany) following the manufacturer’s protocol. The extracted DNA was quantified with a NanoDrop ND1000 (Thermo Fisher Scientific, Massachusetts, USA; quality cutoff, OD 260/280 ratio between 1.7–1.9) and visualized by 1% agarose gel electrophoresis as a quality check. Illumina paired-end (PE) libraries (read length: 2 × 125 bp) with insert sizes of 270 to 700 bp were constructed for each of the four Hosta species and sequenced on the MiSeq platform (Illumina Inc., San Diego, CA) by Macrogen Inc. (http://www.macrogen.com/, Seoul, Korea). We removed poor quality reads (PHRED score of < 20) using the quality trim function implemented in CLC Assembly Cell package v. 4.2.1 (CLC Inc., Denmark).
Genome assembly and annotation
We employed the low-coverage whole-genome sequence (dnaLCW) method to assemble the complete CP genomes using both the CLC de novo assembler in the CLC Assembly Cell package and SOAPdenovo (SOAP package v. 1.12) with default parameters. Gaps were filled with the Gapcloser function in the SOAP package. To improve the CP genome assembly, we also conducted reference-based genome assembly using the CP genome sequence of H. ventricosa (GenBank accession = NC_032706.1). The contigs obtained from the primary de novo assemblies were aligned to the reference CP genome, and the aligned contigs were then assembled into each chloroplast genome in Geneious v. 2019.0.4 (http://www.geneious.com).
We annotated the assembled CP genomes using the online tool DOGMA (Dual Organellar GenoMe Annotator), with a few adjustments for start and stop codons. Protein-coding genes were defined based on the plastid-bacterial genetic code. We also scanned all tRNAs with tRNAscan-SE using the default settings to confirm the tRNA boundaries identified by DOGMA. Circular maps of the plastomes were drawn in OGDRAW (http://ogdraw.mpimp-golm.mpg.de/). The annotated CP genome sequences of the four newly sequenced Hosta species were then deposited in GenBank under the accession numbers listed in Table 1.
Genome structure and comparative analysis
We compared the overall genome structure, genome size, gene content and repeats across all six Korean Hosta species, including the CP genomes downloaded from GenBank (H. yingeri MF990205.1, H. capitata MH581151). The GC content was compared using Geneious. The whole plastome sequences of the six Hosta plants were aligned with MAFFT (http://mafft.cbrc.jp/alignment/server/) and visualized using Shuffle-LAGAN mode in mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml). For the mVISTA plot, we used the annotated CP genome of H. ventricosa as a reference. To determine whether the 278 bp sequence deletion is a unique property of H. capitata or the result of a sequencing error, we amplified the trnK-UUU/trnQ-UUG region, where the deletion is located, in the six Hosta species. The detailed methods of amplification and data analysis are provided in the supplementary information (Additional file 1: S1). We also examined the sequence divergence among the six Korean Hosta species through a sliding window analysis computing pi among the chloroplast genomes in DnaSP v. 6.0. For the sequence divergence analysis, we applied a window size of 600 bp with a 200 bp step size. We further examined the level of polymorphism at the hyper-variable sites identified on the basis of pi (psbA, ndhD, trnL, and ndhF-rpl32 IGS). Two to three individuals were collected from different populations for each of the six Korean Hosta species (13 individuals in total; Additional file 1: Table S3). We then extracted DNA from the 13 individuals and amplified it using four primer pairs (Additional file 1: S2). The detailed conditions of amplification and data analysis are provided in the supplementary information (Additional file 1: S2).
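The sliding window calculation of pi can be sketched as follows. This is an illustrative Python example, not DnaSP's implementation: it assumes the input sequences are already aligned to equal length, it drops alignment columns containing gaps or ambiguous bases (an assumption about gap handling), and the function names are hypothetical. Pi is computed as the mean number of pairwise differences per site; the defaults mirror the 600 bp window and 200 bp step described above.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for aligned sequences.

    Columns containing anything other than A/C/G/T are excluded
    (an assumption made for this sketch).
    """
    if len(seqs) < 2:
        return 0.0
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns:
        return 0.0
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = sum(a != b for col in columns for a, b in combinations(col, 2))
    return diffs / (n_pairs * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Return [(window midpoint, pi), ...] along an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window].upper() for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment (hypothetical) scanned with a 4 bp window and 2 bp step.
print(sliding_pi(["AAAAAA", "AAAAAT"], window=4, step=2))
```

Windows whose pi stands well above the genome-wide background are the candidates for hyper-variable markers.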
We identified repeat elements using two approaches. The web-based simple sequence repeat finder MISA-web (https://webblast.ipk-gatersleben.de/misa/) was employed to identify SSRs with thresholds of 10 repeat units for mono-, 5 repeat units for di-, 4 repeat units for tri-, and 3 repeat units for tetra-, penta-, and hexa-nucleotide SSRs. Among the SSRs of each type, the SSRs polymorphic among the six species were counted by comparing SSR sizes. We also investigated the size and type of repeats in the six Korean Hosta plastomes using REPuter. For the REPuter analysis, we set the parameters as follows: a minimal repeat size of 30 bp, a Hamming distance of 3, and 90% or greater sequence identity. We examined codon usage, including RSCU values, for all protein-coding genes using CodonW (http://codonw.sourceforge.net/).
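The SSR thresholds above can be illustrated with a simplified scanner. The Python sketch below is not MISA's algorithm: it only applies the same minimum repeat counts (10, 5, 4, 3, 3, 3 for unit lengths 1 to 6), ignores compound SSRs, and uses hypothetical names (`MIN_REPEATS`, `find_ssrs`). Motifs that are themselves repetitions of a shorter unit (e.g. "AA" or "ATAT") are skipped so that each repeat is reported once under its shortest unit.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples, 0-based starts."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
                pos = m.end()
            else:
                pos += 1
    return sorted(hits)


# Toy sequence (hypothetical) with one mono-, one di- and one tri-nucleotide SSR.
toy = "GG" + "A" * 10 + "C" + "AT" * 5 + "C" + "AAG" * 4 + "C"
for start, motif, n in find_ssrs(toy):
    print(start, motif, n)
```

Running the same scan on each plastome and comparing repeat counts at homologous positions would give a rough analogue of the polymorphic SSR tally, although positions must first be matched through the alignment.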
We used the complete plastome sequences of all six Korean Hosta species together with 14 plastome sequences of subfamily Agavoideae (Asparagaceae) obtained from GenBank, including one additional Hosta species (H. ventricosa; genome sizes and GenBank accession numbers are listed in Additional file 1: Table S2). Asparagus officinalis (Asparagaceae) was set as the outgroup for the phylogeny. The 21 plastome sequences, including the outgroup, were aligned using MAFFT and manually edited in the Geneious alignment viewer. Gaps were treated as missing data. We inferred the phylogeny using two approaches, neighbor joining (NJ) and maximum likelihood (ML). The NJ tree was constructed from Tamura-Nei distances in Geneious Tree Builder. We constructed the ML phylogeny using RAxML v. 8.2.4 with the GTR GAMMA model and 1000 bootstrap replicates to evaluate node support. The best-fitting substitution model was determined using the Akaike information criteria (AIC) implemented in jModelTest v. 2.1.10.
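Model selection by AIC can be summarized in a few lines. The Python sketch below is not jModelTest; it only shows the criterion itself, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood, with the lowest AIC preferred. The model names, log-likelihoods and parameter counts in the example are made-up placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Pick the model with the lowest AIC.

    fits maps model name -> (maximized log-likelihood, free parameters).
    Returns (best model name, {model name: AIC}).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Placeholder values for illustration only (hypothetical, not from the paper).
example_fits = {"JC": (-1000.0, 0), "GTR+G": (-980.0, 9)}
name, scores = best_model(example_fits)
print(name, scores)
```

The penalty term 2k means a more parameter-rich model is chosen only when its gain in log-likelihood outweighs the added parameters.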
We thank SeA Ryu, Su-Min Han, Wunggi Lee, Seong-Won Lee, Bong-Seok Kim, Eui-Ho Eom, and Prof. Sang-Tae Kim for sampling, preparing voucher specimens and laboratory assistance throughout the project.
CE designed the project. CE and BY conceived ideas and prepared funding and samples. SR analyzed the data and wrote the manuscript. KH performed additional experiments and analysis for verification of the analyzed assemblies. All authors have read and approved the manuscript.
This research was supported by the National Institute of Biological Resources (NIBR), Ministry of Environment, Korea (NIBR201831101 & NIBR201922101). NIBR provided the funding required for the project and evaluated the management of the funds and the overall performance of the project.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 2. Chung MG, Kim JW. The genus Hosta Tratt. (Liliaceae) in Korea. SIDA, Contrib to Bot. 1991;14:411–20.
- 5. Schmid WG. The genus Hosta – Giboshi Zoku (ギボウシ属). London and Portland: Batsford/Timber Press; 1991.
- 8. Li R, Wang M-Y, Li X-B. Chemical constituents and biological activities of genus Hosta (Liliaceae). J Med Plant Res. 2012;6:2704–13.
- 15. Dong W, Liu J, Yu J, Wang L, Zhou S. Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS One. 2012;7:1–9.
- 18. Gitzendanner MA, Soltis PS, Yi TS, Li DZ, Soltis DE. Plastome phylogenetics: 30 years of inferences into plant evolution. In: Chaw S-M, Jansen RK, editors. Advances in botanical research. London: Academic; 2018. p. 293–313.
- 21. Jansen RK, Cai Z, Raubeson LA, Daniell H, de Pamphilis CW, Leebens-Mack J, et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci U S A. 2007;104:19369–74.
- 24. Gao X, Zhang X, Meng H, Li J, Zhang D, Liu C. Comparative chloroplast genomes of Paris Sect. Marmorata: insights into repeat regions and evolutionary implications. BMC Genomics. 2018;19(Suppl 10):133–44.
- 26. Zhang Y, Du L, Liu A, Chen J, Wu L, Hu W, et al. The complete chloroplast genome sequences of five Epimedium species: Lights into phylogenetic and taxonomic Analyses. Front Plant Sci. 2016;7:1–12.
- 27. Zhou J, Cui Y, Chen X, Li Y, Xu Z, Duan B, et al. Complete chloroplast genomes of Papaver rhoeas and Papaver orientale: molecular structures, comparative analysis, and phylogenetic analysis. Molecules. 2018;23:1–15.
- 38. Lopez L, Barreiro R, Fischer M, Koch MA. Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants. BMC Genomics. 2015;16. https://doi.org/10.1186/s12864-015-2031-1.
- 49. Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993;10:512–26.
- 50. Darriba D, Taboada GL, Doallo R, Posada D, Europe PMC Funders Group. jModelTest 2: more models, new heuristics and high-performance computing. Nat Methods. 2015;9:6–9.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.