Background

Plant mitochondrial genomes are remarkable from both evolutionary and comparative genomics standpoints. Like their animal counterparts, plant mitochondrial genomes generally are characterized as circular chromosomes [1] (barring notable exceptions, e.g., [2, 3]) that contain a variable number of genes interspersed within non-coding DNA; however, this simplistic generalization belies the dynamic and complex nature of plant mitochondrial genomes [4]. Not only is the circular depiction of mitochondrial genomes an oversimplification of their possible morphologies [5–7], but recent comparative analyses among flowering plants have demonstrated extensive fluidity in plant mitochondrial genomes [2, 8, 9]. The structure and evolution of angiosperm mitochondrial genomes are driven by extremely high rates of recombination and rearrangement, with major rearrangements detected even in hybrid plants [10]. Paradoxically, mitochondrial genes are among the slowest evolving, and this rate paradox can be partially explained by DNA repair mechanisms [11]. DNA repair in the coding regions of the mitochondria is biased toward gene conversion, reducing the mutation rates within genes, whereas the more inaccurate break-induced replication (BIR) is common in the noncoding regions, leading to the expansions and rearrangements observed outside of genes [12–15]. Consequently, plant mitochondrial genomes vary remarkably both in size and composition within plant families and genera [7, 9, 16, 17], with genome sizes ranging from 30 kilobases in some algae to several megabases in certain angiosperms [2, 3, 18]. Intraspecies comparisons suggest that plant mitochondrial genomes can be highly divergent even among different varieties of the same species [19, 20]; together with the observed genomic diversity within a single order of angiosperms [21], this further indicates the remarkable diversity in mitochondrial genomes among green plants [22].

Perhaps the two most surprising recent realizations regarding plant mitochondrial genome evolution are the extensive variability in mitochondrial genome size and the compositional changes that have led to this variability. Plant mitochondrial genomes vary by an amazing 870-fold, from the ultra-compact, 13 kb (12,998 bp) genome (Accession Number: NC_010357) of the alga Polytomella capuana [23] to the spectacularly bloated 11,319 kb genome (11,318,806 bp) of Silene conica [2]. The evolutionary dynamics that underlie this remarkable variation are not fully understood; however, it is clear from several analyses that plant mitochondrial genomes are repositories for DNA from myriad sources [24]. These include not only the nuclear and chloroplast genomes of the host species itself, but also sequences derived from the chloroplast and mitochondrial genomes of other species [3]. Much of this sequence is large (>1 kb) and repetitive in nature [25], providing sufficient tracts of homology to promote the highly dynamic recombination evident in plant mitochondrial genomes [25–27]. Indeed, it is the high rates of sequence acquisition/loss and recombination that give plant mitochondrial genomes their reputation for rapid intergenic evolution, leading to low levels of non-genic homology among even closely related species [2, 8, 28]. Furthermore, this propensity for recombination can have additional intriguing consequences, such as the generation of substoichiometric recombinant molecules [29, 30], variable chromosomal structures [7, 31, 32], and novel cytoplasmic male sterility (CMS)-inducing open reading frames (ORFs) [19, 20, 33, 34].

Despite the extensive variation in sizes and structures of plant mitochondrial genomes, their coding sequences rank among the most slowly evolving genes known [35, 36]. Although considerable gene- and lineage-specific variation in rates of gene retention/loss exists for both protein and tRNA genes [37], most sequenced angiosperm mitochondrial genomes have ~50–60 genes, including subunits of respiratory complexes, ribosomal RNAs (rRNAs), and transfer RNAs (tRNAs) [37], and a variable number of pseudogenized forms and/or copies of mitochondrial genes [38–42].

Sea Island cotton (Gossypium barbadense L.) is a New World allotetraploid (2n = 52) grown in many countries because of its superior quality fiber [43]. Upland cotton (G. hirsutum), however, is more commonly grown because it is earlier maturing and has a higher yield potential, and accordingly it now accounts for about 90 % of world fiber production. Sea Island cotton accounts for only approximately 5 % of present global commerce [44]. In addition to its superior spinning performance and unique high quality fiber characteristics, Sea Island cotton is a potential source of genes for resistance to Verticillium wilt [45, 46]. The objective of the present study was to complement earlier efforts [47, 48] to generate a high-quality sequence of the mitochondrial genome of G. barbadense. We provide this sequence and compare it to the mitogenome of G. hirsutum [41], yielding insights into the evolution of structural variation and into the origin of duplicated gene copies in mtDNA.

Methods

Plant materials and mitochondrial DNA extraction

Mitochondria were isolated from week-old etiolated seedlings of “Pima 90–53”, a variety of Sea Island cotton (G. barbadense L.) whose seeds were obtained from Hebei Agricultural University [40, 49]. Mitochondrial DNA was extracted from isolated organelles as reported [40, 41]. Briefly, the extraction protocol for the mtDNA of Sea Island cotton was as follows:

  1. The seeds were planted in sand and the seedlings were kept in darkness to obtain etiolated seedlings. From these, 7-d-old etiolated seedlings were ground and used to isolate mitochondria.

  2. The ground seedling material was collected and further purified by centrifugation in a discontinuous sucrose-density gradient (60 %, 52 %, 36 % and 20 % M/V) in purification buffer (10 mM Tris–HCl pH 7.4 and 20 mM EDTA) (Additional file 1: Figure S1).

  3. The mitochondrial band from the interface between 52 % and 36 % was carefully collected and washed with 0.3 mol · L−1 sucrose buffer to obtain the intact mitochondrial fractions.

  4. The mitochondrial fraction was lysed in cetyltrimethyl ammonium bromide (CTAB) to release the mtDNA, which was further purified by proteinase K digestion, phenol-chloroform extraction, and ethanol precipitation.

The plastid band was located at the interface between 36 % and 20 % sucrose, while the nuclei were pelleted at the bottom. PCR validation failed to detect nuclear contamination, but did detect partial contamination from plastid DNA (Additional file 2: Figure S2). To remove chloroplast contamination, we filtered the reads against the G. barbadense chloroplast genome sequence before assembly.

Mitochondrial genome sequencing and assembly

Isolated Sea Island cotton mitochondrial DNA was cloned into whole-genome shotgun libraries using the CopyControl Fosmid Library Production Kit (Epicentre, Cat. No. CCFOS110) and sequenced to about 700× coverage on the Solexa platform using paired-end, 90-bp reads at the Beijing Genomics Institute (BGI). Adaptor and contaminant sequences were removed from the raw reads and the clean reads were assembled using ABySS [50]. Since nuclear and chloroplast contamination is possible in the extraction procedure, BLASTn [51] searches against the nt/nr database were used to identify and remove contaminant contigs. In addition, known mitochondrial genome sequences of G. hirsutum [41] and G. harknessii (unpublished) were used to identify mitochondrial-type contigs. Contigs were ordered/oriented and gaps were closed via additional fosmid and BAC sequencing. Primers representing both conserved mitochondrial genes and scaffold termini were used to screen both a fosmid library [40] and a BAC library [48, 49]. Twenty fosmid clones (also previously associated with G. barbadense mitochondria; see Fig. 5 in [40]) and two BAC clones were selected by this PCR screen and independently sequenced by Solexa and 454 sequencing methods at BGI and Shanghai Majorbio Bio-pharm Biotechnology, respectively. The resulting clone sequences were assembled with SOAPdenovo [52] and Newbler (Version 2.53), respectively; these were then used to anchor and orient the previously assembled mitochondrial contigs into supercontigs. To close the remaining gaps, the known relationships of the fosmids were used to predict the order and orientation of contigs, and the remaining gaps were filled by LA-PCR (Long and Accurate Polymerase Chain Reaction) using the primers listed in Additional file 3: Table S1. These primers were also used to verify each contig junction.

Genome annotation and sequence analysis

Mitochondrial genes were annotated as reported [16], using the genes annotated in the G. hirsutum mtDNA as references. Functional genes (other than tRNA genes) were identified by local BLAST searches against the database, whereas tRNA genes were predicted de novo using tRNAscan-SE [53]. A genome map (Fig. 1) was generated using OGDRAW [54] and the repeat map was drawn with Circos [55].

Fig. 1

Genome map of the Gossypium barbadense mitochondrial genome. The map shows both the gene map (outer circle) and the repeat map (inner circle). Genes shown on the inside of the outer circle are transcribed in a clockwise direction, while genes on the outside of the outer circle are transcribed in the reverse direction. The inner circle shows the distribution of repeats in the G. barbadense mt genome, with curved lines and ribbons connecting pairs of repeats and width proportional to repeat size. The red ribbons represent repeats ≥1 kb and the blue lines represent repeats between 100 bp and 1 kb. The numbers give genome coordinates in kilobases

The newly generated G. barbadense sequence was aligned to the published G. hirsutum mitochondrial sequence [41], and the values of dS and dN/dS were evaluated with PAML4 [56]. PipMaker was used to identify repeated sequences within G. barbadense [57], and repetitive DNA from nuclear sources was identified using RepeatMasker (http://www.repeatmasker.org) and a custom, Gossypium-enriched repeat database. Dot matrix comparisons were generated between the mitochondrial genome of G. barbadense and those of Arabidopsis thaliana, Carica papaya, and G. hirsutum using the nucmer program of MUMmer with the following parameters: 100-bp minimal size for an exact match and 500-bp minimal interval between every two matches [58]. We used Circos plots [55] to show the collinear relationships between the G. barbadense and G. hirsutum mitochondrial genome sequences. Possible pseudogenes and non-functional tRNAs were predicted using previously published mitochondrial genomes, and the distribution of pseudogenes was drawn using the pheatmap package in R. A phylogenetic tree was constructed based on 17 conserved mitochondrial genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, cob, cox1, cox2, cox3, atp1, atp4, atp6, atp8, atp9) using the maximum likelihood (ML) method with the model GTR + G + I in MEGA5.05 [59].

Results and discussion

Assembly of the complete G. barbadense mitochondrial genome

A total of 607 Mbp (sequence coverage: 867×) of clean reads was generated for the G. barbadense mitochondrial genome. These reads were initially assembled into 14 contigs (average length = 43,530 bp; putative contaminant contigs removed), ranging in size from 10,246 bp to 105,651 bp. Because repeated sequences hinder the assembly of these contigs into a single circular chromosome, 20 fosmid clones and two BAC clones were sequenced and used to inform the order and orientation of these contigs. The previously published physical map of the G. barbadense mitochondrial genome was also used [40]. The order and orientation of contigs were confirmed and the remaining gaps were filled using PCR (Additional file 3: Table S1). The Sea Island mitochondrial genome was assembled as a 677,434 bp circular molecule with four large repeats (Fig. 1) (GenBank Accession Number KP898249), similar to an earlier prediction of mitogenome size (690–700 kb) [60].

Comparative analysis of G. barbadense and G. hirsutum mitochondrial genomes

The mitochondrial genomes of G. barbadense and G. hirsutum [41] are largely similar; however, as observed in other genera, many differences exist even between these closely related species (Table 1). The size difference between the mitochondrial genomes of G. barbadense and G. hirsutum is about 9 %, representing almost 56 kb of additional sequence in G. barbadense. In terms of nucleotide composition, the two mitochondrial genomes are almost identical, with the GC content of G. barbadense and G. hirsutum being 44.98 % and 44.95 %, respectively. Likewise, a similar number of genes was predicted for both, with G. barbadense having seven more functional genes annotated than G. hirsutum (75 versus 68 genes, respectively; Table 2), including four additional protein coding genes, two additional rRNA genes, and one additional tRNA gene, giving a slightly greater total gene length in the G. barbadense than in the G. hirsutum mitochondrial genome (36.4 kb versus 31.7 kb). In total, 40 protein coding genes, 6 rRNA genes, and 29 tRNA genes were predicted for G. barbadense. Most of these genes were intact, even in the duplicate copies; however, both the sole nad1 and the rps3 copy displayed deviations from expectations for intact genes (compared to G. hirsutum, the G. barbadense mitochondrial genome contains extra nad1b and nad1c exons, and the truncated rps3 is 544 bp shorter than the intact copy in G. hirsutum).
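
The GC-content comparison above rests on a simple base count. As a minimal sketch (the function name and the toy fragment are illustrative assumptions, not part of the study's pipeline), GC content can be computed as the percentage of G and C among unambiguous bases:

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Hypothetical toy fragment; ambiguous bases (N) are ignored by design.
toy_fragment = "ATGCGCNNATTA"
print(f"GC content: {gc_content(toy_fragment):.2f} %")
```

Excluding ambiguous bases from the denominator is one common convention; other tools may count them, so values can differ slightly between programs.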

Table 1 General features of mitochondrial genomes of G. barbadense and G. hirsutum
Table 2 Genes identified in the G. barbadense mitochondrial genome

As with the annotated genes, the amount of chloroplast-derived sequence was similar between the two mitochondrial genomes, with G. barbadense having about 1.45 kb less identifiable chloroplast-derived sequence (Table 1). In G. barbadense, 19 fragments ranging from 35 bp to 2,203 bp in size contribute 5,383 bp of sequence to the genome (<1 %; Table 1 and Additional file 4: Table S2) versus 6,833 bp in G. hirsutum. Most of the inserted sequences in both cases were either non-coding or were tRNAs. With respect to tRNAs, both genomes have nearly the same set; however, G. barbadense has an additional copy of trnD(GTC)-cp, but lacks one of the five conserved cp-derived tRNAs [41] (chloroplast-derived trnP).
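
A minimal arithmetic check of the chloroplast-derived fraction, using only the base-pair counts quoted in the text (variable names are illustrative):

```python
# Values quoted in the text.
genome_bp = 677_434        # assembled G. barbadense mitochondrial genome
cp_barbadense_bp = 5_383   # chloroplast-derived sequence in G. barbadense
cp_hirsutum_bp = 6_833     # chloroplast-derived sequence in G. hirsutum

# Share of the G. barbadense mitogenome that is chloroplast-derived.
cp_percent = 100 * cp_barbadense_bp / genome_bp
# Difference in chloroplast-derived sequence between the two species.
cp_difference_bp = cp_hirsutum_bp - cp_barbadense_bp

print(f"chloroplast-derived share: {cp_percent:.2f} %")
print(f"difference between species: {cp_difference_bp} bp")
```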

Together, the differences between the two cotton mitochondrial genomes attributable to gene or chloroplast-derived sequence represent a small fraction of the difference in genome size (~5 % of the total size difference). As expected from the nature of plant mitochondrial genomes, the greatest difference was in the proportion of repeated sequences, with approximately 1.8 times more sequence in G. barbadense derived from repetitive sequences than in G. hirsutum (21.27 %). Interestingly, the amount of sequence attributable to identifiable transposable elements comprised only 17.3 % and 26.6 % of the repetitive sequences detected in the G. barbadense and G. hirsutum mitochondrial genomes, respectively. The remainder consisted of unclassified repetitive sequences contained within the mitochondrial genomes themselves. As with nuclear genomes, gypsy elements comprised the largest fraction of the identifiable repetitive sequences, followed by unclassified LTR-retrotransposons and transposable elements.

The presence and distribution of short repeats also distinguished the two mitochondrial genomes, with 207 and 343 repeats larger than 19 bp in G. barbadense and G. hirsutum, respectively (Table 3). As in G. hirsutum, G. barbadense short repeats were typically small (20 bp to 39 bp) [41]. Therefore, while the short repeats were more numerous, their small length had relatively little effect compared to the large repeats (>10 kb; average size in G. barbadense = ~28 kb) (Fig. 1 and Additional file 5: Table S3). In fact, most of the genome expansion in G. barbadense is attributable to the largest repeat (R1 = 63,904 bp), which contributes a full 18.9 % of the genome, as well as to several duplicated rRNA genes (rrn5 and rrn18). Such large repeats have precedent in plant mitochondrial genomes, including, for example, a 120-kb repeat in maize [5] and an 87-kb repeat in Beta [61]. In total, the proportion of repeats in G. barbadense was nearly 1.5 times that of G. hirsutum (Table 3).
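
The 18.9 % figure for R1 can be reproduced from the lengths given in the text if both copies of the repeat are counted. The sketch below assumes exactly two copies of R1 (an assumption made here for the arithmetic; variable names are illustrative):

```python
genome_bp = 677_434   # assembled G. barbadense mitochondrial genome (from the text)
r1_bp = 63_904        # length of one copy of repeat R1 (from the text)
r1_copies = 2         # assumption: R1 is present as two copies

# Fraction of the genome occupied by all copies of R1.
r1_percent = 100 * r1_copies * r1_bp / genome_bp
print(f"R1 share of genome: {r1_percent:.1f} %")
```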

Table 3 Frequency distribution of repeat lengths in the mitogenomes of G. hirsutum and G. barbadense

Syntenic regions and rearrangement

Syntenic regions were identified between G. barbadense and each of A. thaliana, C. papaya, and G. hirsutum. Plant mitochondrial genomes are known to experience myriad synteny-disrupting rearrangements over short evolutionary timescales, and, reflecting this, appreciable synteny was limited to the G. barbadense - G. hirsutum comparison (Fig. 2). A set of eight sequence blocks larger than 10 kb with high sequence identity (>99.8 %) was detected between the G. barbadense and G. hirsutum mitochondrial genomes, here named block 1 to block 8 (Additional file 6: Table S4). The sizes of these eight syntenic blocks ranged from 33.0 kb (block 4; Fig. 3) to 131.5 kb (block 8; Fig. 3). After the four large repeats (R1-R4) were identified in the G. barbadense mitochondrial genome (Fig. 1 and Additional file 7: Figure S3), we also found a short direct repeat, “R08” (Additional file 5: Table S3), at the ends of large repeat R1 (Fig. 1 and Additional file 7: Figure S3). Interestingly, R1 is duplicated in G. barbadense whereas it exists as a single copy in G. hirsutum, suggesting either a gain in G. barbadense or a loss in G. hirsutum. Given their position relative to the bordering syntenic blocks 2 and 8 (Additional file 7: Figure S3), the small repeats at the ends of R1 might account for the large duplication event and provide some information on the origin of R1 since the divergence from a common ancestor. It bears noting, however, that the assembled circular map likely represents only one of several possible actual configurations of the genome. Mitochondrial repeats frequently recombine, resulting in an equilibrium composed of multiple configurations (Additional file 7: Figure S3), and both species of cotton probably include several isoforms that differ in their repeat-based configurations. The placement of these repeats relative to other syntenic blocks suggests that interspecies reorganization occurred during the evolution of G. barbadense and G. hirsutum. Notably, however, the rearrangements detected between these two mitochondrial genomes did not disrupt gene clusters, which mostly were in syntenic regions. Further sequencing of additional cotton mitochondrial genomes will be necessary to elucidate the extent and fluidity of genomic rearrangements in cotton mitochondrial genomes.

Fig. 2

Dot matrix analyses between G. barbadense and each of G. hirsutum, C. papaya, and A. thaliana by whole-genome alignment. The blue and red lines represent inverted and direct syntenic regions, respectively

Fig. 3

Syntenic blocks larger than 10 kb between G. barbadense and G. hirsutum, with curved ribbons connecting pairs of syntenic blocks and width proportional to block size. The numbers give genome coordinates in kilobases

Nucleotide-level changes in cotton mitochondrial genomes

Synonymous substitution rates (Ks values) of orthologous gene pairs serve as a useful measure of evolutionary distance [62]. The average Ks values for 35 collinear mitochondrial gene pairs were 0.051 for either G. barbadense or G. hirsutum versus C. papaya (Fig. 4), about one-tenth the value for nuclear genes [63]. These data reflect the commonly observed low mutation rates for mitochondrial genes, likely because of efficient DNA repair mechanisms [12, 13]. These data, together with a paired t-test (P = 0.957 > 0.05), indicate that the mutation rates of the two Gossypium mitochondrial genomes do not differ significantly. dN/dS ratios for six genes (nad6, ccmB, ccmFN, sdh3, sdh4, matR) in both mitochondrial genomes were greater than 1 (Table 4), suggesting that these genes may have experienced positive selection during divergence from the common ancestor of Gossypium and C. papaya.
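
The two comparisons described above, a paired test on per-gene Ks values and a screen for dN/dS ratios above 1, can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the analysis actually performed: the function names are hypothetical, and the numbers are toy values, not those of Table 4.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return the paired t statistic and degrees of freedom for two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def genes_above_one(omega_by_gene):
    """Return gene names whose dN/dS ratio exceeds 1, sorted alphabetically."""
    return sorted(gene for gene, omega in omega_by_gene.items() if omega > 1.0)


# Hypothetical values for illustration only; they are not taken from Table 4.
ks_species_a = [0.040, 0.055, 0.061, 0.048]
ks_species_b = [0.042, 0.053, 0.060, 0.050]
t_stat, df = paired_t(ks_species_a, ks_species_b)
print(f"t = {t_stat:.3f}, df = {df}")
print(genes_above_one({"geneA": 1.3, "geneB": 0.2, "geneC": 1.05}))
```

Converting the t statistic to a P value requires the Student t distribution, which is not in the standard library; a statistics package would normally be used for that step. A dN/dS ratio above 1 is only suggestive of positive selection and is usually followed by a formal likelihood-based test.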

Fig. 4

Distribution of Ks values between two cotton mitochondrial genomes and C. papaya

Table 4 dS and dN/dS values of 35 genes between two cotton mitochondrial genomes and C. papaya

Pseudogenes in mitochondrial genomes of land plants

As mentioned above, the suite and synteny of genes were largely conserved between G. barbadense and G. hirsutum. Likewise, both cotton genomes shared the same relatively few potential pseudogenes. This is interesting because, while complex I, III, IV, and V genes (nad, cob, cox, and atp genes, respectively) are generally universally conserved in land plant mitochondrial genomes [37], pseudogenes also are ubiquitous [64, 65]. To explore further the patterns of pseudogenization in mitochondrial genes, we analyzed the 41 mitochondrial genomes sequenced to date and deposited in NCBI (Table 5). This comparison revealed that: (1) pseudogenes may arise from any category of mitochondrial genes and from the chloroplast genome; (2) the frequency of pseudogenization (Fig. 5) is highest for ribosomal protein genes, and lower for genes encoding subunits of the respiratory chain proteins. This is consistent with a prior analysis of the pseudogene distribution of 41 protein-coding genes among 20 land plant mitochondrial genomes [3], which also reported that pseudogenes mainly occurred in complex II subunits of the respiratory chain (sdh genes) and ribosomal protein genes (rps genes and rpl genes); (3) some pseudogenes are lineage-specific (e.g., in Oryza sativa subsp. japonica and Oryza sativa subsp. indica, Table 5); and (4) multi-copy pseudogenes are present in some mitochondrial genomes (e.g., rpl16, atp9, and rps3), as observed here and previously in Vitis vinifera [39], which may indicate further duplication during pseudogene formation. Recent research has shown that some pseudogenization followed functional gene transfer to the nucleus [37], leading to the gradual mutational degradation of the corresponding mitochondrial copies. In addition, the tendency for ribosomal protein genes to be pseudogenized more frequently may be associated with the coexistence of three translation systems within a single cell, which may lead to more “gene replacement” [37]. Analysis of additional mitochondrial genomes will help illuminate these trends.

Table 5 Pseudogenes in 41 mitochondrial genomes sequenced
Fig. 5

Distribution of pseudogenes in 41 sequenced mitochondrial genomes of land plants. The scale of pseudogene number is depicted on the side, and ranges from 0 copies to 2 copies, with 0.5 representing possible pseudogenes. The numbers 1–41 on the x-axis correspond to the number given to each mitochondrial genome (see Table 5)

rps3 gene transfer in pieces into the Gossypium mitochondrial genome

As in Vitis, rps3 has a partially duplicated copy in the mtDNA of Gossypium. In the mitochondrial genomes of both G. barbadense and G. hirsutum, there is a duplicated copy of rps3 (rps3-2) that is nearly identical to the corresponding region of rps3. Horizontal gene transfer (HGT) into mitochondrial genomes has been noted previously [16, 37–39, 66]; however, the primary source of the divergent copy of rps3 is not HGT but rather the mtDNA of Gossypium itself. The full-length rps3 gene in Gossypium is 3,401 bp and contains two exons and one intron (Fig. 6). In both Gossypium mitochondrial genomes, however, rps3-2 is truncated at the end of the second exon (Fig. 6a). The missing part of this exon in rps3-2 was not found elsewhere in either cotton mitochondrial genome, even when the BLAST E-value cutoff was relaxed from 1e−10 to 1e−6. To explore the possibility that the latter half of this exon was copied within the cotton mitochondrial genome and then subsequently migrated to the nucleus, we used the published genomes of G. raimondii (D5) [67, 68] and G. arboreum (A2) [69] as BLAST databases (with a cutoff of 1e−10). Interestingly, the latter half of exon 2 (exon 2–2; Fig. 6) was recovered from G. arboreum chromosome 5 (only), along with 755 bp of additional mitochondrial sequence derived from the flanking region of rps3 (Figs. 6 and 7). The percent identity between the intact mitochondrial sequence and the nuclear copy is ~97 %, which is similar to the average difference in non-coding regions for nuclear genes in the A- and D- genome cottons. These observations are interesting for two reasons. First, the recovery of this mitochondrial sequence from G. arboreum (A-genome) only, which is also the model maternal progenitor for both G. barbadense and G. hirsutum [70], suggests that this mitochondrion-to-nucleus transfer occurred subsequent to the divergence of the A- and D- genomes of cotton, which is estimated to have been 5–10 mya; the level of sequence divergence suggests that the transfer occurred shortly after the divergence of the A- and D- lineages. Second, the formation of rps3-2 was complex, involving both sequence duplication and intracellular transfer. As shown in Fig. 6, the sequence R2′ (28,235 bp) was duplicated (red rectangle in Fig. 6b); part of it, including part of rps3, was transferred to the nuclear genome, and the remnant sequences of R2′ remained in the mitochondrial genome. These remnant sequences became rps3-2 and R2, respectively.
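
The percent identity quoted above compares two aligned sequences. A minimal sketch of one common definition follows, counting matches over columns where neither sequence has a gap; the function name and the toy alignment are illustrative assumptions, and alignment programs may define identity differently (for example, by including gapped columns in the denominator).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 7 ungapped columns, 6 of which match.
print(f"{percent_identity('ATGC-TTA', 'ATGAGTTA'):.1f} % identity")
```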

Fig. 6

The putative origin mechanism of rps3-2 and R2. a A structural comparison of rps3 and rps3-2 is shown in the top panel. Exons are depicted as black bars, introns as straight lines, and the striped box indicates the exonic sequence lost in rps3-2. Locations for each of the exons are given in parentheses. b The lower panel illustrates the possible formation mechanism of rps3-2 and R2. A2: Chr. 5 represents chromosome 5 in the G. arboreum genome and MT represents the mitochondrial genome of G. barbadense, with the top MT graph indicating the arrangement before transfer and the bottom indicating the arrangement after intracellular transfer. The red rectangle and red bar indicate the sequences transferred from the mitochondrial genome to the nuclear genome, respectively. The blue bar represents the flanking sequences transferred along with the latter half of exon 2, depicted again as a striped box. Included in these graphs are the bordering regions between rps3 and R2′ (28,235 bp) and between rps3-2 and R2 (26,936 bp)

Fig. 7

Alignment of rps3 from the G. barbadense mitochondrial genome and the corresponding transferred sequence found in the G. arboreum nuclear genome. Here, 1–755 bp in the alignment represents the 5′ flanking regions of rps3, and 756–1,298 bp consists of exon 2–2 from the intact and transferred rps3 copy, respectively

Patterns of tRNA presence in plant mitochondrial genomes

While plant mitochondrial genomes possess native tRNAs, nuclear-encoded tRNAs need to be imported from the cytosol to compensate for those that are missing [71–73]. In both Gossypium genomes, four (trnA, trnL, trnR and trnT) of the 20 tRNAs are absent from the mitochondrial genome, and therefore must be imported from the cytosol. To evaluate the patterns of loss of tRNAs during the evolution of plant mitochondrial genomes, we analyzed tRNAs in 37 land plant mitochondrial genomes (Fig. 8). Of the genomes analyzed, only the non-seed plants Marchantia polymorpha, Pleurozia purpurea and Treubia lacunosa have a complete set of tRNAs. Patterns of presence/absence suggest that trnA was lost early in the evolution of seed plants, while trnL, trnR, trnT, and trnV were lost during the evolution of the eudicots. Interestingly, trnV exists in both Gossypium and B. vulgaris; however, these may both represent subsequent gains, as BLAST comparison of the trnV copy in Gossypium shows more than 99 % identity to the corresponding copy in the Gossypium chloroplast (Table 1). Similar to the observation for the eudicots, trnG was lost early during monocot evolution. Finally, S. latifolia and P. dactylifera experienced rapid loss of large numbers of tRNAs [74]. Overall, only trnC, trnE, trnM, trnP and trnY are present in all species evaluated, indicating that these tRNAs may be the most conserved in plant mitochondrial genomes.
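
The presence/absence reasoning above reduces to set operations: the tRNAs missing from one genome are the set difference from the full complement, and the tRNAs conserved across genomes are the intersection. The sketch below is illustrative only; the helper names are hypothetical, the Gossypium entry follows the four absences stated in the text, and "species_X" is an invented placeholder rather than a genome from Fig. 8.

```python
# One tRNA per amino acid, named by one-letter code (e.g. trnA for alanine).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FULL_SET = {f"trn{aa}" for aa in AMINO_ACIDS}


def missing_trnas(present):
    """Return tRNAs of the full 20-member set that are absent from a genome."""
    return FULL_SET - set(present)


def core_trnas(presence_by_species):
    """Return tRNAs present in every genome of the comparison."""
    sets = [set(trnas) for trnas in presence_by_species.values()]
    return set.intersection(*sets) if sets else set()


presence = {
    # From the text: trnA, trnL, trnR and trnT are absent in Gossypium.
    "Gossypium": FULL_SET - {"trnA", "trnL", "trnR", "trnT"},
    # Hypothetical genome, for illustration only.
    "species_X": FULL_SET - {"trnA", "trnG"},
}

print(sorted(missing_trnas(presence["Gossypium"])))
print(sorted(core_trnas(presence)))
```

Mapping losses onto a phylogeny, as in Fig. 8, additionally requires assigning each loss to a branch; that step is not attempted here.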

Fig. 8

The loss of tRNAs in 37 plant mitochondrial genomes. The phylogenetic tree was constructed based on nucleotide sequences of 17 mitochondrial genes (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, cob, cox1, cox2, cox3, atp1, atp4, atp6, atp8 and atp9) using the maximum likelihood (ML) method with the model GTR + G + I in MEGA5.05 [59]. The amino acids at the nodes represent the corresponding tRNA gene losses from mitochondrial genomes, and the arrow indicates tRNA loss in S. latifolia

Conclusion

Mitochondrial genomes of plants are evolutionarily intriguing because of their highly conserved genic content and slow rates of genic evolution [11–13], features which contrast sharply with their highly labile genomic structure, genome size, DNA repair mechanisms and recombination induced by different types and origins of repeated sequences. Common evolutionary modifications of mitochondrial genomes include gene loss [75, 76]; intracellular, intergenomic transfers [37, 75, 77, 78]; sequence acquisitions, including horizontal transfers from other, sometimes distantly related species [3]; multiple sequence rearrangements [21]; and DNA repair mechanisms [11–13]. Here we compare the mitochondrial genomes of two closely related allopolyploid cotton species, which diverged only 1–2 mya and share the same organellar ancestry [70, 79]. Despite the short divergence time separating G. barbadense and G. hirsutum, many of the hallmark features of mitochondrial genome evolution are evident, including differential genic content, gains/losses of multiple small and large repeats, genome rearrangements, horizontal transfer, and the evolution of duplicated genes. We illustrate how phylogenetic analysis combined with divergence data can illuminate the timing of duplicated gene formation and of differences in mitochondrial tRNA and protein coding gene content. Increasing insight into the mechanisms and functional consequences of mitochondrial gene and genome variation is expected as additional plant mitochondrial genome sequences become available.

Availability of supporting data

The data sets supporting the results of this article are included within the article and its additional files.