Comparative genomics reveals conservative evolution of the xylem transcriptome in vascular plants
Wood is a valuable natural resource and a major carbon sink. Wood formation is an important developmental process in vascular plants which played a crucial role in plant evolution. Although genes involved in xylem formation have been investigated, the molecular mechanisms of xylem evolution are not well understood. We use comparative genomics to examine evolution of the xylem transcriptome to gain insights into xylem evolution.
The xylem transcriptome is highly conserved in conifers, but considerably divergent in angiosperms. The functional domains of genes in the xylem transcriptome are moderately to highly conserved in vascular plants, suggesting the existence of a common ancestral xylem transcriptome. Compared to the total transcriptome derived from a range of tissues, the xylem transcriptome is relatively conserved in vascular plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known proteins and transcription factors are relatively more conserved in vascular plants. A total of 527 putative xylem orthologs were identified, which are unevenly distributed across the Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis revealed that evolution of the xylem transcriptome has paralleled plant evolution. We also identified 274 conifer-specific xylem unigenes, all of which are of unknown function. These xylem orthologs and conifer-specific unigenes are likely to have played a crucial role in xylem evolution.
Conifers have highly conserved xylem transcriptomes, while angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a common ancestral xylem transcriptome. The xylem transcriptomes of vascular plants are more conserved than the total transcriptomes. Evolution of the xylem transcriptome has largely followed the trend of plant evolution.
Keywords: Vascular Plant; Conifer Species; Secondary Xylem; Wood Formation; Primary Xylem
- Mb
million base pairs
- Ma
million years ago
- EST
expressed sequence tag
- E
E-value or expected-value
- PGI
pine gene index
- SGI
spruce gene index
- PplGI
Populus gene index
- PGI xylem
gene index from pure xylem tissues of pines
- SGI xylem
gene index from pure xylem tissues of spruces
- PplGI xylem
gene index from pure xylem tissues of Populus
- VSX genes
vascular plant-specific xylem genes
- COMT
caffeic acid 3-O-methyltransferase
- FLA
fasciclin-like arabinogalactan protein
- CAD
cinnamyl alcohol dehydrogenase.
The evolution of xylem was a critical step that allowed vascular plants to colonize vast areas of the earth's terrestrial surface. Xylem plays an essential role in the transport of water and nutrients and provides mechanical support for vascular plants. Herbaceous vascular plants develop primary xylem, and woody plants also produce secondary xylem (or wood). Evolution from tracheids to vessels reflects a more efficient way for angiosperms to develop secondary xylem [1, 2, 3]. Primary xylem consists of cellulose, hemicellulose, pectin and proteins, while secondary xylem contains higher amounts of cellulose and lignin. From a practical point of view, wood represents a renewable natural resource for the timber, fibre and biofuel industries and it is a major carbon sink in natural ecosystems.
In the last few decades the molecular basis of xylem formation and evolution has been investigated. Syringyl (S) lignin biosynthesis was found to be an evolved lignin pathway in angiosperms, representing an addition to the ancient and predominant guaiacyl (G) lignin pathway conserved in land plants [3, 4, 5, 6]. However, the S lignin pathway was also recently observed in lycophytes, suggesting a complex evolutionary history of lignin biosynthesis in vascular plants. Genes involved in wood formation have been identified in many plant species [8, 9, 10, 11, 12, 13], allowing investigation of xylem evolution at the transcriptome level. A core xylem gene set was identified in white spruce among which 31 transcripts are highly conserved in Arabidopsis. The expanding genomic resources of model plant species [15, 16, 17, 18] are invaluable for exploring many aspects of plant development and evolution, including xylem formation and evolution. Recent studies suggest whole-genome duplication and reorganization [15, 16, 17] have been a major driving force in plant evolution.
Comparative genomics is a powerful tool for investigating plant evolution at the whole-genome level [19, 20, 21, 22, 23, 24, 25, 26, 27]. Lower collinearity across dicots and monocots [16, 21] and lineage-specific genes have been observed using comparative genomics [28, 29]. Comparisons between rice and Arabidopsis showed that 50% of the rice genome was homologous to Arabidopsis, while as much as 88% of the poplar genome shares homology with Arabidopsis. Comparative genomics of poplar with Arabidopsis revealed at least 11,666 protein-coding genes were present in the ancestral eurosid genome. The recently sequenced Selaginella moellendorffii, moss and Eucalyptus grandis genomes provide new opportunities for comparative genomics in vascular plants.
Many commercial crops and forest trees do not have sequenced genomes; thus, comparative genomics in these species must rely on studies of the transcriptome (expressed sequence tags, ESTs). Using a transcriptome approach about half of loblolly pine xylem unigenes did not match Arabidopsis unigenes , and 39-45% of pine contigs had no hits in the genomes of Arabidopsis, rice and poplar . In white spruce and sitka spruce between 30 and 36% of the unigenes lacked homologs in poplar, Arabidopsis and rice [12, 33]. Similar results were also observed in other comparisons between gymnosperms and angiosperms [19, 34]. By contrast, comparison of different gymnosperm species revealed highly conserved transcriptomes. For example, about 84% of the white spruce transcripts had matches in the Pine Gene Index , and 60-80% of spruce contigs had homologs in loblolly pine and other gymnosperms .
Although comparative genomics has increased our understanding of plant evolution at the genome level, most studies to date have focussed on the entire genome sequence, and/or mixed EST resources developed from a range of tissues. In order to examine xylem evolution in vascular plants we focused our attention on genes expressed specifically in xylem. We selected ten species which represent different categories of vascular plants, including gymnosperms (pine and spruce), angiosperms (woody dicots, herbaceous dicots, and monocots) and lycophytes. The non-vascular plant moss was included as a reference for xylem evolution. Up-to-date public genome sequences and xylem transcriptomes of selected plant species were used for comparative genomic analysis, aiming to explore the evolution of the xylem transcriptome in vascular plants.
The xylem transcriptome is highly conserved in conifer species
Conifer xylem transcriptomes are significantly distinct from the xylem transcriptome or genomes of angiosperms, lycophytes and moss (Additional file 1A-B). For example, at the nucleotide level only 1-4% (E ≤ 1e-50) and 15-32% (E ≤ 1e-5) of radiata pine xylem unigenes have homologs in angiosperms, Selaginella and moss. Furthermore, average percentage of homologs between conifer xylem transcriptomes and angiosperms (xylem transcriptome or entire genome) at the nucleotide level is only 2% (E ≤ 1e-50) or 23% (E ≤ 1e-5) (Figure 1A); while even lower percentages (1% or 16%) are observed between conifers and Selaginella & moss. Comparisons between two xylem transcriptomes would be expected to reveal fewer homologs than comparisons between a xylem transcriptome and a total transcriptome or a whole genome. The low percentage of homologs detected is statistically significant (P < 0.001) compared to the high percentage of matches among conifer xylem transcriptomes. Unsurprisingly, the deduced amino acid sequences of conifer xylem unigenes have relatively more homologs in non-coniferous plants (Additional file 1B and Figure 1B). However, they are still significantly (P < 0.001) lower than the homologous xylem unigenes among the three conifers.
We further examined the nucleotide identity of homologous unigenes in the xylem transcriptomes of different conifers as well as homologs between conifer xylem transcriptomes and unigenes of other plant species. The nucleotide identity between radiata pine and loblolly pine is consistently higher (97.5%) at different E-value cut-offs ranging from 0 to 1e-5, than between radiata pine and the two spruce species (91%) (Additional file 1C). By contrast, lower nucleotide identity (81-83%) was observed in the homologs between radiata pine and angiosperms (Populus, Arabidopsis and rice) or moss. Similar results were also observed in the analyses with loblolly pine (Additional file 1C). The average nucleotide identity of homologous xylem unigenes in different conifers is about 92%, significantly higher (P < 0.01) than the homologs between conifers and angiosperms & moss (about 82%) (Additional file 1D). Thus, comparisons of nucleotide identity provide further evidence of the higher conservation of the xylem transcriptomes in conifers and its divergence in angiosperms and other plants.
The xylem transcriptome has evolved considerably in angiosperms
The xylem transcriptome is relatively more conserved in vascular plants
The conservative evolution of the xylem transcriptome was more clearly observed when using deduced amino acid sequences (Figure 3). For example, 37-46% of the conifer xylem transcriptome matched the genomes of angiosperms (Populus, Arabidopsis and rice), lycophyte and moss, nearly twice the proportion matched by the total conifer transcriptome. Similar results were also obtained when the poplar xylem transcriptome was compared with the total transcriptome. Therefore, the functional gene sequences in the xylem transcriptome are significantly more conserved in vascular plants than those in the total transcriptome.
Different functional gene groups have distinct patterns of evolution
Cell wall genes were identified according to the CellWallNavigator  and MAIZEWALL  databases. We compared the level of conservation of cell wall genes to non-cell-wall genes (Figure 4A-D). In the four woody species (radiata pine, loblolly pine, white spruce and poplar) the cell wall genes have 1.5 to two times more homologs than the non-cell-wall genes in the genomes of the four herbaceous model plants (Figure 4A-D). This suggests that cell wall genes have been highly conserved across diverse land plants. Comparisons of transcription factors (TFs) (based on PlantTFDB ) with non-TFs and known functional genes with unknowns revealed significantly more conservation of transcription factors and known functional genes among different groups of vascular and non-vascular plants (Additional file 2A-D).
Identification of putative xylem orthologs in vascular plants
Of the 1,003 loblolly pine xylem unigenes with homologs (tblastx or blastx, E ≤ 1e-50) in the six vascular plants, 94% and 100% have hits in the UniProt known proteins and TIGR gene indices databases, respectively, and 94% are assigned GO terms (Additional file 3). Among these unigenes 349 (35%) were identified as cell wall genes, and the largest gene families are tubulin (16 unigenes) and cellulose synthase (CesA) (10). Several primary wall genes are abundant, including pectate lyase (8), pectinesterase (7), XET (7), expansin (6) and peroxidase (5). Lignin biosynthesis-related genes are well represented in the cell wall gene lists, including SAMS (11), laccase (9), methionine synthase (cobalamin-independent) (6), 4CL (3), COMT (2), CAD (2) and CCoAMT (3). In addition, 49 transcription factors were found, including NAC, PHD, HB, MYB, LIM and CCCH. The large number of transcription factors suggests that wood formation involves considerable transcriptional regulation. Interestingly, the aquaporins (16 unigenes) form one of the largest gene families among the 1,003 unigenes, reflecting their central importance in water transport in xylem.
The 1,003 loblolly pine xylem unigenes have between 749 and 894 close homologs (blastx or tblastx, E ≤ 1e-50) in white spruce, sitka spruce, poplar, Arabidopsis, rice and Selaginella, with 527 xylem orthologs common to the above 6 species. These common xylem orthologs matched (blastx, E ≤ 1e-50) 501 unique gene models in moss, indicating that at least 501 ancestral xylem orthologs are shared in land plants. We also identified all loci in Arabidopsis and rice with homology to the 1,003 loblolly pine xylem unigenes. All possible homologs in Arabidopsis are represented by 3,115 (E ≤ 1e-50) or 6,563 unique loci (E ≤ 1e-5), and in rice by 3,057 or 8,458 unique loci. Based on the numbers of close homologs in Arabidopsis (785) and rice (776), an average of 4-8 and 4-11 paralogs may have arisen from each xylem ortholog in these two species, respectively.
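The paralog estimate above is a ratio of unique loci to close homologs, and the arithmetic can be checked with a short sketch. The counts are those quoted in the text; the dictionary layout and function name are illustrative assumptions, not part of the original analysis.

```python
# Counts quoted in the text for the 1,003 loblolly pine xylem unigenes.
# "loci_strict" uses E <= 1e-50, "loci_loose" uses E <= 1e-5.
COUNTS = {
    "Arabidopsis": {"close_homologs": 785, "loci_strict": 3115, "loci_loose": 6563},
    "rice": {"close_homologs": 776, "loci_strict": 3057, "loci_loose": 8458},
}


def loci_per_homolog(species):
    """Return (strict, loose) ratios of unique loci to close homologs."""
    c = COUNTS[species]
    return (c["loci_strict"] / c["close_homologs"],
            c["loci_loose"] / c["close_homologs"])


for name in COUNTS:
    strict, loose = loci_per_homolog(name)
    print(f"{name}: {strict:.1f} to {loose:.1f} loci per close homolog")
```

Rounded to whole numbers, the ratios come to about 4 and 8 for Arabidopsis and about 4 and 11 for rice, which matches the ranges given in the text.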
Molecular evolution of xylem orthologs in vascular plants
From individual phylogenetic trees of 501 ancestral xylem orthologs, 235 (46.9%) genes showed an evolutionary pattern that generally followed the trend of plant evolution (Figure 7C). These genes are likely more sensitive to evolutionary forces. Molecular evolution of the other xylem orthologs (266, 53.1%) did not closely resemble the general pattern of plant evolution. A large proportion of orthologs in this category (158, 31.5%) had trees with similar patterns between angiosperms (poplar, Arabidopsis and rice) and ancient plant species (moss and/or Selaginella), but quite distinct branches for the gymnosperms (pine and spruce) (Figure 7D). These genes have possibly undergone slower evolution, although considerable evolution appears to have occurred within the gymnosperms. Genes of this type may have unique and important roles in softwood formation. The remaining orthologs (108, 21.6%) exhibit complex evolutionary patterns which are poorly correlated with plant evolution, suggesting that genetic shift might be a major driving force for their evolution.
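The category shares reported above can be reproduced from the raw tree counts. The following is a small sketch; the category labels are shorthand introduced here for illustration.

```python
TOTAL_ORTHOLOGS = 501

# Tree counts quoted in the text; labels are illustrative shorthand.
CATEGORIES = {
    "parallel to plant evolution": 235,
    "gymnosperm branches distinct": 158,
    "complex pattern": 108,
}


def shares(categories, total):
    """Return each category's percentage of the total, rounded to 0.1%."""
    if sum(categories.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {name: round(100.0 * n / total, 1) for name, n in categories.items()}


percentages = shares(CATEGORIES, TOTAL_ORTHOLOGS)
# The two non-parallel categories together form the divergent group.
divergent = CATEGORIES["gymnosperm branches distinct"] + CATEGORIES["complex pattern"]
```

The divergent group sums to 266 orthologs, consistent with the 53.1% quoted for genes that did not follow the general pattern of plant evolution.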
Using GO terms at various levels we compared gene functions between xylem orthologs which paralleled and diverged from plant evolution. Genes involved in cellular components and biological process are significantly (P-value < 0.05) more abundant in the divergent orthologs (Additional file 7). Further analyses with GO terms at lower levels (Additional file 7) suggest that genes involved in ribosome, transporter activity and translation tend to parallel plant evolution; while genes with functions in cell parts, binding and metabolic process have distinct patterns of evolution. Among cell wall genes 4CL-like, XET and beta-galactosidase have a molecular evolutionary pattern paralleling plant evolution. However, many other cell wall genes have not tightly followed plant evolution, such as CCoAMT, pectate lyase, pectinesterase, sucrose synthase, actin, alpha tubulin, COMT, CesA, FLA, laccase and peroxidase.
Identification of putative conifer-specific xylem genes
A total of 274 loblolly pine xylem unigenes which matched unigenes of white spruce and sitka spruce (tblastx, E ≤ 1e-50), but have no homologs in the gene models of poplar, Arabidopsis, rice, Selaginella and moss, as well as no hits in any other angiosperms in the UniProt known protein database (blastx, E > 1e-5), were identified as putative conifer-specific xylem unigenes (Additional file 8). All conifer-specific xylem unigenes are unknown protein sequences based on the UniProt database, and only 16% were assigned with GO terms in the TIGR gene index database. This contrasts with 94% of the xylem orthologs annotated with GO terms. The poor annotation of the conifer-specific xylem genes is likely due to the intense functional characterisation of genes that has taken place in angiosperms.
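The selection rule for putative conifer-specific unigenes amounts to a two-part filter: a close match in both spruces, and no match at the relaxed cut-off anywhere else. A minimal sketch follows, assuming the best E-value per unigene and per target has already been extracted from the BLAST results; the data layout, function name and toy identifiers are hypothetical.

```python
NO_HIT = float("inf")  # stands in for "no alignment reported"


def conifer_specific(pine_ids, spruce_hits, other_hits, strict=1e-50, loose=1e-5):
    """Return unigenes with close hits in both spruces and no hit elsewhere.

    spruce_hits / other_hits: {unigene_id: {target_name: best E-value}}.
    A target absent from the inner dict is treated as having no hit.
    """
    selected = []
    for uid in pine_ids:
        spruce = spruce_hits.get(uid, {})
        in_both_spruces = all(
            spruce.get(sp, NO_HIT) <= strict
            for sp in ("white spruce", "sitka spruce")
        )
        # Every non-conifer target must be worse than the relaxed cut-off.
        no_other_hit = all(e > loose for e in other_hits.get(uid, {}).values())
        if in_both_spruces and no_other_hit:
            selected.append(uid)
    return selected


# Toy example with made-up identifiers and E-values.
spruce = {
    "u1": {"white spruce": 1e-80, "sitka spruce": 1e-60},
    "u2": {"white spruce": 1e-80, "sitka spruce": 1e-60},
    "u3": {"white spruce": 1e-80},
}
others = {"u1": {"poplar": 0.3}, "u2": {"Arabidopsis": 1e-12}}
result = conifer_specific(["u1", "u2", "u3"], spruce, others)
```

In the toy data only `u1` passes: `u2` has an Arabidopsis hit below the relaxed cut-off, and `u3` lacks a sitka spruce match.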
Of the conifer-specific xylem unigenes, several relatively abundant transcripts have low homology to arabinogalactan-like proteins (AGPs), glycine/proline-rich proteins (GPRP), metallothionein-like class II proteins, neurofilament triplet H proteins, zinc finger proteins, cytochrome c1, etc. (Additional file 8). Genes with low similarity to other cell wall proteins (such as alpha tubulin, cellulose synthase-like and extensin-like proteins) are also present. The conifer-specific xylem unigenes also include genes similar to transcription factors, such as Aux/IAA, NAC, CCAAT-binding, WRKY, R2R3-MYB and BHLH (Additional file 8). These conifer-specific xylem unigenes may share the functions of related homologues, but some may have unique roles in gymnosperm wood formation.
Gymnosperms and angiosperms have distinct patterns of xylem transcriptome evolution
Our data revealed that the xylem transcriptome is highly conserved in conifers but has undergone considerably more diversification in angiosperms. This suggests the xylem transcriptome has a distinct pattern of evolution in angiosperms compared to gymnosperms. Pine and spruce diverged 120-140 Ma , slightly earlier than the radiation of angiosperms (100-120 Ma) [16, 17]. The rapid evolution of the angiosperm xylem transcriptome suggests greater sensitivity to evolutionary forces in flowering plants. Our results also showed that poplar has a relatively distinct genome and xylem transcriptome, thus poplar is unlikely to be a useful model for investigating wood formation and other developmental processes in gymnosperms. Mechanisms underlying the diverse patterns of xylem transcriptome evolution in angiosperms and gymnosperms are not well understood. Gene duplications and segmental chromosome rearrangements may be the major driving forces for this diversification; however, the role of genome size or gene number remains unclear.
The common ancestor of the xylem transcriptome in vascular plants
Our data demonstrated that the nucleotide and protein sequences of the xylem transcriptome are highly diversified among different categories of vascular plants, but that their functional domains are moderately to highly conserved in vascular plants. This is consistent with the diversification of all vascular plants from a common ancestor which is thought to have lived about 400 Ma . Some of these conserved patterns are also consistent with comparisons using the total transcriptome in previous studies [32, 43]. The conserved functional domains of the xylem transcriptome suggest the existence of a shared ancestral xylem transcriptome of vascular plants.
The common ancestral xylem transcriptome should be present in the earliest ancient vascular plants, which produced hilate/trilete spores during the Late Ordovician . The lycophyte Selaginella is among the closest living relatives of these plants and its xylem transcriptome is likely to resemble most closely the ancestral xylem transcriptome. The 5,047 (27%) xylem unigenes of loblolly pine with homologs (E ≤ 1e-5) in the six vascular plants including Selaginella could largely represent the ancestral xylem transcriptome. Because the vast majority (95%) of the ancestral xylem transcriptome has homologs (E ≤ 1e-5) in the moss genome, the xylem transcriptome is likely to have evolved largely from the cell wall transcriptome of non-vascular plants, which can be traced back as far as 450 Ma .
Conservative evolution of the xylem transcriptome in vascular plants
Xylem functions in the transport of water and solutes throughout the plant and together with the phloem makes up the vascular transport tissues of plants. Our data suggest that the xylem transcriptome is relatively more conserved than the total transcriptome in vascular plants. Several functional gene groups within the xylem transcriptome are significantly more conserved among vascular plants, including cell wall genes, ancestral xylem genes, transcription factors and known protein genes. This implies that xylem has evolved more slowly than other organs of vascular plants such as leaves, flowers and seeds. On the other hand, genes of unknown function and VSX genes have evolved more rapidly in vascular plants. The rapid evolution of VSX genes contrasts with the generally conservative evolution of the total xylem transcriptome, suggesting VSX genes are particularly sensitive to evolutionary forces. These genes will be useful targets for functional characterisation in order to understand xylem formation and evolution in vascular plants.
Our data showed that the loblolly pine xylem transcriptome has significantly more homologs in spruce (white spruce and sitka spruce) than in poplar and Eucalyptus, thus it is possible to identify genes specific to gymnosperm wood formation (softwoods). Interestingly, the number of homologs of the loblolly pine xylem transcriptome in woody angiosperms (poplar and Eucalyptus) is similar to that in herbaceous angiosperms (Arabidopsis and rice), suggesting that evolution in a small number of xylem genes in angiosperms gave rise to woody plants. Furthermore, the number of homologs of the conifer xylem transcriptome in angiosperms is only slightly higher than the number in lycophytes and moss. The limited number of xylem genes specific to woody plants suggests that differential transcriptional regulation may have played an important role in xylem evolution, in much the same way that transcriptional regulation gives rise to plant diversity.
Comparison of gene expression between softwoods and hardwoods
A previous study in Cryptomeria japonica identified 56 putative conifer-specific transcripts, including three specific to reproductive organs and one (unknown) specific to woody tissues . Here we identified 274 conifer-specific xylem unigenes, all of which are of unknown function. Transcripts with low homology to some cell wall genes (i.e. tubulin, CesA and AGP) are abundant or present in the conifer-specific xylem unigenes (Additional file 8). These cell wall gene families may include some unique members which are specific to conifer wood formation. The identified conifer-specific xylem genes could provide clues to the molecular processes that give rise to the distinct cell wall structures and chemical properties of softwoods compared to hardwoods. Genes related to lignin biosynthesis are not present in the conifer-specific xylem unigenes, which is consistent with the earlier conclusion that the conifer lignin pathway is conserved in other vascular plants .
Transcripts similar to arabinogalactan protein (AGP) genes are the most abundant genes in the conifer-specific xylem unigenes. In loblolly pine six AGPs including PtAGP4 were identified in xylem tissues  and 11 AGPs were identified in radiata pine. Radiata pine PrAGP4 is highly abundant in earlywood  and preferentially transcribed in earlywood at different tree ages across a rotation period . The conifer-specific xylem AGPs and AGP-like genes had no homologs with any of the 47 AGPs of Arabidopsis [48, 49, 50]. AGPs are a large class of hydroxyproline-rich glycoproteins. Most AGPs are anchored to the plasma membrane  and released into the cell wall after cleavage of the GPI anchor . AGPs may act as cell wall plasticizers, enlarging the pectin matrix, and allowing wall extension and cell expansion . Radiata pine AGPs are located in the compound middle lamella of newly developed tracheids . Expression of AGPs containing FLA domains has recently been found to influence the mechanical strength of stems of the herbaceous (Arabidopsis) and woody (Eucalyptus) angiosperms . However, fasciclin-like domains (FLAs) were not found in the AGP genes specific to conifer wood formation, suggesting different AGPs may have distinct roles in softwood and hardwood formation.
We identified 527 xylem orthologs in vascular plants, including many primary and secondary cell wall genes as well as transcription factors. Conservation of cell wall genes suggests that cell wall biosynthesis is a central event in xylem formation in vascular plants. These xylem orthologs maintain the stability of the basic xylem machinery in vascular plants and may regulate development of xylem structures and properties shared by different vascular plants, such as the common features of softwoods and hardwoods. COMT was previously thought to be one of three enzymes (F5H/Cald5H, COMT and SAD) specifically involved in S lignin synthesis of angiosperms. The occurrence of COMT genes in the xylem orthologs of vascular plants suggests their involvement in diverse functions other than S lignin synthesis.
The xylem transcriptome is highly conserved in conifer species, but the nucleotide sequences of the xylem transcriptome are significantly distinct among angiosperms, thus considerable evolution of the xylem transcriptome has occurred in angiosperms. The functional domains of the xylem transcriptome are moderately to highly conserved among vascular and non-vascular plants. This suggests that vascular plants share an ancestral xylem transcriptome which is likely to have evolved predominantly from the cell wall transcriptome of non-vascular plants. In comparison to the total transcriptome from a wide range of tissues, the xylem transcriptome has evolved conservatively in vascular plants. Several functional gene groups within the xylem transcriptome, including cell wall genes, ancestral xylem genes, transcription factors and known function genes, are relatively more conserved in vascular plants. A total of 527 xylem orthologs of vascular plants and 274 conifer-specific xylem genes were identified in this study. These genes provide good candidates for molecular investigations of xylem formation and evolution in softwoods (conifers) and hardwoods (woody angiosperms). The uneven distribution of xylem orthologs within Arabidopsis chromosomes suggests genome rearrangements have played an important role in xylem evolution. Phylogenetic analysis of the 501 ancestral xylem orthologs suggests that molecular evolution of the xylem transcriptome has largely paralleled the trend of plant evolution, although several gene classes did not track it tightly.
We selected ten species to represent three major classes of vascular plants (gymnosperms, angiosperms and lycophytes). Four conifer species, Pinus radiata (radiata pine), Pinus taeda (loblolly pine), Picea glauca (white spruce) and Picea sitchensis (sitka spruce), represent gymnosperms. Four species of dicots, Populus tremula × Populus tremuloides (hybrid aspen), Populus trichocarpa, Eucalyptus grandis and Arabidopsis thaliana, and one monocot (Oryza sativa, rice) represent angiosperms. The recently sequenced Selaginella moellendorffii represents lycophytes. The non-vascular plant Physcomitrella patens (moss) was included as a reference. The seven woody species (four conifer and three woody dicot species) undergo both primary and secondary xylem formation. Arabidopsis also develops secondary xylem and has been used as a model system for secondary xylem development [10, 11, 55, 56, 57, 58]. The monocot rice and the lycophyte S. moellendorffii only produce primary xylem, while the non-vascular plant moss has no xylem at all.
Public genomic resources
We previously developed a radiata pine xylem transcriptome resource with 3,304 xylem unigenes . Unigenes of loblolly pine, white spruce, sitka spruce, hybrid aspen, P. trichocarpa, Arabidopsis, rice and moss were retrieved from the NCBI UniGene database  (Additional file 9). The gene indices of pine, spruce and poplar (Additional file 9) were collected from the TIGR gene indices database . These gene indices represent the total transcriptome from a wide range of tissues (such as stem, shoots, xylem, leaves, flowers, bark, roots and seeds, etc). A sub-set of xylem gene indices of pine, spruce and poplar was separately retrieved from pure xylem libraries. These included genes expressed in pure xylem tissues, while genes expressed in mixed tissues of xylem and phloem (such as shoots, cambial regions, etc) were not considered as they were likely to contain phloem-specific transcripts. After removing redundant sequences a total of 14,527, 15,262 and 9,109 xylem gene indices of pine, spruce and poplar were identified. To minimize the bias in comparative genomics, pure xylem gene indices were reassembled using the same method and parameters as used in the EST assembly of radiata pine . A total of 18,320, 12,489 and 7,991 pure xylem unigenes were finally obtained for pine, spruce and poplar, respectively.
The public transcriptome resources (unigenes and gene indices) are unlikely to include all transcripts expressed in xylem formation due to sampling limitations. Therefore, selected fully sequenced plant species (P. trichocarpa, E. grandis, Arabidopsis, rice, S. moellendorffii and P. patens) were added for comparative genomics in this study (Additional file 9). Gene models of P. trichocarpa, S. moellendorffii and P. patens (moss) were downloaded from the JGI website . E. grandis scaffolds were downloaded from EucalyptusDB . A. thaliana gene models were downloaded from the TAIR website  and O. sativa gene models were downloaded from the MSU website . Detailed information on the public genomic resources used in this study is listed in Additional file 9.
Comparative genomic analysis
Comparisons of sequences were performed locally using the BlastStation2 software (TM software, Inc., CA) with default parameters, including various blast programs (blastn, tblastx, blastx, blastp and tblastn). Expected-value (E-value or E), sequence identity and bit scores were collected for evaluating the similarity between two sequences. Percentage of hits, average sequence identity and average bit score at different E-value cut-offs were calculated for the comparisons of transcriptomes or genomes between different plant species.
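The three summary statistics described above (percentage of hits, average identity and average bit score at a given E-value cut-off) can be sketched as follows. This assumes the hits are available in the common 12-column tab-separated BLAST layout (query, subject, % identity, ..., E-value, bit score) and that only the best-scoring hit per query is counted; neither detail is stated in the text, and the function name is illustrative.

```python
import csv
import io


def summarise_hits(tabular_text, n_queries, cutoff):
    """Return (% of queries with a hit, mean identity, mean bit score).

    Only hits with E-value <= cutoff count; each query contributes its
    highest-scoring qualifying hit.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query = row[0]
        identity, evalue, bits = float(row[2]), float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bits > best[query][1]:
            best[query] = (identity, bits)
    if not best:
        return 0.0, None, None
    pct_hits = 100.0 * len(best) / n_queries
    mean_identity = sum(v[0] for v in best.values()) / len(best)
    mean_bits = sum(v[1] for v in best.values()) / len(best)
    return pct_hits, mean_identity, mean_bits


# Toy input: four made-up hits for a query set of four unigenes.
rows = [
    ["q1", "s1", "97.5", "200", "5", "0", "1", "200", "1", "200", "1e-60", "300"],
    ["q1", "s2", "90.0", "150", "15", "0", "1", "150", "1", "150", "1e-20", "100"],
    ["q2", "s3", "82.0", "120", "21", "0", "1", "120", "1", "120", "1e-10", "80"],
    ["q3", "s4", "70.0", "40", "12", "0", "1", "40", "1", "40", "0.5", "20"],
]
toy_text = "\n".join("\t".join(r) for r in rows)
loose_summary = summarise_hits(toy_text, n_queries=4, cutoff=1e-5)
strict_summary = summarise_hits(toy_text, n_queries=4, cutoff=1e-50)
```

Running the same function at several cut-offs gives the kind of per-threshold comparison used throughout the results.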
The E-value cut-offs for inferring sequence similarity vary from 1e-3 [17, 24] to 1e-50 [64, 65], but 1e-10 to 1e-30 have been widely used [19, 32, 43, 66, 67, 68, 69]. To ensure high confidence as previously suggested and to increase the likelihood of inferring biological significance, we used the following thresholds of E-values: (a) E = 0, the two sequences are identical and considered to derive from the same gene; (b) 0 < E ≤ 1e-50, the two sequences are highly similar and likely to be from the same gene family; (c) 1e-50 < E ≤ 1e-5, the two sequences are similar in one or more regions along the whole sequence and likely to contain the same functional domain(s) or motif(s); (d) E > 1e-5, the two sequences are likely to be unrelated genes. As a general rule, E ≤ 1e-50 was interpreted in this study to indicate whole sequence similarity of two given sequences (gene conservation or apparent homologs); and E-values between 1e-50 and 1e-5 were interpreted to indicate partial sequence similarity (domain or motif conservation). Possible influences of different blast programs and the size of databases were not considered in setting the above E-value thresholds.
Phylogenetic analysis of the ancestral xylem orthologs was carried out using the predicted amino acid sequences of loblolly pine and white spruce, and protein sequences from the five model species (poplar, Arabidopsis, rice, Selaginella and moss). For each species all these sequences were combined into one sequence. The sequences from the above 7 species were aligned using three methods: ClustalX2, Kalign and Mafft with default settings. Phylogenetic trees were created using the neighbour-joining algorithm with 1,000 bootstrap replicates. Sequences of individual genes were also used to build phylogenetic trees using CLC Genomics Workbench 3 (CLC bio, Denmark). All individual trees were visually compared with the trees from the combined sequence data.
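The four-way classification can be written as a small function. This sketch reads the classes as ordered from most similar (E = 0) to least similar (E above 1e-5), with 1e-50 and 1e-5 as the two boundaries; the function name and class labels are illustrative and do not come from the study.

```python
def classify_evalue(e):
    """Map a BLAST E-value to one of the four similarity classes (a)-(d)."""
    if e < 0:
        raise ValueError("E-values cannot be negative")
    if e == 0:
        return "identical"        # class (a): same gene
    if e <= 1e-50:
        return "whole-sequence"   # class (b): likely same gene family
    if e <= 1e-5:
        return "domain-or-motif"  # class (c): partial similarity
    return "unrelated"            # class (d)


labels = [classify_evalue(e) for e in (0.0, 1e-60, 1e-20, 1e-3)]
```

Note that BLAST programs report very small E-values as 0, so class (a) in practice captures near-identical as well as strictly identical sequences.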
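The step of combining all ortholog sequences into one sequence per species can be sketched as a simple concatenation. The sketch assumes the orthologs are joined in the same gene order for every species so that the combined sequences stay comparable; the text does not state the order used, and the function name and toy sequences are hypothetical.

```python
def concatenate_orthologs(seqs_by_species, ortholog_order):
    """Join each species' ortholog sequences in one fixed gene order.

    seqs_by_species: {species: {ortholog_id: protein sequence}}.
    Raises KeyError if a species lacks any ortholog in ortholog_order.
    """
    combined = {}
    for species, seqs in seqs_by_species.items():
        missing = [g for g in ortholog_order if g not in seqs]
        if missing:
            raise KeyError(f"{species} is missing orthologs: {missing}")
        combined[species] = "".join(seqs[g] for g in ortholog_order)
    return combined


# Toy example with made-up ortholog identifiers and short peptides.
toy = {
    "loblolly pine": {"orth1": "MKT", "orth2": "LLV"},
    "moss": {"orth2": "LIV", "orth1": "MRT"},
}
supersequences = concatenate_orthologs(toy, ["orth1", "orth2"])
```

The combined sequences would then be passed to an aligner and a tree-building program, which are outside the scope of this sketch.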
This work was funded by Forest and Wood Products Australia (FWPA), ArborGen LLC, the Southern Tree Breeding Association (STBA), Queensland Department of Primary Industry (QDPI) and the Commonwealth Scientific and Industrial Research Organization (CSIRO). We thank Iain Wilson, Shannon Dillon, Colleen MacMillan, Bryan Clarke and Jason Bragg for their critical comments on the manuscript.
- 6. Osakabe K, Tsao CC, Li LG, Popko JL, Umezawa T, Carraway DT, Smeltzer RH, Joshi CP, Chiang VL: Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proc Natl Acad Sci USA. 1999, 96 (16): 8955-8960. 10.1073/pnas.96.16.8955.
- 8. Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R, et al: Gene discovery in the wood-forming tissues of poplar: Analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci USA. 1998, 95: 13330-13335. 10.1073/pnas.95.22.13330.
- 12. Pavy N, Paule C, Parsons L, Crow JA, Morency MJ, Cooke J, Johnson JE, Noumen E, Guillet-Claude C, Butterfield Y, et al: Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics. 2005, 6: 144. 10.1186/1471-2164-6-144.
- 14. Pavy N, Boyle B, Nelson C, Paule C, Giguere I, Caron S, Parsons LS, Dallaire N, Bedon F, Berube H, et al: Identification of conserved core xylem gene sets: conifer cDNA microarray development, transcript profiling and computational analyses. New Phytol. 2008, 180 (4): 766-786. 10.1111/j.1469-8137.2008.02615.x.
- 18. EucalyptusDB. [http://eucalyptusdb.bi.up.ac.za/]
- 19. Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y, Yoshimura K, Tsumura Y: Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers. Plant Mol Biol. 2005, 59 (6): 895-907. 10.1007/s11103-005-2080-y.
- 20. Quesada T, Li Z, Dervinis C, Li Y, Bocock PN, Tuskan GA, Casella G, Davis JM, Kirst M: Comparative analysis of the transcriptomes of Populus trichocarpa and Arabidopsis thaliana suggests extensive evolution of gene expression regulation in angiosperms. New Phytol. 2008, 180 (2): 408-420. 10.1111/j.1469-8137.2008.02586.x.
- 24. Nishiyama T, Fujita T, Shin-I T, Seki M, Nishide H, Uchiyama I, Kamiya A, Carninci P, Hayashizaki Y, Shinozaki K, et al: Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: Implication for land plant evolution. Proc Natl Acad Sci USA. 2003, 100 (13): 8007-8012. 10.1073/pnas.0932694100.
- 30. Selaginella moellendorffii gene models. [http://genome.jgi-psf.org/Selmo1/Selmo1.home.html]
- 32. Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci USA. 2003, 100 (12): 7383-7388. 10.1073/pnas.1132171100.
- 33. Ralph SG, Chun HJE, Kolosova N, Cooper D, Oddy C, Ritland CE, Kirkpatrick R, Moore R, Barber S, Holt RA, et al: A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics. 2008, 9: 484. 10.1186/1471-2164-9-484.
- 34. Brenner ED, Katari MS, Stevenson DW, Rudd SA, Douglas AW, Moss WN, Twigg RW, Runko SJ, Stellari GM, McCombie WR, et al: EST analysis in Ginkgo biloba: an assessment of conserved developmental regulators and gymnosperm specific genes. BMC Genomics. 2005, 6: 143. 10.1186/1471-2164-6-143.
- 35. Girke T, Lauricha J, Tran H, Keegstra K, Raikhel N: The cell wall navigator database. A systems-based approach to organism-unrestricted mining of protein families involved in cell wall metabolism. Plant Physiol. 2004, 136 (2): 3003-3008. 10.1104/pp.104.049965.
- 36. Guillaumie S, San-Clemente H, Deswarte C, Martinez Y, Lapierre C, Murigneux A, Barriere Y, Pichon M, Goffner D: MAIZEWALL. Database and developmental gene expression profiling of cell wall biosynthesis and assembly in maize. Plant Physiol. 2007, 143 (1): 339-363. 10.1104/pp.106.086405.
- 40. Segmental chromosome duplication of Arabidopsis. [http://www.tigr.org/tdb/e2k1/ath1/Arabidopsis_genome_duplication.shtml]
- 41. Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N: Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics. 2006, 174 (4): 2095-2105. 10.1534/genetics.106.065102.
- 43. Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH, et al: A Populus EST resource for plant functional genomics. Proc Natl Acad Sci USA. 2004, 101 (38): 13951-13956. 10.1073/pnas.0401641101.
- 47. Li X, Wu HX, Southerton SG: Seasonal reorganization of the xylem transcriptome at different tree ages reveals novel insights into wood formation in Pinus radiata. New Phytol. 2010, doi:10.1111/j.1469-8137.2010.03333.x.
- 59. NCBI UniGene database. [ftp://ftp.ncbi.nih.gov/repository/UniGene/]
- 60. The TIGR plant gene index database. [http://compbio.dfci.harvard.edu/tgi/plant.html]
- 61. Gene models of Populus trichocarpa (v1.1), Selaginella moellendorffii (v1.0) and Physcomitrella patens (moss) (v1.1). [http://genome.jgi-psf.org/]
- 62. Arabidopsis thaliana gene models (TAIR8). [ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release/]
- 63. Oryza sativa (rice) gene models (v6.0). [ftp://ftp.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/]
- 64. Brenner SE: Practical database searching. Bioinformatics. 1998, 9-12.
- 69. Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S, Thibaud-Nissen F, Hamilton J, Childs K, et al: Expressed Sequence Tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis. Plant Mol Biol. 2006, 62 (4-5): 485-501. 10.1007/s11103-006-9035-9.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.