Phylogenetic analysis of the tenascin gene family: evidence of origin early in the chordate lineage
Tenascins are a family of glycoproteins found primarily in the extracellular matrix of embryos where they help to regulate cell proliferation, adhesion and migration. In order to learn more about their origins and relationships to each other, as well as to clarify the nomenclature used to describe them, the tenascin genes of the urochordate Ciona intestinalis, the pufferfish Tetraodon nigroviridis and Takifugu rubripes and the frog Xenopus tropicalis were identified and their gene organization and predicted protein products compared with the previously characterized tenascins of amniotes.
A single tenascin gene was identified in the genome of C. intestinalis that encodes a polypeptide with domain features common to all vertebrate tenascins. Both pufferfish genomes encode five tenascin genes: two tenascin-C paralogs, a tenascin-R with domain organization identical to mammalian and avian tenascin-R, a small tenascin-X with previously undescribed GK repeats, and a tenascin-W. Four tenascin genes corresponding to tenascin-C, tenascin-R, tenascin-X and tenascin-W were also identified in the X. tropicalis genome. Multiple sequence alignment reveals that differences in the size of tenascin-W from various vertebrate classes can be explained by duplications of specific fibronectin type III domains. The duplicated domains are encoded on single exons and contain putative integrin-binding motifs. A phylogenetic tree based on the predicted amino acid sequences of the fibrinogen-related domains demonstrates that tenascin-C and tenascin-R are the most closely related vertebrate tenascins, with the most conserved repeat and domain organization. Taking all lines of evidence together, the data show that the tenascins referred to as tenascin-Y and tenascin-N are actually members of the tenascin-X and tenascin-W gene families, respectively.
The presence of a tenascin gene in urochordates but not other invertebrate phyla suggests that tenascins may be specific to chordates. Later genomic duplication events led to the appearance of four family members in vertebrates: tenascin-C, tenascin-R, tenascin-W and tenascin-X.
Keywords: Major Histocompatibility Complex; Major Histocompatibility Complex Gene; Heptad Repeat; Xenopus Tropicalis; Major Histocompatibility Complex Region
Tenascins are a family of extracellular matrix glycoproteins characterized by an N-terminal globular domain and heptad repeats, which facilitate multimerization; one or more tenascin-type epidermal growth factor (EGF)-like repeats (consensus sequence X4CX3CX5CX4CXCX8C); a series of fibronectin (FN) type III domains; and a C-terminal fibrinogen-related domain (FReD). Diversity within the family exists at many levels. Each species of vertebrate examined to date has more than one tenascin gene, and the gene products themselves are frequently alternatively spliced (e.g., see [1, 2]). In addition, electron microscopy reveals purified tenascins with 6 arms (hexabrachions) as well as trimers, dimers and monomers [3, 4]. Tenascins are particularly abundant in the embryonic extracellular matrix, but some reappear in the adult during regeneration, inflammatory disease, tumorigenesis and wound healing [2, 5, 6]. Tenascins act through interactions with cell surface receptors (reviewed by ; see also ) as well as by binding to and blocking sites on other extracellular matrix molecules (e.g., see ).
No tenascins have been identified in the Caenorhabditis elegans genome or in arthropod genomes, and the only complete cDNA sequences from poikilotherms come from D. rerio [17, 20, 21]. This has limited the phylogenetic analysis of the origins of tenascins, the identification of evolutionarily conserved regions, domains and potential receptor recognition motifs, as well as the ability to clarify the relationships between the different members of the tenascin gene family. Recently the genomes of a variety of organisms have been completely, or nearly completely, sequenced, facilitating the identification of tenascin genes in fish, amphibians, and invertebrate chordates. Using searches based on conservation of amino acid sequences and domain architecture, we have identified in silico the putative tenascin gene products of the freshwater green pufferfish, Tetraodon nigroviridis, the Japanese tiger pufferfish, Takifugu rubripes, the pipid frog, Xenopus tropicalis, and the ascidian, Ciona intestinalis. These studies show that the basic domain architecture of tenascins is highly conserved between an invertebrate chordate and vertebrates and that there are only four members of the tenascin family in vertebrates: tenascin-C, tenascin-R, tenascin-W and tenascin-X.
Ciona intestinalis tenascin
To analyze the expression pattern of this novel tenascin we raised an antiserum against the second FN type III domain expressed in E. coli (see Methods). In whole mounts this antiserum recognized the thin tunic that envelops the larva, the extracellular matrix at the base of the tail, and periodically arrayed structures found on either side of the nerve cord that may correspond to the dorsal portions of the primary muscles of the tail (Figure 2C). This latter staining pattern is reminiscent of the immunostaining of the sclerotome in chick and mouse embryos with antibodies against tenascin-C . The tunics of the larvae incubated with preimmune serum were often labelled, but the matrix at the base of the tail and the matrix associated with the tail musculature were not (Figure 2D).
The genome sequencing of another urochordate, Oikopleura dioica, has been completed (Trace Archive Database Mega BLAST ). Using the Hidden Markov Model on the alignments of all O. dioica FReDs we could identify a putative tenascin gene in this distinct urochordate as well. Unfortunately, we were not able to identify the entire O. dioica tenascin gene because the genome scaffolds have not been assembled yet (data not shown).
Tetraodon and Takifugu each have five tenascins
Accession numbers of sequences used to obtain data for searching, alignment and phylogenetic reconstructions of tenascins*
H. sapiens tenascin-C
M. musculus tenascin-C
G. gallus tenascin-C
D. rerio tenascin-C
T. nigroviridis tenascin-Ca
T. nigroviridis tenascin-Cb
H. sapiens tenascin-R
M. musculus tenascin-R
G. gallus tenascin-R
D. rerio tenascin-R
T. nigroviridis tenascin-R
H. sapiens tenascin-W
M. musculus tenascin-W
G. gallus tenascin-W
D. rerio tenascin-W
T. nigroviridis tenascin-W
H. sapiens tenascin-X
M. musculus tenascin-X
Takifugu rubripes tenascin-X
G. gallus tenascin-X ("Y")
Pufferfish have two tenascin-Cs
Two of the predicted tenascin sequences found in the Tetraodon nigroviridis genome showed the most similarity to zebrafish, chicken, and the numerous mammalian tenascin-Cs following BLASTP analyses of their FReDs and first and last FN type III domains. Both of these tenascins were predicted by SMART and visual inspection to include an N-terminal linker, a series of heptad repeats, 12.5 EGF-like repeats and a C-terminal FReD. One of the two sequences contains 12 FN type III domains (GSTENT00025206001) and the other contains 10 FN type III domains (GSTENT00020055001; Figure 3A; note that it was necessary for us to complete several predicted partial FN type III domains by identifying missing exons by translating nearby open reading frames in the genomic sequence). The amino acid sequences of the FReDs of the two tenascins share 80% identity and 94% similarity (Figure 3B). Bayesian inference of the phylogenetic trees predicts that these proteins are more closely related to each other than to any other vertebrate tenascin. Because of their sequence similarity to each other and to other tenascin-Cs, overall domain organization, and predicted evolutionary proximity, we will refer to these paralogs as tenascin-Ca (modified from GSTENT00020055001) and tenascin-Cb (modified from GSTENT00025206001). Like tenascin-C in Gallus gallus and man, tenascin-Cb contains an RGD integrin-binding motif in its third FN type III domain in a region predicted to be exposed to receptor binding. In tenascin-Ca this region contains the potentially reactive motif KGD. Similarly duplicated tenascin-C genes were identified in the Takifugu rubripes genome (Figure 3A). SMART reveals that one predicted protein (scaffold_2387.1 and SINFRUT00000156379) has heptad repeats, 12.5 tenascin-type EGF-like repeats, 10 FN type III domains and a C-terminal FReD. 
It was also necessary to modify this predicted protein by identifying open reading frames in the genomic sequence that corresponded to missing exons that completed several FN type III domains. The domain organization of the corrected tenascin is identical to that of Tetraodon nigroviridis tenascin-Ca. The FReDs of the two pufferfish tenascin-Cas are 95% identical and 99% similar. The second Takifugu rubripes tenascin-C (scaffold_148.12 and SINFRUT00000139096) is predicted by SMART and direct examination to have heptad repeats, 13.5 EGF-like repeats, 15 FN type III domains and a C-terminal FReD. Like Tetraodon nigroviridis tenascin-Cb, it has an RGD motif in the third FN type III domain (Figure 3A). The amino acid sequences of the tenascin-Cb FReDs from the two species of pufferfish are 92% identical and 98% similar.
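The identity and similarity percentages quoted for the FReD comparisons can be illustrated with a minimal sketch. The text does not state which program or scoring scheme produced these values, so the residue groups below are an assumed, hypothetical definition of "similar", and the aligned strings are synthetic rather than real tenascin sequences.

```python
# Assumed conservative residue groups; the source does not specify how
# "similarity" was scored, so this grouping is purely illustrative.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {res: i for i, group in enumerate(SIMILAR_GROUPS) for res in group}


def identity_and_similarity(aln_a, aln_b):
    """Return (% identity, % similarity) over ungapped columns of two
    pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are not scored
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / compared, 100 * similar / compared


# Synthetic toy alignment (not tenascin sequence).
print(identity_and_similarity("ACDKLTPW", "ACERVTGW"))
```

Under this scheme similarity is always at least as high as identity, which matches the pattern of the reported pairs (for example 80% identity with 94% similarity).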
Tenascin-R in Tetraodon and Takifugu
There are two predicted protein sequences lying side-by-side on chromosome 15 of T. nigroviridis that, when considered as a single gene product, contain an N-terminal linker, heptad repeats, 3.5 EGF-like repeats, 7 FN type III domains and a C-terminal FReD (GSTENT00028393001 and GSTENP00028394001). Phylogenetic analysis of the FReD and terminal FN type III domain shows this potential tenascin to be most similar to amniote and zebrafish tenascin-R. Like tenascin-R in D. rerio it is adjacent to, and in the opposite orientation from, a tenascin-W gene (see below). The T. nigroviridis tenascin-R sequence, however, is not complete; the genomic sequence found between the regions encoding the EGF-like repeats and FN type III domains contains approximately 500 undetermined nucleotides. In contrast, the Takifugu rubripes tenascin-R (SINFRUP00000066378), found by BLASTP of the NCBI Protein Database, is missing only the signal sequence at the N-terminus. The remaining predicted protein includes heptad repeats, 4.5 EGF-like repeats, 9 FN type III domains and a C-terminal FReD. Alignment of the two pufferfish tenascin-R sequences confirms that the fourth EGF-like repeat and first two FN type III domains of Tetraodon nigroviridis are missing from the predicted protein. The domain architecture of Takifugu rubripes tenascin-R is identical to that found in the predicted tenascin-R proteins encoded in genomes of D. rerio, G. gallus, mouse and man (e.g., see Figure 1). Pufferfish tenascin-R does not contain an RGD motif, but the third FN type III domains contain an IDG motif, which is a potential recognition site for the alpha4/alpha9 family of integrins . This motif is also found in the same location on D. rerio, G. gallus, M. musculus and human tenascin-R.
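The comparison of the two pufferfish tenascin-R predictions can be restated as simple bookkeeping on domain counts. The sketch below uses only the counts given in the paragraph above; the `Architecture` class and `missing_units` helper are hypothetical names introduced for illustration, not part of any tool used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    """Repeat and domain counts for one predicted tenascin (illustrative)."""
    egf_repeats: float  # partial repeats are written as .5, as in the text
    fn3_domains: int


def missing_units(partial, complete):
    """Units present in the complete prediction but absent from the partial one."""
    return {
        "egf_repeats": complete.egf_repeats - partial.egf_repeats,
        "fn3_domains": complete.fn3_domains - partial.fn3_domains,
    }


# Counts taken from the text for the two pufferfish tenascin-R predictions.
tetraodon_tnr = Architecture(egf_repeats=3.5, fn3_domains=7)
takifugu_tnr = Architecture(egf_repeats=4.5, fn3_domains=9)

print(missing_units(tetraodon_tnr, takifugu_tnr))
```

The difference of one EGF-like repeat and two FN type III domains is consistent with the gap of undetermined nucleotides described for the T. nigroviridis genomic sequence.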
Unique repeats in pufferfish tenascin-X
The unusual repeat and domain structure of T. nigroviridis tenascin-X is also seen in Takifugu rubripes tenascin-X. This was determined by a BLAT search of the T. rubripes genome with the FReD of Tetraodon nigroviridis tenascin-X. This search revealed a very large predicted protein (GENSCAN00000009040), which, upon further examination, was found to be composed of three closely clustered but distinctive gene products. The middle gene encodes a predicted tenascin with one EGF-like repeat, 19 charged repeats similar to those found in T. nigroviridis tenascin-X interrupted by two partial charged repeats, a charged domain that is 69% identical to amino acids 271–572 in T. nigroviridis tenascin-X, one partial and three complete FN type III domains, and a C-terminal FReD.
Phylogenetic analysis shows that the FReDs from the putative pufferfish tenascin-Xs are most similar to the FReDs of chicken tenascin-Y and mammalian tenascin-X (see below). However, the most striking evidence that the unusual tenascin-X of T. nigroviridis is indeed a member of this particular tenascin subfamily is its genomic location: it is found on chromosome 8 flanked by the genes encoding cytochrome p450 21-hydroxylase and C4 complement (Figure 4C). In mammals, the tenascin-X gene overlaps with the cytochrome p450 21-hydroxylase gene, which is encoded on the opposite strand of DNA, and lies adjacent to one of two C4 complement genes (e.g., see ). The retinoid X receptor beta gene, which lies next to the C4 gene in T. nigroviridis, is also an MHC complex gene in mammals found approximately 1 Mb from the human tenascin-X gene on chromosome 6. Thus, both by sequence homology and by synteny, the small gene encoding a single tenascin-type EGF-like sequence and only three complete FN type III domains corresponds to the T. nigroviridis ortholog of mammalian tenascin-X.
Tenascin-W: diversity through domain duplication
A putative gene product with a repeat and domain structure similar to zebrafish tenascin-W was identified in T. nigroviridis (GSTENT00028391001). This tenascin gene encodes an N-terminal linker, heptad repeats, 3.5 EGF-like repeats, four FN type III domains, and a C-terminal FReD. BLASTP of the first FN type III domain, the fourth FN type III domain, and the FReD all reveal that this putative gene product is most similar to tenascin-W from zebrafish and mouse. A similar tenascin-W was found in the genome of Takifugu rubripes (SINFRUP00000069989), but this tenascin-W has five FN type III domains instead of four. There is an RGD motif in the fifth FN type III domain of T. rubripes tenascin-W, but this motif is not found in tenascin-W from Tetraodon nigroviridis.
Xenopus tropicalis tenascins
Analysis of phylogeny and synteny
Although a fourth paralogous MHC gene cluster is encoded on human chromosome 19p13.1-13.3, no tenascin is encoded in this region, suggesting that this paralog was lost after the initial large-scale duplication events . The locus of tenascin-W adjacent to tenascin-R on chromosome 1 is suggestive of a distinct evolutionary origin, for example by local tandem duplication of an ancestral tenascin gene. Because the chromosome 1 and 9 MHC regions are so closely related as to indicate their origin from the most recent large-scale duplication , the positioning of tenascin-W adjacent to tenascin-R suggests either that: 1) this positioning was ancestral to the duplication event and a tenascin gene has been lost or transposed away from its position adjacent to the tenascin-C gene; or 2) the tenascin-W gene arose by a tandem duplication of tenascin-R after the original duplication event and with a relatively rapid rate of change to the polypeptide sequence. It can also be hypothesized that the tenascin-W gene was transposed from the original locus on chromosome 19 into its current location soon after the duplication that gave rise to the tenascin-C and tenascin-R genes. On present evidence we cannot distinguish between these models, but it is interesting to note that the tenascin-W protein sequence is equally related (around 34%-36%) to both tenascin-C and tenascin-R. Finally, in C. intestinalis, the single tenascin gene is adjacent to a Notch gene, just as all vertebrate tenascins are located in close proximity to a Notch gene present in the MHC paralogous regions, supporting their common ancestry.
Six different names are currently being used to describe the tenascins of vertebrates: tenascin-C, tenascin-R, tenascin-X, tenascin-Y, tenascin-W and tenascin-N. Analysis of intron-exon splice sites, phylogenetic relationships and synteny shows that this nomenclature is inaccurate and needlessly complicated, and that there are in fact only four vertebrate tenascins. The original designations of tenascin-C and tenascin-R are confirmed by our analyses, but avian tenascin-Y shares close phylogenetic and genomic relationships with tenascin-X, and tenascin-N shares phylogenetic, syntenic and intron-exon junction homologies with tenascin-W. Since the names tenascin-X and tenascin-W have clear precedence in the literature and have been adopted in the majority of published reports, we recommend that use of the terms tenascin-Y [16, 47, 48, 49, 50, 51] and tenascin-N  be discontinued.
Here we report the first example of a tenascin from a non-vertebrate. A predicted tenascin was found in the genomic sequence of the invertebrate chordate C. intestinalis and is corroborated as an expressed gene by the identification of matching ESTs, by cDNA sequencing and by immunohistochemistry. A similar gene product was also identified in the related species C. savignyi. With only one copy of tenascin, C. intestinalis may represent an ideal model system for future studies of tenascin function, since analysis of its knockdown by morpholinos or its misexpression would not be complicated by the possible compensatory action of related tenascin gene products. We have also identified a tenascin-type FReD domain in O. dioica and expect to identify a full length tenascin when the genomic sequencing is completed and assembled. Thus, tenascins are most likely common to all urochordates.
There are no tenascins in Caenorhabditis elegans and Drosophila melanogaster. It is now recognized that these organisms have undergone extensive gene loss and that the Cnidaria have a higher level of gene conservation in comparison to vertebrates . Nevertheless, we could not identify Cnidarian tenascins from the various databases (see Methods). Searches of the draft genome sequence of the echinoderm Strongylocentrotus purpuratus have also not revealed tenascin genes. Thus, to date, tenascins appear to be exclusive to the chordate lineage. Sequences encoding a tenascin-like FReD domain are included in the Branchiostoma floridae EST database [GenBank EST:CF919227] [GenBank EST:CF919269] , but it is unknown if this sequence includes adjacent FN type III domains and EGF-like repeats. It will be of interest to know if one or more tenascins are encoded in the amphioxus genome once this sequencing project is completed. Of the four vertebrate tenascins, tenascin-X has the most distinctive FReD sequence and overall domain organization. We hypothesize that tenascin-X arose from the first tenascin gene duplication in vertebrates.
Analysis of two pufferfish genomes revealed not four, but five, tenascins. The fifth tenascin appears to be the result of a relatively recent duplication of the tenascin-C gene. In keeping with the established protocol for naming very similar duplicated genes, we propose that these tenascin-C paralogs be referred to as tenascin-Ca and tenascin-Cb. The two pufferfish studied here are closely related, having diverged from a common ancestor only 18–30 million years ago . Nevertheless, the tenascin-Cb genes of these close relatives encode different numbers of EGF-like repeats and FN type III domains. This illustrates the potential for the numbers of these repeats and domains to change during evolution (see also Hughes ) and points to the potential problems that can stem from giving different names to tenascins on the basis of the numbers of their repeated domains. It will be interesting to study the expression of these tenascin-C genes in Tetraodon nigroviridis and Takifugu rubripes: do their expression patterns overlap, or have they evolved distinctive regulatory elements? If so, has their function diverged? There is considerable evidence that the great species diversity of bony fishes is the consequence of an additional duplication of the whole fish genome followed by massive gene loss since the appearance of the ancestral tetrapod (e.g., see ). Interestingly, in the family Tetraodontidae the only tenascin to persist as a duplicated gene is tenascin-C. Searches of the latest zebrafish genome assembly (Zv5, Wellcome Trust Sanger Institute) identify only one tenascin-C on chromosome 5 (NP_570982). It will be interesting to establish if the persistence of tenascin-C paralogs is specific to the pufferfish lineage.
The tenascin-X gene in pufferfish encodes previously undescribed, highly charged repeats. We propose the name "GK repeats" to describe these unique sequences, since all but one of the 19 repeats in T. rubripes and all of the repeats in Tetraodon nigroviridis begin with the amino acids glycine and lysine. Given the concentration of tenascin-X in the epimysium of birds and mammals, it is intriguing that these repeats combined with the rest of the DUF619 domain share some sequence homology with a prokaryotic collagen-binding protein and the muscle-specific protein UNC-89 in C. elegans. The observation that X. tropicalis tenascin-X contains a region with sequence similar both to UNC-89 as well as bird and human tenascin-X leads us to suggest that this region may have a role that is important to tenascin-X function, and its biological properties should be studied further.
A previous study in the zebrafish  reported five FN type III domains in the cDNA sequence of tenascin-W. The analysis of the genomic sequence of D. rerio as well as RT-PCR of zebrafish cDNA reveals a sixth FN type III domain. Thus, the D. rerio tenascin-W gene encodes the same number of FN type III domains as does the orthologous chicken gene, yet it encodes two more than does Tetraodon nigroviridis. This diversity again points out the potential hazards of giving proteins a new name based on the number of repeated domains – of the four tenascin subfamilies, only the tenascin-Rs have the same number of FN type III domains encoded in the genes of fish, birds and mammals. In the case of mammalian tenascin-C and tenascin-X, distinct processes of conservative or concerted evolution have been described for their FN type III domains . In the case of tenascin-W most of the diversity can be traced to different numbers of duplications in the third FN type III domain. This domain exists as a single copy in T. nigroviridis, two copies in Takifugu rubripes, three copies in D. rerio and G. gallus, 6 copies in man, and 9 copies in many other mammals including dog [NCBI Protein:XP_547455], rat [NCBI Protein:XP_222794] and mouse. What is it about this particular FN type III domain that may be leading to its duplication? One possibility is that this domain contains an integrin binding site, and different numbers of integrin binding sites may change the affinity of the intact molecule for a receptor, or may allow for tenascin-W to crosslink numerous receptors. Note that the LDVP/IDAP/IDSP motif, which has been shown to recognize alpha4beta1 integrin in fibronectin and VCAM , is found in almost all of the duplicated third FN type III domains analyzed here (underlined in Figure 6). Tenascin-W from X. tropicalis is different from other tenascin-Ws in its number of EGF-like repeats and the relatively recent duplication of its fourth, and not third, FN type III domain. 
Interestingly, this also results in the duplication of two potentially active integrin binding motifs: an RGD in the fourth domain and a KGD in the fifth. It is intriguing to note that species that lack an RGD motif in the third FN type III domain of tenascin-C (i.e., X. tropicalis and mouse) have an RGD motif in their tenascin-W. Since tenascin-C and tenascin-W are often co-expressed during development [17, 18, 51] it is interesting to speculate that they may have overlapping functions related to their binding to an RGD-dependent integrin.
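The putative integrin-binding motifs discussed above (RGD, KGD and LDVP/IDAP/IDSP) are short exact strings, so their positions within a domain can be located with a simple scan. The following is a minimal sketch; the function name is hypothetical and the example sequence is synthetic, not an actual FN type III domain. A hit only marks a candidate site and says nothing about whether the motif is exposed or functional.

```python
# Motifs named in the text as potential integrin recognition sites.
MOTIFS = ("RGD", "KGD", "LDVP", "IDAP", "IDSP")


def scan_motifs(domain_seq, motifs=MOTIFS):
    """Return (motif, 1-based start position) for every exact occurrence,
    sorted by position along the sequence."""
    seq = domain_seq.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((motif, start + 1))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Synthetic toy sequence containing one IDAP and one RGD.
print(scan_motifs("AAIDAPTTRGDSS"))
```

Running the same scan over each duplicated copy of a domain would give the per-copy motif counts that the duplication hypothesis above depends on.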
Most tenascins studied to date are alternatively spliced, and only rarely are all of the potential FN type III domains identified in the final protein (e.g., see [56, 57]). This variability leads to different functional domains being exposed in different tissues [1, 7, 58]. As more ESTs become available, it will be interesting to analyze the many potential splice variants of each of these tenascins, as they may reveal additional functional variability in the tenascin gene family.
Tenascins are not known in protostomes or Cnidaria. We provide evidence that a single tenascin is encoded in the genome of the urochordate C. intestinalis. This invertebrate chordate tenascin contains the motif RGE in an exposed loop of its third FN type III domain, which may correspond to an integrin-binding site conserved in some vertebrate tenascins. Sequence alignments, analysis of domains and exon/intron organization and phylogenetic analyses of tenascins from four classes of vertebrates reveal that in fish and in tetrapods there are four members of the tenascin gene family: tenascin-C, tenascin-R, tenascin-X and tenascin-W. We suggest that use of the names tenascin-Y and tenascin-N be discontinued. In pufferfish there are two tenascin-C paralogs but single copies of the other tenascins. The human genome provides clear evidence that tenascin-C, tenascin-R and tenascin-X arose through the same ancestral genome duplications. Tenascin-W may have evolved as a result of a local duplication of the ancestor of tenascin-C and -R. The tenascin-X genes from the pufferfish Tetraodon nigroviridis and Takifugu rubripes encode different numbers of unique, highly charged tandem repeats of unknown function. X. tropicalis tenascin-X shares features with both the smaller teleost tenascin-Xs and the very large tenascin-Xs found in mammals. Finally, much of the diversity seen in the size of tenascin-W can be accounted for by the multiple duplications of the exon encoding the third FN type III domain in different species.
Sequences and bioinformatics
The NCBI Protein Database accession numbers of complete or partial tenascin amino acid sequences identified by key-word search or BLASTP (NCBI Basic Local Alignment Search Tool ) (reviewed by ) and used to obtain data for searching, alignment and phylogenetic reconstructions are listed in Table 1. Genomes were searched and analyzed at the NCBI , UCSC Genome Bioinformatics , Joint Genomic Institute Eukaryotic Genomics , the Laboratory for Developmental Biology and Genome Biology , Centre National de Séquençage Genoscope , the Max Planck Institute for Molecular Genetics , the Cnidarian Evolutionary Biology Database  and the Nematostella vectensis Genomics Database . Sequence alignment and phylogenetic trees were constructed simultaneously by using SATCHMO (Berkeley Phylogenomics ) as described by Edgar and Sjölander . The phylogenetic tree of the FReDs was constructed with MrBayes: Bayesian Inference of Phylogeny  (see ) using the WAG substitution model . We ran 1 million generations after which the average standard deviation of split frequencies was 0.004518. The phylogenetic tree was drawn using DrawTree . For identification of paralogous tenascin-encoding regions in the human genome, the database of "Paralogons in the human genome", version 5.28, was searched (Paralogons in the Human Genome ) (see ).
Domain and repeat identification
Likely heptad repeats were identified using the Simple Modular Architecture Research Tool SMART , Paircoil Scoring Form  (see also ) and COILS  programs. Tenascin-type EGF-like repeats were identified by their characteristic number and spacing of cysteine residues (X4CX3CX5CX4CXCX8C) and confirmed with the SMART program, which classifies them as generic EGF repeats. FReDs and FN type III domains were identified by the NCBI Conserved Domain feature of BLASTP  and by SMART . Partial FN type III domains were also identified by the NCBI Conserved Domain program. Often the different programs for predicting open reading frames in pufferfish and X. tropicalis were not in agreement. For example, the sequence ELDAPSDLSAQDVTESSFTVSRDSTQVHIDGYFLSFSSSAGSN was predicted to lie between the third and fourth FN type III domains of Tetraodon nigroviridis tenascin-W at the NCBI protein database [CAG07652], but not in the predicted protein at the Genoscope site (GSTENT00028391001). In such cases the proteins were aligned with avian and mammalian tenascin sequences using the SATCHMO program and the predicted protein with the best fit to known cDNA sequences was used. Additional short predicted open-reading frames were not considered to be part of the protein if the BLASTP and SMART programs predicted them to be potentially artifactual due to low complexity. Intron-exon splice sites were estimated from the results of the "build protein" feature of BLASTP genomic searches and from the gene model feature of the Tetraodon genome browser .
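The cysteine spacing that defines a tenascin-type EGF-like repeat can be written directly as a regular expression. The sketch below is one assumed way to express the consensus X4CX3CX5CX4CXCX8C, treating X as any single residue; it is not the procedure used in the study, which relied on inspection and SMART, and the test sequence is synthetic.

```python
import re

# X4 C X3 C X5 C X4 C X C X8 C, with X taken to mean any one residue
# (an assumption; the text does not say whether X may itself be cysteine).
EGF_CONSENSUS = re.compile(
    r"[A-Z]{4}C[A-Z]{3}C[A-Z]{5}C[A-Z]{4}C[A-Z]C[A-Z]{8}C"
)


def find_egf_like_repeats(protein):
    """Return (0-based start, matched 31-residue segment) for each
    non-overlapping match of the consensus spacing."""
    return [(m.start(), m.group(0)) for m in EGF_CONSENSUS.finditer(protein.upper())]


# Synthetic repeat assembled piece by piece so the spacing is easy to check.
toy_repeat = "GSTP" + "C" + "ANG" + "C" + "SGHGR" + "C" + "VDGQ" + "C" + "V" + "C" + "DEGFTGED" + "C"
print(find_egf_like_repeats("MK" + toy_repeat + "GG"))
```

Each full match spans 31 residues. A partial repeat, such as the ".5" repeats counted in the domain descriptions, would not match this full-length pattern and would need separate handling.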
Molecular cloning, antibody production and immunohistochemistry
The sequence of the N-terminus of Ciona intestinalis tenascin was predicted from genomic sequence 5' of the EGF-like repeats encoding a potential signal peptide. This was confirmed by the identification of a cDNA using RT-PCR with primers homologous to the region harboring the methionine N-terminal to the predicted signal peptide and a primer from the next exon. Animals were obtained from the Marine Biological Laboratory (Woods Hole, MA). RNA was extracted from larvae collected 18 hours after fertilization and used as template for reverse transcription. PCR was performed with the primer pair 5'-ATGTGGCCTGTTTCGAGTCG-3'/5'-ATTGCTGCTGGTCAGGAACG-3' and the resulting band sequenced. It encoded amino acids 1–66 shown in Figure 2A. To raise antibodies we amplified the cDNA encoding the first FN type III domain from C. intestinalis mRNA by RT-PCR with the primers 5'-TCTCATGTCATCAAACCATCAG-3'/5'-TGTTTTCACAGAAGCAGTAATTGG-3'. The resulting cDNA encoded amino acids 459–550 from the sequence shown in Figure 2A. Interestingly, the DNA sequence was only 86.8% identical to the exons of the genomic sequence (Ciona intestinalis Genome ), but all of the differences were silent, resulting in a 100% conserved protein sequence. These differences are possibly due to the genomic diversity that has been noted by others between specimens of C. intestinalis from the Pacific and Atlantic . This protein fragment was expressed in E. coli and antiserum against the purified protein was raised in rabbits. The antiserum recognized a large (>250 kDa) smeared band on immunoblots of homogenates of recently hatched C. intestinalis larvae (generously provided by W.R. Jeffery, University of Maryland) as well as smaller bands that were also present on the blot incubated with the preimmune serum. For immunostaining of whole mounts, larvae were fixed in 4% paraformaldehyde in sea water and permeabilized in methanol at -20°C.
Larvae were then rinsed in phosphate buffered saline with 0.01% Tween-20 (PBT), blocked in 0.1% bovine serum albumin in PBT, and incubated overnight in the anti-Ciona tenascin serum (1:100) or similarly diluted preimmune serum. Larvae were then rinsed overnight and incubated in goat anti-rabbit Alexa 594 secondary antibody (1:500) in PBT overnight. After extensive washes the immunostained larvae were coverslipped and observed using an Olympus confocal microscope.
Reverse Transcriptase Polymerase Chain Reaction
The cDNA encoding the fifth FN type III domain of D. rerio tenascin-W was cloned in the following way: total RNA from an adult fish was isolated with Trizol reagent (Invitrogen) and mRNA was isolated with the RNeasy mRNA purification kit (Qiagen). First strand cDNA was generated with SuperScript III reverse transcriptase (Invitrogen) according to the manufacturer's recommendation. RT-PCR (30 cycles 94°C 1 min, 94°C 50 sec, 60°C 1 min, 72°C 2 min) using primers corresponding to the sequences from neighboring exons (5'-GAGTTCAACAGAAGCGGAAAC-3'/5'-TTGAGTCTGAACATCAGTGGC-3') generated an appropriately sized product that was sequenced. The cloned cDNA corresponded to the FN type III domain found in the genomic sequence encoded on a single exon between the fourth and sixth FN type III domains of tenascin-W (99% identical, as opposed to 81% identical to the fourth FN type III domain). The cDNA encoded a protein that was identical to the protein predicted from the genomic sequence except that a threonine replaced an isoleucine (the 53rd residue of the domain). This sequence was confirmed by the use of a second independent pair of primers (5'-AAGCGGAAACAGATATAGACGC-3'/5'-AATCTCTGCTGTTTCAGCCTC-3').
Research in RPT's laboratory is funded by NSF 0235711 and NIH NCRR C06 RR-12088-01. The authors would like to thank William Jeffery for supplying fresh homogenates of C. intestinalis, Stefano Canevascini for his assistance with confocal microscopy and Caroline Meloty-Kapella for her expertise at immunoblotting.
- 14.Matsumoto K, Arai M, Ishihara N, Ando A, Inoko H, Ikemura T: Cluster of fibronectin type III repeats found in the human major histocompatibility complex class III region shows the highest homology with the repeats in an extracellular matrix protein, tenascin. Genomics. 1992, 12: 485-491. 10.1016/0888-7543(92)90438-X.
- 22.Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, Harafuji N, Hastings KE, Ho I, Hotta K, Huang W, Kawashima T, Lemaire P, Martinez D, Meinertzhagen IA, Necula S, Nonaka M, Putnam N, Rash S, Saiga H, Satake M, Terry A, Yamada L, Wang HG, Awazu S, Azumi K, Boore J, Branno M, Chin-Bow S, DeSantis R, Doyle S, Francino P, Keys DN, Haga S, Hayashi H, Hino K, Imai KS, Inaba K, Kano S, Kobayashi K, Kobayashi M, Lee BI, Makabe KW, Manohar C, Matassi G, Medina M, Mochizuki Y, Mount S, Morishita T, Miura S, Nakayama A, Nishizaka S, Nomoto H, Ohta F, Oishi K, Rigoutsos I, Sano M, Sasaki A, Sasakura Y, Shoguchi E, Shin-i T, Spagnuolo A, Stainier D, Suzuki MM, Tassy O, Takatori N, Tokuoka M, Yagi K, Yoshizaki F, Wada S, Zhang C, Hyatt PD, Larimer F, Detter C, Doggett N, Glavina T, Hawkins T, Richardson P, Lucas S, Kohara Y, Levine M, Satoh N, Rokhsar DS: The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science. 2002, 298: 2157-2167. 10.1126/science.1080049.
- 24.UCSC Genome Bioinformatics. [http://genome.ucsc.edu]
- 26.Laboratory for Developmental Biology and Genome Biology. [http://ghost.zool.kyoto-u.ac.jp]
- 33.Trace Archive Database Mega BLAST. [http://www.ncbi.nlm.nih.gov/BLAST/mmtrace.shtml]
- 36.Yokosaki Y, Matsuura N, Higashiyama S, Murakami I, Obara M, Yamakido M, Shigeto N, Chen J, Sheppard D: Identification of the ligand binding site for the integrin alpha9 beta1 in the third fibronectin type III repeat of tenascin-C. J Biol Chem. 1998, 273: 11423-11428. 10.1074/jbc.273.19.11423.
- 39.National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
- 53.Panopoulou G, Hennig S, Groth D, Krause A, Poustka AJ, Herwig R, Vingron M, Lehrach H: New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res. 2003, 13: 1056-1066. 10.1101/gr.874803.
- 54.Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De Berardinis V, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chapple C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest Crollius H: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431: 946-957. 10.1038/nature03025.
- 57.Derr LB, Chiquet-Ehrismann R, Gandour-Edwards R, Spence J, Tucker RP: The expression of tenascin-C with the AD1 variable repeat in embryonic tissues, cell lines and tumors in various vertebrate species. Differentiation. 1997, 62: 71-82. 10.1046/j.1432-0436.1997.6220071.x.
- 59.NCBI Basic Local Alignment Search Tool. [http://www.ncbi.nlm.nih.gov/BLAST]
- 61.Joint Genome Institute Eukaryotic Genomics. [http://genome.jgi-psf.org]
- 62.Centre National de Séquençage Genoscope. [http://www.genoscope.cns.fr]
- 63.Max Planck Institute for Molecular Genetics. [http://www.molgen.mpg.de]
- 64.Cnidarian Evolutionary Biology Database. [http://cnidbase.bu.edu/]
- 65.Nematostella vectensis Genomics Database. [http://www.stellabase.org/]
- 66.Berkeley Phylogenomics. [http://phylogenomics.berkeley.edu]
- 67.MrBayes: Bayesian Inference of Phylogeny. [http://mrbayes.csit.fsu.edu/index.php]
- 70.DrawTree. [http://www.daimi.au.dk/~mailund/drawtree.html]
- 71.Paralogons in the Human Genome. [http://wolfe.gen.tcd.ie/dup]
- 72.Simple Modular Architecture Research Tool SMART. [http://smart.embl-heidelberg.de]
- 73.Paircoil Scoring Form. [http://paircoil.lcs.mit.edu]
- 77.Ciona intestinalis Genome. [http://genome.jgi-psf.org/ciona2/ciona2.home.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.