Generation of divergent uroplakin tetraspanins and their partners during vertebrate evolution: identification of novel uroplakins
The recent availability of sequenced genomes from a broad array of chordates (cephalochordates, urochordates and vertebrates) has allowed us to systematically analyze the evolution of uroplakins: tetraspanins (UPK1a and UPK1b families) and their respective partner proteins (UPK2 and UPK3 families).
We report here: (1) the origin of uroplakins in the common ancestor of vertebrates, (2) the appearance of several residues that have statistically significantly positive dN/dS ratios in the duplicated paralogs of uroplakin genes, and (3) the existence of strong coevolutionary relationships between UPK1a/1b tetraspanins and their respective UPK2/UPK3-related partner proteins. Moreover, we report the existence of three new UPK2/3 family members we named UPK2b, 3c and 3d, which will help clarify the evolutionary relationships between fish, amphibian and mammalian uroplakins that may perform divergent functions specific to these different and physiologically distinct groups of vertebrates.
Since our analyses cover species of all major chordate groups this work provides an extremely clear overall picture of how the uroplakin families and their partner proteins have evolved in parallel. We also highlight several novel features of uroplakin evolution including the appearance of UPK2b and 3d in fish and UPK3c in the common ancestor of reptiles and mammals. Additional studies of these novel uroplakins should lead to new insights into uroplakin structure and function.
Keywords: Common Ancestor; Bony Fish; Cartilaginous Fish; Genome Duplication Event; Paralog Group
Uroplakins (UPs) are the protein subunits of the urothelial plaques that cover the apical surface of mammalian bladder epithelium (urothelium). There are four major mammalian uroplakins, i.e., the 27-kDa UPIa, 28-kDa UPIb, 15-kDa UPII and 47-kDa UPIIIa [1, 2, 3]. UPK3b is a minor isoform of UPIIIa. These plaques form the so-called asymmetric unit membrane (AUM) and contribute to the permeability barrier function and mechanical stability of the urothelium. Uroplakin defects underlie some urinary tract anomalies, and one of the uroplakins, UPIa, can serve as the receptor for the uropathogenic E. coli that causes over 85% of urinary tract infections.
Uroplakins (UPK) can be divided into two types. The first type comprises UPK1a and 1b, which belong to the tetraspanin family (containing CD9, CD63, CD81 and CD151 proteins); tetraspanin proteins span the membrane four times and play important roles in fertilization, immunity and cell-cell interaction [6, 7, 8, 9, 10, 11]. The second type comprises UPK2 and UPK3, which span the membrane only once; these uroplakins share a stretch of ~12 amino acid residues on the extracellular side of their single transmembrane domain (TMD) [12, 13].
The fact that uroplakins 1a and 1b (UPK1a and UPK1b) interact specifically with uroplakins 2 and 3a (UPK2 and UPK3a), respectively, makes them an attractive system for studying the co-evolution of interacting membrane protein pairs [14, 15, 16]. While mammalian uroplakins form 2D crystals of urothelial plaques on the urothelial apical surface, uroplakins of non-mammals, including amphibians (which have the complete assortment of UPK1a, 1b, 2 and 3a, as well as the minor UPK3b), do not form such plaques [17, 18, 19]. In Xenopus oocytes, UPK3a and its binding partner UPK1b play a key role in sperm-egg fertilization [19, 20, 21]. In addition, a UPK3-related gene product in zebrafish was recently found to play a role in epithelial polarization and morphogenesis of pronephric tubules. The evolutionary relationship among these fish, amphibian and mammalian uroplakins, which seem to be functionally divergent, remains unclear.
To better understand the evolution of uroplakins and to decipher how the tetraspanin uroplakins coevolve with their binding partners, we analyzed the uroplakin-related sequences in a wide range of whole-genome-sequenced vertebrate species including mammals, birds, amphibians, bony fish and ancient cartilaginous fish. Previously we showed the existence of a strong co-evolutionary relationship between UPK1a and UPK1b and their partners, the UPK2 and UPK3a/3b proteins, respectively. The recent availability of additional genome sequences from a broad array of chordates (cephalochordates, urochordates and vertebrates), including “living fossils” such as lampreys, spotted gars and coelacanths, allowed us to re-examine more systematically the evolution and possible neofunctionalization of uroplakins. For convenience and consistency, in this communication we refer to the individual ortholog groups such as UPK1a, UPK1b, UPK2 and UPK3a as families, and to the UPK1a/1b tetraspanins and the UPK2/UPK3-related proteins as two separate superfamilies.
In this paper, we pinpoint the origin of uroplakins in the common ancestor of vertebrates, track the appearance of skewed dN/dS ratios in the nucleotide sequences of the gene families and point to possible neofunctionalization following the duplication of paralogous uroplakin genes. We also analyze the patterns of coevolution between UPK1a/1b tetraspanins and the UPK2/UPK3-related proteins. Finally, we report the existence of three new UPK members belonging to the UPK2/3 superfamily, i.e., UPK2b, 3c and 3d. Since our analyses are based on a broad array of species covering all major chordate groups, this work presents an overall picture of the uroplakin families existing in nature.
Sequences and matrix construction
All protein and DNA sequences used in this study (tetraspanin UPKs, i.e., UPK1a and UPK1b, and single-membrane-spanning UPKs, i.e., UPK2 and UPK3) are listed in Additional file 1: Figure S1 and Additional file 2: Figure S2, in which exons 2–5 are represented in alternate colours in the protein sequences. Blast searches with the Blast-T program were performed as described [22, 23, 24] with multiple starting queries against various genome-sequencing projects, including the NCBI (http://www.ncbi.nlm.nih.gov; http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=euk), Ensembl (http://www.ensembl.org) and http://www.ambystoma.org/ servers, and EST databases. Intron-exon borders were determined as previously described, using the “align two sequences” option of the NCBI BLAST program (http://www.ncbi.nlm.nih.gov). Splice consensus signals were then manually annotated.
Cloning and sequencing of UPK 3c
Total normal human bladder mRNA (1100564 F, Asterand, Detroit, MI) was used to synthesize cDNA using the Transcriptor First Strand cDNA Synthesis Kit (Roche, Germany) with Random Hexamer Primers. The normal human bladder cDNA of upk3c was isolated by RT-PCR using primers based on the hypothetical uroplakin 3BL sequence annotated in NCBI (NM_001114403.2). The primer sequences used for full-length ORF amplification were sense 5′-GACGGACGGACAGACAGATGGACA-3′ and antisense 5′-GCCCCTCTGGAACCCCTCAG-3′. The cDNA product was cloned into the pCR®II-TOPO vector and sequenced.
Fasta sequences were aligned using the web-based alignment tool TranslatorX, which utilizes amino acid alignments to generate DNA sequence alignments. Phylogenetic matrices in PHYLIP and NEXUS format were then generated using Mesquite for both protein sequences and DNA sequences. We explored the different phylogenetic signals inherent in amino acid data and nucleotide data by analysing the protein and DNA sequence matrices separately. In addition, we elided the DNA data matrix with the amino acid matrix for an analysis in which the amino acid data weight the DNA sequence data. PHYLIP matrices were then used in subsequent analyses for natural selection (web-based DataMonkey analyses and desktop HYPHY analyses). In addition to the two differently formatted matrices (PHYLIP versus NEXUS), we also generated two kinds of matrices. The first kind of matrix used the genes in the two gene families as terminals: one matrix was constructed for the UPK1 genes (UPK1a and UPK1b) and a second for the UPK2/UPK3 families. The second kind of matrix used as terminals the vertebrate species that have UPKs in their genomes, with partitions representing the seven paralog groups for these genes.
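The idea of using an amino acid alignment to place gaps in a DNA alignment can be illustrated with a minimal Python sketch. This is not the TranslatorX implementation; the function name `backtranslate` and the toy sequences are hypothetical, and the sketch assumes each coding sequence is in frame, has no stop codon and is exactly three times the length of its ungapped protein.

```python
def backtranslate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes one codon; each protein gap becomes a three-base gap.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence must be 3 x the number of residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)          # keep the reading frame intact
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy data (invented for illustration, not uroplakin sequences).
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
coding = {"seqA": "ATGAAACTG", "seqB": "ATGAAAACCCTG"}

codon_alignment = {
    name: backtranslate(protein_alignment[name], coding[name])
    for name in protein_alignment
}
```

Because gaps are only ever inserted as whole codons, the resulting DNA alignment stays in frame, which is what codon-based dN/dS analyses require.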
Detection of dN/dS skew
Two tests were used to detect the patterns of sequence change using dN/dS ratios in the gene families of this study. The first test examines branch-specific departure from neutrality (i.e., from dN/dS = 1.0). The Branch-site REL test in the HYPHY package was used on the two gene families (UPK1 and UPK2/UPK3) separately. The default settings and the Bayesian tree topology were used in these tests. The second test is the MEME (Mixed Effects Model of Evolution) test, which uses mixed-model approaches to detect departures from neutrality at individual codons. This latter test was performed individually on each of the following seven genes: UPK1a, UPK1b, UPK2a, UPK2b, UPK3a, UPK3b and UPK3c.
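To make the distinction between synonymous and non-synonymous change concrete, the following Python sketch counts codon differences of each kind between two aligned coding sequences using the standard genetic code. It is only an illustration: it reports raw counts of differing codons, not the per-site rates dN and dS, and it is not the Branch-site REL or MEME procedure. The function name `count_codon_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq1: str, seq2: str) -> tuple:
    """Return (synonymous, non_synonymous) counts of differing codons.

    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1        # silent change: same amino acid
        else:
            nonsyn += 1     # replacement change: different amino acid
    return syn, nonsyn


# Toy example: AAA -> AAG keeps lysine; CTG -> CCG changes leucine to proline.
changes = count_codon_changes("ATGAAACTG", "ATGAAGCCG")
```

Proper dN and dS estimates additionally normalise such counts by the number of synonymous and non-synonymous sites and correct for multiple substitutions, which is what the likelihood-based tests in HYPHY take care of.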
Analysis of gene by gene phylogenetic interaction
The congruence of the UPK interacting pairs was examined using the Shimodaira-Hasegawa test. This test examines the congruence of phylogenetic information in two partitions of data using a likelihood ratio test. Each of the seven genes found in more than four species (UPK1a, UPK1b, UPK2a, UPK2b, UPK3a, UPK3b, UPK3c) was tested pairwise for congruence with each of the others.
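As a small illustration of what the pairwise design implies, the Python snippet below enumerates the gene pairs to be compared. The variable names are hypothetical; the two known partner pairs are taken from the interactions described in the Background.

```python
from itertools import combinations

genes = ["UPK1a", "UPK1b", "UPK2a", "UPK2b", "UPK3a", "UPK3b", "UPK3c"]

# Every unordered pair of genes is one congruence test.
pairs = list(combinations(genes, 2))

# Pairs for which congruence is expected because the proteins are
# known binding partners.
known_partners = {("UPK1a", "UPK2a"), ("UPK1b", "UPK3a")}
expected_congruent = [pair for pair in pairs if pair in known_partners]
```

Seven genes give 21 pairwise comparisons, of which only a few involve known binding partners; the remaining pairs serve as a background against which congruence between partners can be judged.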
Results and discussion
Vertebrate origin and evolution of uroplakins
In our earlier work, we suggested that uroplakins first appeared in the common ancestor of vertebrates because the oldest uroplakin sequences we detected were from cartilaginous fish. With the availability of greatly expanded genomic databases of chordates (Vertebrates, Cephalochordates and Urochordates), we have found UPK-related sequences in lampreys (extant jawless basal vertebrates called agnathans) but not in Cephalochordates (Amphioxus), Urochordates (Ciona) or lower organisms. This finding suggests that UPKs originated in the common ancestor of vertebrates over 500 mya, when vertebrates radiated from cephalochordates and urochordates and most likely underwent two rounds of whole genome duplication (WGD) [33, 34, 35, 36].
We used the elided matrix described in the materials and methods to generate the Bayesian trees for the UPK1a/1b superfamily (Figure 1A) and the UPK2/3 superfamily (Figure 1B), which represent different paralogs. Separate protein and DNA phylogenetic trees for these two gene superfamilies based on parsimony, maximum likelihood and Bayesian approaches are included in Additional file 3: Figure S3 and Additional file 4: Figure S4. Previous phylogenetic analyses showed that tetraspanin UPK1a’s and 1b’s form a tight clade within the broad superfamily of eukaryotic tetraspanins [24, 37, 38, 39]. The analysis of tetraspanin UPK1a and 1b (Figure 1A) shows that their genealogy agrees with animal phylogeny, except that UPK1a and UPK1b from cartilaginous fish are closer to those of tetrapods than to those of bony fish. This deviation probably reflects the well-known high diversification and faster evolutionary rates of bony fish in comparison with tetrapods and cartilaginous fish.
The genealogy of UPK2/3 (Figure 1B), like that of UPK1a/1b, is consistent with the organismal histories, with a few exceptions. For example, the lamprey UPK2a, as well as platypus UPK3a, coelacanth 3a, and lamprey 3a.1 and 3a.2, have highly divergent sequences that did not cluster with their respective groups (Figure 1B). These incongruences might be caused by long branch attraction, an analysis artifact in which rapidly evolving sequences cluster together regardless of their correct relationships. Alternatively, these proteins may be converging in function.
Gene duplication and hypotheses of neofunctionalization of uroplakins
To examine further the patterns of sequence change in the uroplakin genes, we established where in the phylogeny of vertebrates branch-specific changes in the intensity and direction of skew in dN/dS ratios occurred. We also determined the residues in the uroplakin proteins at which statistically significant departures from dN/dS = 1.0 occur. There are three possible outcomes when using dN/dS as an indicator of sequence change. The first is a ratio significantly less than 1.0 (often equated in the literature with purifying selection). The second is that the gene accumulates silent (synonymous) and replacement (non-synonymous) substitutions in its DNA sequence in equal proportion, and hence has dN/dS = 1.0 (often equated in the literature with neutrality). The final and rarer possibility is that the site or branch has a statistically significant dN/dS ratio greater than 1.0 (often referred to in the literature as positive Darwinian selection). Since the validity of equating skew in these ratios with natural selection has recently been called into question [43, 44], we prefer here simply to point out a pattern of departure from the neutral expectation (dN/dS = 1.0) when we observe a statistically significant result. Whether or not natural selection is at work in molding the skewed ratios can only be established by functional experiments and validation. We suggest, however, that significantly skewed branches or residues show the potential for evolutionarily important events, and that reporting the location of these skewed residues and branches will be useful to subsequent researchers working on the function and evolution of these proteins.
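The three outcomes described above can be summarised in a few lines of Python. This is only a descriptive sketch: the function name `describe_skew` and the significance threshold of 0.05 are assumptions made for illustration, and the labels deliberately describe the direction of skew rather than inferring selection.

```python
def describe_skew(omega: float, p_value: float, alpha: float = 0.05) -> str:
    """Describe a dN/dS estimate (omega) relative to the neutral value 1.0.

    alpha is an assumed significance threshold, not one taken from the study.
    """
    if p_value >= alpha or omega == 1.0:
        return "no significant departure from dN/dS = 1.0"
    if omega < 1.0:
        return "significant skew toward dN/dS < 1.0"
    return "significant skew toward dN/dS > 1.0"


# Hypothetical values for three codons.
labels = [
    describe_skew(0.2, 0.001),   # fewer replacement changes than expected
    describe_skew(1.1, 0.40),    # not distinguishable from neutrality
    describe_skew(3.5, 0.01),    # more replacement changes than expected
]
```

Keeping the labels descriptive mirrors the caution expressed in the text: a significant skew marks a site or branch as worth investigating, while its interpretation awaits functional evidence.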
We thus identified the branches that have experienced statistically significant departure from neutrality in their dN/dS ratios in the uroplakin genealogies (Figure 1A and B). These analyses led to two major findings. First, in almost every uroplakin paralog group (UPK1a, 1b, 2a, 3a, 3b and 3c) a strong pattern of significant skew toward dN/dS > 1.0 accompanies the duplication that produced the paralog group (asterisks in Figure 1). Second, the divergence of mammal species is also accompanied by significant skew in sequence change (blue asterisks in Figure 1). The single exception to this pattern is for the mammalian UPK1b group.
Another interesting finding is that some uroplakin paralogs have higher levels of dN/dS skew than others. For instance, while UPK1a has only a single codon with dN/dS > 1.0, UPK1b has five. UPK2a has two codons with dN/dS > 1.0, while UPK2b has eight. The UPK3 paralogs (UPK3a, UPK3b and UPK3c), however, show similar levels of dN/dS > 1.0 (five codons in each).
These results are relevant to establishing hypotheses about the function and possible neofunctionalization of the uroplakin gene families. It is possible that after a gene family is duplicated, the branch with more residues that are changing disproportionately is the paralog that has gained novel function. Purifying natural selection often relaxes after the duplication of a gene family, allowing for the neofunctionalization of the newly duplicated paralog [33, 34, 35, 36]. For neofunctionalization to occur, variation at nonsynonymous sites would need to be present, and residues with statistically significant dN/dS > 1.0 would be good candidates for such neofunctionalization. Our results would then indicate that, of the UPK1 paralogs (UPK1a and UPK1b), it is UPK1b that has the potential to be neofunctionalized. Under this hypothesis, UPK1a would have retained the ancestral protein function, while UPK1b would have evolved a new function related to that of UPK1a.
Likewise, there are two points in the evolution of the gene family where we can hypothesize neofunctionalization events in the UPK2/UPK3 subfamilies. UPK2a has the lowest number of positively skewed codons among the UPK2/UPK3 uroplakins, making it the most conserved in sequence and hence most likely the most conserved in function. UPK2b and all of the UPK3s (UPK3a, UPK3b and UPK3c), on the other hand, have more codons with dN/dS > 1.0 and hence have the potential to have been neofunctionalized, producing newer and more derived functions.
These patterns of sequence divergence for the tetraspanin UPKs and the UPK2/UPK3 proteins fit nicely with what we know about their coevolution and cofunctionality (see below). Since UPK1a physically interacts with UPK2a, the genes for these two proteins should have similar patterns of sequence change (as manifested in dN/dS ratios). Likewise, if UPK1b and UPK3 physically interact, we should also see similar patterns of sequence change in the genes for those proteins. Indeed, UPK1a and UPK2a show the smallest numbers of codons with positively skewed dN/dS ratios, consistent with retention of a hypothesized ancestral function, while UPK1b, UPK2b and UPK3a, b and c show potential patterns of neofunctionalization.
Uroplakin evolution and diversification of major vertebrate groups
While the formation of the tetraspanin UPKs, i.e., UPK1a and UPK1b, can be easily explained by a single duplication event in the common ancestor of vertebrates, the evolution of the UPK2/UPK3 families is more complex, requiring several rounds of duplication to explain the distribution of genes in the animal taxa where they exist (Figure 2; [47, 48, 49, 50, 51, 52]). We hypothesize a major duplication event, likely coinciding with the first major whole genome duplication event in the common ancestor of vertebrates, that produced the UPK2 and UPK3 split. Within the UPK2 genes, another duplication event occurred to produce UPK2a and UPK2b. This duplication could have occurred in the common ancestor of cartilaginous and bony fish, since we found that UPK2b first appeared in cartilaginous fish. Alternatively, since among lower vertebrates only the lamprey genome is available, we cannot rule out the possibility that UPK2a was duplicated in the common ancestor of vertebrates, followed by the subsequent loss of UPK2b in lampreys (Figure 2).
Since UPK3 has evolved into several gene families, the duplication history of this group of genes is even more complex. The appearance of UPK3c could be explained by a duplication of UPK3b that took place in the common ancestor of reptiles and mammals (Figure 2). We hypothesize a duplication either in the common ancestor of vertebrates or in the common ancestor of cartilaginous and bony fish to produce the protoUPK3b and the fish UPK3d genes. Also, some lineage-specific UPK3 duplications occurred in amphibians (Xenopus UPK3b.1 and 3b.2) and in lampreys (UPK3a.1 and 3a.2). Overall, we conclude that the evolution of the UPK3 family of genes requires at least four rounds of duplication to explain the current distribution of genes in the genomes of vertebrates.
Using phylogenetic congruence to unravel the patterns of coevolution of uroplakin tetraspanin (UPK1a and UPK1b) and the UPK2/UPK3 superfamilies
Phylogenetic analysis of interacting proteins provides a powerful means to unravel the patterns of their coevolution [53, 54, 55, 56, 57]. Most studies of the coevolution of proteins (and thus their genes) take either a tree-based or a distance-based approach [53, 54, 55]. The basic idea behind these studies is that if two proteins are coevolving and one incurs a mutational change in amino acid sequence, then the other will compensate with mutational change at sites that interact with the initial change. Such changes result in correlated evolutionary patterns both in distances and in phylogenetic relationships. In this study, we take a tree-based approach that compares the likelihood of the topologies of each interacting protein in the pairs of uroplakins. The Shimodaira-Hasegawa (SH) test allows for such comparison using a likelihood ratio test and enables us to show whether two proteins indeed share strong phylogenetic signal. We suggest that strong congruence of phylogenetic signal is reasonable evidence of the coevolution of two uroplakins. More importantly, the lack of phylogenetic congruence between two uroplakins is strong evidence that they are not coevolving.
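The logic of comparing topologies by likelihood can be sketched in Python as a simplified SH-style test using RELL bootstrap resampling of per-site log-likelihoods. This is an illustrative sketch under stated assumptions, not the procedure or software used in this study: the per-site log-likelihoods are assumed to have been computed elsewhere by a phylogenetics package, and the function name `sh_test`, the number of replicates and the toy values are hypothetical.

```python
import random


def sh_test(site_lnl: dict, n_boot: int = 1000, seed: int = 0) -> dict:
    """Return an SH-style p-value for each candidate topology.

    site_lnl maps a topology name to its list of per-site log-likelihoods,
    all evaluated on the same alignment.
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(site_lnl[name]) != n_sites for name in names):
        raise ValueError("all topologies need the same number of sites")
    rng = random.Random(seed)

    # Observed statistic: how far each topology falls below the best one.
    total = {name: sum(site_lnl[name]) for name in names}
    best = max(total.values())
    observed = {name: best - total[name] for name in names}

    # RELL bootstrap: resample sites and re-sum the fixed site likelihoods.
    boot = {name: [] for name in names}
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for name in names:
            boot[name].append(sum(site_lnl[name][i] for i in idx))

    # Centre each topology's replicates so that all topologies are equally
    # good on average, as the null hypothesis requires.
    centred = {}
    for name in names:
        mean = sum(boot[name]) / n_boot
        centred[name] = [value - mean for value in boot[name]]

    p_values = {}
    for name in names:
        exceed = 0
        for r in range(n_boot):
            replicate_best = max(centred[other][r] for other in names)
            if replicate_best - centred[name][r] >= observed[name]:
                exceed += 1
        p_values[name] = exceed / n_boot
    return p_values


# Toy per-site log-likelihoods for two hypothetical topologies.
toy = {
    "tree_A": [-1.0, -2.0, -1.5, -3.0],
    "tree_B": [-3.0, -4.0, -3.5, -5.0],
}
toy_p = sh_test(toy, n_boot=200)
```

In this construction the best-scoring topology always receives a p-value of 1.0, and a low p-value for another topology indicates that the data reject it. Applied to interacting proteins, failure to reject the partner's topology is what is read here as congruent phylogenetic signal.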
We studied the evolution of genes encoding the two major types of uroplakins, i.e., the UPK1a/1b tetraspanin type and the UPK2/3 tetraspanin-associated type. The tetraspanin UPKs show a clear pattern of duplication in the common ancestor of vertebrates, more than likely commensurate with the major genome duplication event that has been hypothesized in this ancestor. Once the duplication occurred in the common ancestor of vertebrates, UPK1a and UPK1b diverged dramatically, as is evident from the different patterns of dN/dS ratios for these two paralog groups. On the other hand, the UPK2/UPK3 group of uroplakins experienced more complex and lineage-specific rounds of duplication to produce the existing genes in these two groups of UPKs. We suggest that UPK2 retained the ancestral function while the UPK3 paralogs neofunctionalized. Again, the patterns of skewed dN/dS ratios for these paralog groups support this interpretation.
Moreover, we found that UPK1a and UPK2a show strong congruence with respect to evolutionary history. Likewise UPK1b and UPK3 paralogs show strong congruence, commensurate with their known interactions. Our current work identifies three new UPK families (ortholog groups - UPK2b, UPK3c and UPK3d) all belonging to the UPK2/3 superfamily. Our systematic analysis of uroplakin-related genes pinpoints the appearance of uroplakins to the earliest vertebrates, links the structural diversification and skew in dN/dS ratios with major gene duplication events, and nearly exhaustively identifies all the existing uroplakin families including several novel ones.
We thank MICINN (Spain) for financial support (grant CONSOLIDER INGENIO CSD-2010–00065 to A.G.-E.). A.G.-E. was supported by the Research Stabilization Program of the Instituto de Salud Carlos III-Institut Catala de la Salut in Catalonia. J.U.C. thanks the IISPV for a predoctoral fellowship. T.-T.S. was supported by NIH grants DK52206 and DK39753, and the Goldstein Fund for Urological Research of the New York University School of Medicine. RD thanks the Sackler Institute for Comparative Genomics and the Korein Family Foundation for their continued support. The nucleotide sequence reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number KF150200.
- 6. Berditchevski F, Rubinstein E: Tetraspanins Proteins and cell regulation. 2013, New York: Springer
- 7. DeSalle R, Sun TT, Bergmann T, García-España A: The evolution of tetraspanins through a phylogenetic lens. Tetraspanins Series: Proteins and cell regulation, vol 9. Edited by: Berditchevski F, Rubinstein E. 2013, New York: Springer
- 18. Mahbub Hasan AK, Ou Z, Sakakibara K, Hirahara S, Iwasaki T, et al: Characterization of xenopus egg membrane microdomains containing uroplakin Ib/III complex: Roles of their molecular interactions for subcellular localization and signal transduction. Genes Cells. 2007, 12: 251-267. 10.1111/j.1365-2443.2007.01048.x.
- 19. Sakakibara K, Sato K, Yoshino K, Oshiro N, Hirahara S, et al: Molecular identification and characterization of xenopus egg uroplakin III, an egg raft-associated transmembrane protein that is tyrosine-phosphorylated upon fertilization. J Biol Chem. 2005, 280: 15029-15037. 10.1074/jbc.M410538200.
- 41. Mitra S, Lukianov S, Ruiz WG, Cianciolo Cosentino C, Sanker S, et al: Requirement for a uroplakin 3a-like protein in the development of zebrafish pronephric tubule epithelial cell function, morphogenesis, and polarity. PLoS One. 2012, 7: e41816. 10.1371/journal.pone.0041816.
- 42. Felsenstein J: Inferring Phylogenies. 2004, Sunderland, Mass: Sinauer Associates
- 46. Seigneuret M: Complete predicted three-dimensional structure of the facilitator transmembrane protein and hepatitis C virus receptor CD81: Conserved and variable structural domains in the tetraspanin superfamily. Biophys J. 2006, 90: 212-227. 10.1529/biophysj.105.069666.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.