Tracing the evolution of the heterotrimeric G protein α subunit in Metazoa
Heterotrimeric G proteins are fundamental signaling proteins composed of three subunits, Gα and a Gβγ dimer. The role of Gα as a molecular switch is critical for transmitting and amplifying intracellular signaling cascades initiated by an activated G protein Coupled Receptor (GPCR). Despite their biochemical and therapeutic importance, the study of G protein evolution has been limited to the scope of a few model organisms. Furthermore, of the five primary Gα subfamilies, the underlying gene structure of only two families has been thoroughly investigated outside of Mammalia evolution. Therefore our understanding of Gα emergence and evolution across phylogeny remains incomplete.
We have computationally identified the presence and absence of every Gα gene (GNA-) across all major branches of Deuterostomia and evaluated the conservation of the underlying exon-intron structures across these phylogenetic groups. We provide evidence of mutually exclusive exon inclusion through alternative splicing in specific lineages. Variations of splice site conservation and isoforms were found for several paralogs which coincide with conserved, putative motifs of DNA-/RNA-binding proteins. In addition to our curated gene annotations, within Primates, we identified 15 retrotranspositions, many of which have undergone pseudogenization. Most importantly, we find numerous deviations from previous findings regarding the presence and absence of individual GNA- genes, nuanced differences in phyla-specific gene copy numbers, novel paralog duplications and subsequent intron gain and loss events.
Our curated annotations allow us to draw more accurate inferences regarding the emergence of all Gα family members across Metazoa and to present a new, updated theory of Gα evolution. These results are critical for gaining new insights into the co-evolution of the Gα subunit and its many protein binding partners, especially the therapeutically relevant G protein – GPCR signaling pathways that radiated during Vertebrata evolution.
Keywords: Heterotrimeric G protein, G protein coupled receptors, Evolution, Whole genome duplication, Paralog, Orthology, Genome annotation
2R WGD: 2nd (and 1st) Round of Whole Genome Duplication in the Vertebrata ancestor
3R WGD: 3rd Round of Whole Genome Duplication in the Teleostei ancestor
DBP: DNA binding protein
EST: Expressed Sequence Tags
GPCR: G protein Coupled Receptor
ORF: Open reading frame
RBP: RNA binding protein
RGS: Regulator of G protein Signaling
TCE: Translated Coding Exon
TSA: Transcriptome Shotgun Assembly
Extra-long exon1 (GNAS and GNAL)
Extra-extra-long exon1 (GNAS)
G protein Coupled Receptors (GPCRs) are a highly studied class of receptors due to their integral role in cellular signaling and, consequently, their value as therapeutic targets. Their evolution has shaped the chemical and biomolecular signaling systems of eukaryotes [1, 2]. Within this signaling cascade, a transducing element, the heterotrimeric G protein, composed of a monomeric α subunit and an obligate βγ dimer, acts as an intracellular relay for activated GPCRs to convert their message into an amplified signaling cascade. With only 16 paralogs in humans, compared to the 800 GPCR genes, the evolution of the heterotrimeric G protein α subunit has received less attention than that of its transmembrane protein partners.
Shortly after their initial discovery and sequencing in several Mammalia species, the Gα subunit was found to be a highly conserved housekeeping protein . As such, traces of genes encoding heterotrimeric G protein α subunits (GNA-) have been found in almost all major branches of Eukaryota [1, 4, 5] despite the proposed differences in GPCR and transmembrane receptor signaling mechanisms between the Unikonta and Bikonta lineages (see ).
Using only Mammalia sequences, the first theory of G protein α evolution posited the relative evolution of four of the five Gα families (Gαi, Gαq, Gαs and Gα12; Gαv having not yet been discovered) . Focusing on the development and radiation of the visual system, others have evaluated the evolution of transducins (GNAT1 and GNAT2) and other critical protein-coding genes in the vision signal transduction pathway in both rods and cones across Vertebrata and non-vertebrate Chordata [6, 7, 8, 9]. However, to our knowledge, there have been no reports focused on studying the evolution of the other three families of Gα in Deuterostomia with the exception of Gα subunits in the fish chemosensory systems , and a more recent, coarse-grained study evaluating paralog counts across Opisthokonta phylogeny .
From these studies and others, we have compared our estimation of when each paralog emerged within Metazoa evolution. We have found numerous differences in the timing and number of predicted gene gain and loss events, due to a) differences in methodologies employed while searching for paralogous sequences and constructing phylogenetic trees and b) increased search space through the inclusion of more genomes. In addition to reporting new and manually curated gene annotations, we have also uncovered variations in alternative splicing patterns, non-canonical splice sites (SS), novel intron gain and loss events, Primates gene retrotranspositions and subsequent pseudogenization, as well as other nuanced deviations to the gene structure of this family. These data allow us to present an updated view on G protein α subunit evolution.
Genomes were analyzed for curated annotation within the ExonMatchSolver (EMS) framework according to its Implementation and Usage  utilizing both paralog-specific, individual translated coding exons (TCE) and full paralog sequences. Briefly, the EMS pipeline utilizes TCEs as the fundamental building blocks for its searches. Paralog-specific TCE amino acid (AA) sequences of a close relative to the target species were utilized as the query against the target genome. There are 16 GNA- genes within humans. As each family was expected to have a conserved exon-intron structure throughout Metazoa, the high quality annotations of human GNA- genes were utilized as the initial templates. Sister groups of Mammalia were evaluated next, before moving on to more distant families. For each major clade (Sauropsida, Amphibia, Actinopterygii, etc.), curation began within the species assembly with the highest reported sequence coverage, genome quality and level of annotation. This curated sequence was used as a seed TCE query for further analysis within that clade. A minimum of two orthologs were used as individual inputs for the hmmsearch when querying each target assembly. In addition to exon border position information, EMS also utilizes full-length protein sequences to annotate orthologous proteins along the target genome assembly via a spliced alignment . A minimum of two orthologs from closely related species were utilized as protein sequence queries for the target spliced alignment.
We utilized the Ensembl genome browser [15, 18, 20] and NCBI’s genome and assembly browser  for our starting queries as these databases contain easily accessible and high quality genome annotations. To validate gene gain and loss events, we evaluated the transcriptome shotgun assembly (TSA) sequence database, expressed sequence tag (EST) database, and UniGene databases, accessed through NCBI [16, 17, 21, 22], using amino acid-based (tblastn) search queries. It is important to note that tissue-specific expression of some paralogs may hinder sequence validation through this approach. Synteny information (co-localization with neighboring genes) was also utilized in evaluating paralog assignments and gene loss, when available, through the Ensembl and NCBI genome browsers. The species tree that was used for mapping gene gain and loss events (Fig. 1) is based on screening of recent literature and the consensus therein [23, 24, 25, 26].
Reconstruction of gene trees
In order to build phylogenetic maximum likelihood (ML) trees on the nucleotide and amino acid level using RAxML protocols [27, 28], exonic, protein-coding sequences of interest were aligned using both ClustalOmega  and MUSCLE , and edited with the Jalview alignment editor . The Jalview alignment editor was utilized to manually inspect the MSAs to ensure annotated exon border positions were maintained during ClustalOmega and MUSCLE alignments. Additional files of the edits before and after Jalview inspection have been provided as Supplemental files X and Y. MSAs were then handed over to RAxML . The appropriate amino acid or nucleotide substitution model for each tree was determined through Prottest , and additional tree parameter optimizations were conducted through preliminary rounds of ML searches comparing the different models of rate heterogeneity available in RAxML (Gamma, CAT, and a variable heuristics optimization [27, 33]). Random starting trees were also employed for initial independent ML tree searches to determine if random starting trees improved topology search space over a maximum parsimony starting tree. After optimizing the substitution model with the best model of among-site versus per-site heterogeneity rates and starting tree, the ML trees were compared for their diversity across tree topology. The strength of the phylogenetic signal was assessed through comparison of the best likelihoods, and pairwise Robinson-Foulds (RF) distance calculations were conducted across all independent searches. Production runs calculated support values for all ML trees and utilized bootstopping for all bootstrap replicates to decrease computational time.
Bootstrapped replicates were summarized into Extended Majority Rule Consensus Trees and reported with bootstrap (BS) values as additional files (Additional file 2: Supplemental file 1, Additional file 3: Supplemental file 4, Additional file 4: Supplemental file 5 and Additional file 5: Supplemental file 6). Pairwise-RF distance calculations across topologies as well as a Shimodaira and Hasegawa test were used to confirm that differences between likelihoods were not significant before summarizing into consensus trees.
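The pairwise RF distance used to compare topologies counts the bipartitions (splits) that occur in exactly one of two trees. The following is a minimal sketch of that calculation, not the RAxML implementation; it assumes trees on the same leaf set that are supplied as nested tuples of leaf names, a toy input format chosen here for illustration, and all function names are hypothetical.

```python
from itertools import combinations


def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        other = all_leaves - clade
        # Trivial splits (a single leaf versus the rest, or the full set) are skipped.
        if len(clade) > 1 and len(other) > 1:
            # Store each split without orientation so rooting does not matter.
            splits.add(frozenset([clade, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def pairwise_rf(trees):
    """RF distance for every pair of trees, keyed by their list indices."""
    return {(i, j): rf_distance(trees[i], trees[j])
            for i, j in combinations(range(len(trees)), 2)}
```

A distance of zero across independent searches indicates identical topologies; larger values indicate that the searches converged on different trees.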
Gene tree-species tree reconciliation
NOTUNG v.188.8.131.52  was utilized to reconcile the known species tree as extracted from timetree  with the bootstrapped maximum likelihood gene tree generated by RAxML including all Holozoa species investigated. The root was chosen randomly from a set of roots proposed by NOTUNG that minimize the gain/loss event score. After rearrangements, NOTUNG reconciled the species tree with 100 duplications and 209 losses (Edge Weight Threshold: 90.0). The number of duplications and losses can be overpredicted when the gene tree topology does not correspond to the species tree topology. In our study, the fast divergence of a paralog in different clades and missing sequence data may also contribute to overprediction. We further considered information (synteny, timing of WGDs) that was not available to NOTUNG. Those additional proposed duplications are not discussed in detail within the main document, but may be inspected in detail.
Investigation of protein-binding motifs within DNA/RNA sequences
Centrimo  was used to perform a local (positional) enrichment analysis of in vivo and in vitro DNA- and RNA-binding protein (DBP/RBP) motifs from the following databases: Ray 2013 restricted to available Vertebrata motifs (human, mouse, frog) , Jolma 2013 , Jaspar Core database 2014 , BS Uniprot  mouse. Centrimo evaluates absolute enrichment of a motif by performing a binomial test to determine whether the best match motif counts at a specific position are significantly different from a uniform motif distribution. Centrimo was also run in differential mode to conduct a Fisher’s exact test to determine positional motif enrichment in a primary sequence set in comparison to a control set (adjusted p-value corrected for multiple testing < 0.05 for both tests).
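The idea behind the absolute enrichment test can be illustrated with a one-sided binomial tail probability. This is a simplified sketch under stated assumptions, not Centrimo's own code: each sequence contributes one best-match position, the null model places that position uniformly over `n_positions` candidate sites, and no multiple-testing correction is applied. The function names and the example coordinates are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def window_enrichment(best_sites, window_start, window_end, n_positions):
    """Count best-match sites in [window_start, window_end) and test for enrichment.

    Returns the observed count and the one-sided binomial p-value under a
    uniform distribution of best-match sites over n_positions positions.
    """
    n = len(best_sites)
    k = sum(window_start <= site < window_end for site in best_sites)
    p_uniform = (window_end - window_start) / n_positions
    return k, binomial_upper_tail(k, n, p_uniform)


# Illustrative input: four sequences, three best matches within a 3-nt window.
count, p_value = window_enrichment([10, 11, 12, 50], 10, 13, 100)
```

In a real analysis the resulting p-values would still need to be adjusted for the number of motifs and windows tested.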
First, the potential overlap of all conserved non-canonical splice sites (SS) (the 5′ ‘GC’ SS of intron6 in GNAI1, and the 3′ ‘TG’ SS of intron3 in GNAS) with DBP/RBP motifs was interrogated by testing differential motif enrichment in the nucleotide sequence surrounding the SS (full-length exon sequence and 40 nt of the intronic sequence). All orthologous sequences in the query set conserved the non-canonical SS, while the control set contained sequences with the canonical SS at the orthologous position. Second, the positional enrichment of potential DBP/RBP motifs was investigated within exon3 of GNAS and the surrounding conserved region by performing an absolute, local enrichment test. Homologous sequences were extracted from an additional 27 Placentalia from the Ensembl webserver  to form a total dataset of 33 species.
Detection of Retrogenes in Primates
The longest protein-coding isoform of each human GNA- gene was blasted against the human genome. Sequence matches overlapping annotated retrogenes were extracted at the nucleotide level via the Ensembl webserver  (GNAI2P2, GNAI2P1, GNAQP1, GS1-124K5.9, RP11-611O2.6, AC010975.2, RP11-100N3.2). Eleven target Primate genomes (Additional file 6: Figure S1) were then queried using these human GNA- pseudogene annotations. Primate retrogenes were retrieved as single blast hits with the following settings: blastn; e-value < 10^-5; match/mismatch: 1, -3; and gap opening/extension: 5, 2. Additional synteny (gene co-localization) information was also considered when identifying potential retrogenes. In cases with short scaffold lengths and no available synteny information, full-length parent genes were re-blasted against the putative target loci. Loci that retrieved multiple, subsequent sequence matches were then excluded. A single sequence match was considered to be an individual exon of a multi-exon paralog if it covered less than 50% of the query sequence. Cases of 30–50% query coverage were manually inspected to identify exon borders.
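The coverage rule above can be written as a small decision function. This is a sketch only: the label strings, the function names, and the treatment of the exact 30% and 50% boundaries are assumptions made for illustration, since the text does not specify how boundary values were handled.

```python
def query_coverage(query_start, query_end, query_length):
    """Fraction of the query covered by one hit (1-based, inclusive coordinates)."""
    return (query_end - query_start + 1) / query_length


def classify_hit(coverage):
    """Sort a single blast hit into one of three working categories.

    Assumed boundaries: [0, 0.30) exon of a multi-exon paralog,
    [0.30, 0.50) manual inspection of exon borders,
    [0.50, 1.0] retrogene candidate.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    if coverage < 0.30:
        return "exon_of_multi_exon_paralog"
    if coverage < 0.50:
        return "manual_inspection"
    return "retrogene_candidate"
```

For example, a hit spanning query positions 1 to 400 of a 1000-nt query has a coverage of 0.4 and would be flagged for manual inspection.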
Conserved open reading frames (ORFs) between orthologous retrogenes that showed similarity to the multi-exon paralog were interrogated. These potential ORFs within the retrogene loci (Blast hit +/− 300 nt) were identified with ORF Finder  and similarity to the parent protein confirmed by blast (bl2seq –n blastp). Then potential novel ORFs with coding potential that were not similar to the parent protein sequence were investigated. For this purpose, the retrogene loci were aligned with ClustalOmega  and coding potential was assessed with RNAcode  probing at least four different reference species. Sequence hits were reported if the region was conserved in all Primates and contained at least one methionine as a possible initiation codon for translation.
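To make the ORF criterion concrete (a methionine start codon followed in frame by a stop codon), the sketch below scans the three forward reading frames of a locus sequence. It is a simplified stand-in for illustration and not the ORF Finder tool used in the study; the reverse strand is not scanned, and the `min_codons` cutoff is an assumed parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=30):
    """Return (start, end, frame) for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and the end includes the stop
    codon. min_codons counts the codons from ATG up to, but excluding, the stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                # First in-frame methionine opens a candidate ORF.
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs
```

Applied to a retrogene locus, the input would be the blast hit extended by 300 nt on either side, as described above.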
Expression of pseudogenes was investigated utilizing the following resources: the Ensembl genome browser [15, 18], the UCSC genome browser (with available species-specific mRNA, EST, cDNA and protein data) , the Expression Atlas (release 18 06 2017) , and psiCube . In order to search the Expression Atlas, we only considered those 16 pseudogenes of non-human Primates that had Ensembl gene IDs of the orthologous pseudogene (RPKM > 0.5). Only a selection of the datasets that showed expression of the pseudogenes is presented.
Detection of natural selection in GNAO
The branch-site model implemented in CODEML in the PAML package  was utilized for the identification of residues within branches under positive selection. Significance was tested by comparing the likelihood ratio statistic to the χ² distribution. To exclude possible biases from codon model choice or shifts in GC content, three different codon models were applied (Codon Table, F3X4 and F1X4) and were assessed for consistency between results. Residues under positive selection were identified by Bayes Empirical Bayes (BEB) analysis . The respective alignments were tested for the presence of recombination with the RDP4 software  in order to minimize false positive signals of positive selection that are caused by other processes (linear sequence = TRUE, Disentangle overlapping events = TRUE). None of the recombination tests gave a significant result (default values used, significance threshold p < 0.05). To obtain estimates of the robustness of model parameters, we performed 100× bootstrapping with the codeml_sba software for those branch-site tests that rejected neutral selection in class 2a and 2b in the foreground branch (p < 0.05) [49, 50].
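The significance test for a branch-site comparison can be sketched as a likelihood ratio test. The example assumes one degree of freedom, which is a common convention for this test but is not stated in the text, and the log-likelihood values are hypothetical placeholders rather than results from this study.

```python
from math import erfc, sqrt


def likelihood_ratio_statistic(lnl_null, lnl_alt):
    """2 * (lnL_alt - lnL_null), floored at zero to absorb optimizer noise."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x):
    """Survival function of the chi-squared distribution with one degree of freedom."""
    return erfc(sqrt(x / 2.0))


def branch_site_lrt(lnl_null, lnl_alt, alpha=0.05):
    """Return the test statistic, its p-value, and whether the null is rejected."""
    stat = likelihood_ratio_statistic(lnl_null, lnl_alt)
    p_value = chi2_sf_df1(stat)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods for the null and alternative models.
stat, p_value, rejected = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-997.0)
```

The closed form for the survival function holds only for one degree of freedom; other degrees of freedom would require a general chi-squared implementation.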
A phylogenetic tree was constructed for the concatenation of exons7 and 8 of all GNAOs including Cephalochordata and Vertebrata (excluding Teleostei and Agnatha) and evaluated with two different foreground branches: the ancestral branch of GNAO.1 and GNAO.2 after the exon duplication, but preceding speciation of Vertebrata, respectively (see Fig. 9). The respective nucleotide sequences were aligned with MACSE v1.01b . Sequences with missing data in these exons were excluded. The divergence of this alignment is not ideal (tree length 15.7 in H0, F3X4). However, as high divergence would lead to a loss of power rather than an increase in the rate of false positives in the test , the divergence is not considered to be deleterious to the analysis. Positive selection and differences in selection pressure were also tested in the foreground branch of a gene tree composed of GNAO(a,b).1 and GNAOa.2 sequences including exons7 and 8 of Actinopterygii (ray-finned fishes). Foreground branches were defined as the branches after the 3R WGD and before Teleostei speciation (ancestral branches of GNAOa.1, b.1 and a.2, respectively, see Fig. 9).
Computational modeling of tertiary structures
Available crystal structures of Gα subunits and structural models based on crystal structures were utilized to map exon sequence positions onto tertiary folds. Though all structures and models utilize Mammalia sequences, the highly conserved tertiary and exon-intron structure of Gα suggests that the relative exon position mappings are maintained across all phyla. The crystal structures of Gαq bound to PLCβ3 and RGS8 were utilized (PDB ID 4QJ3  and 5DO9 , respectively). The active monomer of Gαs (PDB ID 1AZT ) was used in addition to the crystal structure of Gαi bound to Gβγ (PDB ID 1GP2 ) and to RGS4 (PDB ID 1AGR ). Comparative models of Gαo (human GNAO.1 transcript variant) and Gαs (human sequence without exon3 and extended exon4) were constructed from previous modeling studies of the ternary complex  (activated GPCR bound to Gαi and Gβγ) by replacing Gαi side chain residues with either Gαo or Gαs sequence while maintaining backbone atom coordinates. After threading these sequences, model hybridization continued with optimizing fragment insertions and relieving chain breaks through the comparative modeling RosettaCM protocol . The relaxed and optimized structural models were then utilized for further exon sequence mapping based on conserved sequence positions. All crystal structures and models were visualized with PyMOL .
Results and Discussions
Gα paralog evolution before the 2R WGD of Vertebrata
preGNA- genes before the 2R WGD
The early Vertebrata ancestor underwent multiple rounds of whole genome duplication (WGD) [61, 62, 63, 64]. These events allowed for increased gene number and sequence diversity and are thus of special interest. Therefore, we primarily focused our study on species of Deuterostomia, but included nine non-Deuterostomia Opisthokonta as outgroups. To clarify the orthology relationships, the following gene names are used to refer to the progenitor representatives of the Gα families before the Vertebrata radiation: preGNAI, preGNAO, preGNAQ, preGNAS, preGNAV and preGNA12, with the exception of paralogs within S. cerevisiae, which are referred to as GPA1 and GPA2.
Using the EMS gene annotation pipeline, we report an updated, full account of paralog presence and paralog assignment within the outgroup species in comparison to previous reports. We find seven preGNA- paralogs in C. owczarzaki (previous reports find eight [2, 5]) and six in A. queenslandica (previous studies report a range from five to seven [2, 5]), while we and  identified eleven paralogs in M. leidyi ( report twelve to thirteen). All reports agree that M. brevicollis and S. cerevisiae contain three preGNA- and two GPA- paralogs, respectively.
(pre)GNA- paralog presence before and after the 2R WGD in Vertebrata projected onto a Deuterostomia species tree
More specifically within Deuterostomia, we investigated nine species that diverged before the 2R WGD of Vertebrata, providing a clear starting point before the radiation of this gene family. Within each of these phyla we verified the existence of at least the six established paralogs. Exceptions were found within Urochordata, as we find a lineage-specific loss of preGNAO and preGNAV at the base of this phylum; this is contrary to previous reports of two preGNAO paralogs in C. intestinalis . To confirm this lineage-specific loss, we annotated four Urochordata genomes. All four possess multiple preGNAI-like genes, but none group within the preGNAO subtree (Fig. 2). A putative gene fragment, found only within B. schlosseri, groups with preGNAV (BS value 66). Due to limited data, it is unclear if this sequence represents a protein-coding gene or a pseudogene (Table 1, Fig. 2, and Additional file 7: Supplemental file 2).
In addition, each phylum interrogated maintained its own number of local gene duplications and/or retrotranspositions for the different primary Gα families (see Appendix A.i for details). To our knowledge, we are the first to report evidence of these duplications and the existence of these retrogenes. Their presence was further validated with transcriptome and expression data where available (Additional file 8: Supplemental file 3).
The (pre) Gαi, q, and v families form a monophyletic group within Gα
This hypothesis is supported by the following observations: (1) The individual preGNAI and preGNAQ genes are encoded by eight and seven protein-coding exons, respectively. The family-specific exon borders are conserved across all paralogs within Cnidaria, Placozoa and Porifera, excluding lineage-specific variations within Protostomia and prior to Parazoa (Fig. 3). (2) preGNAI and preGNAQ are not arranged in tandem within the investigated Protostomia and non-Bilateria Metazoa species. Taking the evidence of (1) and (2) together, the scenario by Wilkie et al. would require independent intron gain and loss events within exon2/3 of preGNAI and exon2 of preGNAQ as well as independent lineage-specific losses of one of the gene copies in both preGNAI’ and preGNAQ’ gene pairs in the lineages which evolved after the divergence of preGNAI/Q into separate genes.
Therefore, we reject the highly unlikely hypothesis of a tandem duplication occurring before the duplication and divergence of preGNAI and preGNAQ into separate genes  and propose that preGNAI and preGNAQ underwent independent tandem duplications preceding the 2R WGD of Vertebrata. This gave rise to the preGNAI’-preGNAI” and preGNAQ’-preGNAQ” paralog pairs that retained their tandem orientation (Fig. 4). These genes are also referred to as GNAI0-GNAT0 and GNAQ/11-GNA14/15, respectively. Further studies will be required to validate the details of this hypothesis, specifically within non-Metazoa lineages.
No confirmed tandem duplications of preGNAQ were found in the investigated species prior to the 2R WGD of Vertebrata suggesting that preGNAQ tandemly duplicated into the preGNAQ’-preGNAQ” pair at the root of the Vertebrata lineage prior to the 2R WGD events. This progenitor pair then duplicated twice and retained the two gene pairs GNAQ-GNA14 and GNA11-GNA15 in Vertebrata.
We identified tandem duplications of preGNAI into what could be the progenitor preGNAI’-preGNAI” arrangements in Placozoa and Hemichordata. The gene pairs are both arranged in head to head orientations similar to those found in two of the GNAI and GNAT gene pairs of Vertebrata. The Placozoa preGNAI duplications (GIa_Tadhaerens and GIb_Tadhaerens) both group within the preGNAI subtree with medium BS values (43). Within Hemichordata, one gene copy (GIa_AcornWorm) groups with the preGNAI subtree while the other forms the root of the GNAT subtree (GIb_AcornWorm) (Fig. 2). Though this grouping suggests that the gene pair may be a preGNAI0-preGNAT0 set, the low BS value (14) prevents this conclusion. All other identified preGNAI duplicates are not in a tandem arrangement; however, their small contig sizes prohibit thorough examination of conserved synteny. Overall, this suggests that the tandem duplication of preGNAI could have occurred prior to the emergence of Deuterostomia, but our annotations are not sufficient for further speculation without including more sequences and synteny information.
Independent duplications of preGNAI led to the emergence of preGNAV and preGNAO
We further expand on the hypothesis set by Wilkie et al.  by including Gαv into our analysis. Discovered in 2009  Gαv represents what some suggest is the fifth and final family of the G protein α subunit in animals . We hypothesize that preGNAV originated from an ancestral duplication of preGNAI within or just prior to the emergence of Holozoa as we and others [5, 65] have found this paralog across Holozoa lineages.
Note that the ML gene tree cannot resolve whether preGNAV emerged by duplication of preGNAI/Q, preGNAI or preGNAQ as the respective nodes are not well supported (Additional file 9: Figure S7). One of those possibilities is shown in Fig. 4.
preGNA12 originated from a Retrotransposition
The same is true after the duplication of preGNA12 (into GNA12 and GNA13) coinciding with the 2R WGD. The GNA13 paralog is conserved across Vertebrata, but we see altered exon-intron border positions between species which arose before and after the 3R WGD of Teleostei (Fig. 6e-g) (the 3R WGD is discussed below). Intron gains have been found to promote gene expression, transcript maturity, accumulation, and processing [69, 70, 71, 72, 73, 74]. The lack of similarity to the other family members’ exon-intron structures, and its diversity in function  suggest the possibility that preGNA12 underwent neofunctionalization after retrotransposition.
Gαs is related to Gαi/q
Excluding retrogenes and gene fragments, preGNA- genes (preGNAV, preGNAI, preGNAQ, and preGNAS) shared at least four exon border positions and three split codons (codons encoded across two exons). This suggests that preGNAI/Q and preGNAS may have arisen as a result of a gene duplication event from a common ancestor, though exon border information alone is not sufficient to draw this conclusion (Fig. 4). Further analysis is required to ascertain the exact evolutionary relationship between the Gαs and Gαi/q families; however, we see that (pre) GNAV and (pre) GNAI form a monophyletic group while (pre) GNAS clusters outside of this branch on the ML tree (Fig. 2, Additional file 9: Figure S7).
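Shared exon border positions and split codons can be compared by expressing each border as a coding-sequence offset and an intron phase. The sketch below assumes that each gene is described by a list of coding exon lengths in nucleotides and that the coding sequences being compared are collinear without indels, which is a strong simplification; the exon lengths in the example are hypothetical and not taken from any GNA- gene.

```python
def border_offsets(exon_lengths):
    """Cumulative coding offsets at which each exon except the last one ends."""
    offsets = []
    total = 0
    for length in exon_lengths[:-1]:
        total += length
        offsets.append(total)
    return offsets


def intron_phases(exon_lengths):
    """Phase (0, 1 or 2) of the intron following each coding exon except the last."""
    return [offset % 3 for offset in border_offsets(exon_lengths)]


def split_codon_borders(exon_lengths):
    """0-based indices of exon borders that interrupt a codon (phase 1 or 2)."""
    return [i for i, phase in enumerate(intron_phases(exon_lengths)) if phase != 0]


def shared_borders(lengths_a, lengths_b):
    """Coding offsets at which both genes have an exon border."""
    return sorted(set(border_offsets(lengths_a)) & set(border_offsets(lengths_b)))


# Hypothetical exon lengths for two paralogs with partly conserved structure.
gene_a = [118, 43, 142, 158]
gene_b = [118, 185, 158]
```

Here `gene_a` has intron phases 1, 2 and 0, so its first two borders split a codon, and the two genes share the borders at coding offsets 118 and 303.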
Individual exon duplications of preGNAI/Q and preGNAS in Cephalochordata
Prior to the 2R WGD, many paralogs underwent independent, local, single exon duplication events that gave rise to alternative splice variants with mutually exclusive exons. Our findings are expanded upon in Appendix A.ii. We found alternative isoforms that arose by exon duplications for preGNAI, preGNAQ, and preGNAS. These may translate into proteins with diverse functions as these alternative transcripts differ in sequence around critical functional and protein-interface regions.
Gα paralog evolution after the Vertebrata 2R WGD
Paralog gains and losses
After a whole genome duplication event, new genetic material will either be maintained (if evolving under purifying or positive selection pressures) or will vanish into the genomic background (if evolving under neutral selection) . Duplicated genes that are maintained may gain new functions or subfunctionalize through mutations in the protein-coding sequence. Temporal and spatial expression patterns may be altered through changes in regulatory regions of the gene. Changes may be maintained to compensate for dosage effects, or serve as a failsafe against the accumulation of deleterious mutations [77, 78, 79]. It was estimated that after the 2R WGD of Vertebrata only 20–25% of the duplicated genetic material was retained within genomes [62, 80]. Genes with a low rate of amino acid substitution are more likely to be retained after a WGD , as are genes involved in the nervous system  or cellular signaling .
The Gα subunit is considered a housekeeping gene due to its pivotal role in transducing and amplifying signaling cascades in all cells. Many paralogs are ubiquitously expressed (Gαs, 12, 13, q, i2) in Mammalia tissues, and all but Gα14 and Gα15 are expressed in the brain or neurosensory tissues . Therefore, the duplicated and retained GNA- genes (Table 1b) are expected to evolve under strong purifying pressure to prevent the gain of deleterious mutations. Many duplicated Gα paralogs that were retained after the 2R WGD gained new functions, interaction partners, tissue specificity and/or new cellular signaling properties [8, 75].
The radiation of Gαi
We found no evidence of the proposed GNAT-like progenitor gene  in the Chordata lineage (preGNAT0) prior to Vertebrata divergence; this is in accordance with previous findings . However, we identified a putative preGNAT0 sequence within the Hemichordata lineage (denoted GIb_AcornWorm) that is positioned in a head to tail arrangement with a preGNAI gene (GIa_AcornWorm). It is not clear whether this sequence represents a 1:1 ortholog to GNAT0 due to the low BS support (14) for the grouping of GIb_AcornWorm with the split of the Vertebrata GNAT subtree.
GNAT3, which is situated adjacent to GNAI1 in a head to head orientation within Vertebrata genomes, is lost in a lineage-specific manner in Amphibia and Actinopterygii as reported previously [6, 10] and confirmed by the current study. The conserved syntenic regions around GNAI1 are maintained, revealing that this loss of GNAT3 is local and not connected to additional rearrangements. The fourth GNAI-GNAT gene pair (GNAI4-GNAT4) was predicted to be immediately lost subsequent to the 2R WGD ; synteny mapping in humans shows a conserved fourth set of genes surrounding the region where the GNAI4-GNAT4 pair was initially situated after duplication and then presumably deleted .
However, we found nucleotide sequence evidence for four paralogs of GNAI in the Agnatha lineage in both lamprey species investigated, which may correspond to the four copies originating from duplications of the GNAI0-GNAT0 gene pair. All four GNAI genes have the same eight protein-coding exon structure with conserved border positions, and the amino acid ML tree shows the putative GNAI1–4 all clustering close to the root of the Gnathostomata GNAI subtree (Fig. 2). The nucleotide ML tree provides better resolution with lamprey GNAT1 and GNAT2 clustering with their putative Vertebrata 1:1 orthologs (Fig. 7). Synteny mapping supports the expected head to tail orientation of the GNAT1-GNAI2 pair and the head to head orientation of GNAI3-GNAT2. In addition, GNAI1 synteny supports the loss of GNAT3 by maintaining conserved flanking gene neighbors. While a fourth copy of GNAI (GNAI4) has been briefly described previously in lampreys , the lack of clear synteny information prevents further validation of its origin in the Vertebrata ancestor. Though the conservation of exon border positions, split codons, and nucleotide sequence supports the assignment of GNAI4 to the Gαi subfamily, evidence of conserved gene neighbors is needed to ascertain whether this paralog is the product of an independent duplication or of the 2R WGD. There is no evidence of 1:1 orthologs to the lamprey-specific GNAI4 in other Vertebrata lineages. We also reveal that the putative fourth member of GNAT proposed by  is rather a putative GNAT1 ortholog considering synteny information and ML tree topology, not a novel GNAT gene or the missing fourth member.
One significant improvement from our study comes from the inclusion of two Agnatha species. The genome of P. marinus used in previous studies is highly fragmented, preventing reconstruction of complete gene sequences or evaluation of synteny information. Including an additional species allowed us to clarify ambiguities present in those regions. Nevertheless, we cannot resolve whether lamprey GNAI1–3 and GNAT1–3 represent 1:1 orthologs to human GNAI1–3 and GNAT1–3, respectively, despite the conserved tandem orientation of the genes and conserved synteny around several of the paralogs, as the position in the ML tree is not well supported and partially conflicting. The lamprey Gαq family members are also situated near the root of the Q/11 or the whole Q family subtree in the ML tree (see below). This reflects the current debate about the exact timing of the 2R WGD relative to the divergence of lampreys and possible lamprey-specific (whole) genome duplications [13, 84].
We identified full-length GNAZ genes in all Vertebrata species evaluated (including ghostshark), as well as partial genes (due to small contig size) in both lamprey species, contrary to previous reports. Contrary to previous theories, we found no substantial evidence of preGNAZ-like sequences in non-Vertebrata Deuterostomia. The ML tree composed of all five primary families (Fig. 2) shows GNAZ grouping tightly within the Gαi family; taken together, this suggests GNAZ originated from a duplication of a Gαi family member in early Vertebrata evolution.
Two preGNA- sequences (B. schlosseri and T. adhaerens) are seen on the ML tree to group with the GNAZ branch, albeit with low bootstrap values (32). Both genes in question possess a gene structure that is highly similar to the eight exons of preGNAI and are thus excluded as 1:1 orthologs of a putative preGNAZ.
The exon-intron structure of GNAZ largely deviates from the exon-intron structure of other Gαi family members (Additional file 10: Figure S2). GNAZ is located on the opposite strand within an intron of the RSPH14 gene. We hypothesize that GNAZ emerged through a retrotransposition into this position and subsequently gained one intron. This resulted in the conserved two protein-coding exon gene structure. Appendix B.i discusses further analysis done to investigate whether the intron of GNAZ carries signatures of insertion mediated by a retrotransposon mechanism; however, no conservation of these residues was found.
Three of the four known family members (prior to Gαv discovery) were previously predicted to be situated on large blocks of duplicated genetic material. We systematically validated that preGNAQ duplicates (GNAQ, 14, 11 and 15) were present in all Vertebrata. The head to tail arrangement of the gene pairs GNAQ-GNA14 and GNA11-GNA15 is conserved in all investigated species. As seen in the ML trees, GNAQ and GNA11 are very closely related, while GNA14 and GNA15, though diverged, group together.
GNA14 and 15 have gained sequence divergence, tissue expression specificity and new functionality, while GNAQ and 11 appear to be ubiquitously expressed in Mammalia tissues and are involved in a high level of redundant cellular signaling processes. We see two lineage-specific losses of GNA15, in Coelacanthiformes and in Neoaves (supported by loss in all six investigated neoavian species), which are further supported by synteny information, EST and TSA data (Additional file 8: Supplemental file 3).
During the 2R WGD, preGNAS duplicated to give rise to GNAS and GNAL (Gαolf); GNAL developed tissue-specific expression and functional specificity within the olfactory bulb and various neuronal tissues. We found a species-specific loss of GNAL in the genome of the green anole lizard. However, when validating this putative loss with transcriptome and expression data, we found evidence of GNAL expression within lizard TSA and EST data [17, 21] (Additional file 8: Supplemental file 3c-d). We thus conclude that GNAL must be encoded within the genome of the green anole lizard though it is not represented within the investigated genome assembly. Such issues have been previously reported and may be due to problems during scaffold assembly and coverage during sequencing.
In addition to the XL-exon, an extra-extra-long exon (XXL-exon) has been reported upstream of GNAS in human and rodent species. Due to its variability in size (approximately ranging from 1400 nt to 2300 nt) and vast sequence divergence, the XXL-exon was not investigated here. Conservation of imprinting [93, 94] and the gene promoter, which is shared with four other upstream genes [95, 96], were not the subject of this study. For excellent reports on the complex GNAS gene structure in Mammalia, please see [92, 97, 98].
As another peculiarity, GNAS possesses a cassette exon, exon3, which can be skipped during splicing [99, 100] (Fig. 10a). The inclusion of exon3 adds 15 AA to the Gαs protein (14 AA encoded by this exon plus one AA encoded by a split codon shared with exon4). When mapped onto the tertiary protein structure, the amino acid region encoded by exon3 extends a flexible linker between α-helix1 of the enzymatic GTPase domain and α-helixA of the helical domain (Fig. 10b). This region may be important for G protein activation and nucleotide exchange [89, 101].
The cassette exon3 of GNAS appears to be a very “recent” evolutionary invention as we only find it conserved in Placentalia (placental mammals) but not in other Vertebrata. Interrogation of available transcriptome and expression data confirmed that there is no evidence of exon3 existence outside of this branch (Additional file 8: Supplemental file 3). The intron between exon2 and 4 is large (~ 43,000–72,000 nt) in non-placental Sarcopterygii, while the homologous region becomes much smaller (~ 6000–9000 nt) after emergence of exon3.
We searched for sequences similar to exon3 in other species of Mammalia to elucidate the possible origin of this new exon. We could not find sequence similarity to human proteins from UniProt KB or the NCBI database, or to the intronic region between exon2 and exon4 in 14 Sarcopterygii (lobe-finned fishes), when querying with the amino acid and nucleotide sequence of exon3, respectively. Within Placentalia, a highly conserved sequence stretch of roughly 75 nt is situated upstream and 25 nt downstream of exon3, bookending the exon (Additional file 14: Figure S4). Appendix B.ii discusses predicted motifs for DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) we identified which may be present within this sequence stretch.
The emergence of exon3 in Placentalia also co-occurs with the ability of exon4 to be extended by three nucleotides (Fig. 10). This extension is mediated by a well-documented non-canonical SS ‘TG’ situated 3 nt upstream of the canonical SS ‘AG’. The ‘TG’ splice recognition pattern shifts the SS so that the nucleotides ‘CAG’ are included within the exon, giving rise to four different isoforms around this exon junction variation: exon2-E-exon3-G-exon4, exon2-E-exon3-GS-exon4, exon2-D-exon4, exon2-DS-exon4.
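The four junction variants can be reproduced with a small reading-frame sketch. The nucleotide strings below are hypothetical placeholders and not the annotated GNAS sequences; only the ‘CAG’ extension and the junction residues E, G, GS, D and DS are taken from the text. The flanking bases were chosen, as an assumption, so that the split codons translate accordingly.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate an in-frame nucleotide string into one-letter amino acids."""
    if len(nt) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt), 3))


# Hypothetical placeholder sequences (NOT the annotated GNAS exons).
EXON2_TAIL = "CTG" + "GA"         # one whole codon, then 2 nt of a split codon
EXON3 = "G" + "GCT" * 14 + "GG"   # 1 nt + 14 whole codons + 2 nt = 45 nt
EXON4_HEAD = "T" + "CCC"          # 1 nt completing a split codon, then one codon
EXTENSION = "CAG"                 # 3 nt included when the upstream 'TG' site is used


def build_isoforms():
    """Translate the junction for every exon3 / exon4-extension combination."""
    isoforms = {}
    for include_exon3 in (True, False):
        for extend_exon4 in (False, True):
            mrna = EXON2_TAIL
            if include_exon3:
                mrna += EXON3
            if extend_exon4:
                mrna += EXTENSION
            mrna += EXON4_HEAD
            isoforms[(include_exon3, extend_exon4)] = translate(mrna)
    return isoforms


if __name__ == "__main__":
    for (with_exon3, extended), peptide in build_isoforms().items():
        print(f"exon3={with_exon3!s:5} extended={extended!s:5} {peptide}")
```

Under these assumptions, each exon3-containing isoform is 15 AA longer than its exon3-skipped counterpart, and the 3 nt extension adds a single serine in either context while preserving the reading frame.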
We found no evidence of an extended exon4 outside of Placentalia in any genome interrogated. Therefore, we conclude that exon3 and the extension of exon4 co-occurred in the ancestor of Placentalia after the split from Marsupialia (marsupials). The expression of all four possible variations of transcripts with the inclusion/exclusion of exon3 and the possible extension of exon4 is supported by transcriptome and expression data.
Pyne et al. speculated that the additional amino acid arising from the exon extension could promote phosphorylation. We did not find any evidence for posttranslational modifications at this or neighboring positions in UniProt KB or the PhosphoSite database. Amino acids encoded by exon3 and the exon4 extension are situated in a flexible linker region between the GTPase domain and the helical domain of the G protein. This region is unresolved in all crystal structures of the Gαs subunit (Fig. 10b).
preGNA12 was duplicated to give rise to GNA12 and GNA13 in Vertebrata during the 2R WGD. Both paralogs are present in all Vertebrata genomes investigated except for Amphibia (X. tropicalis and X. laevis). Genomic information and available EST data support a loss of GNA12 (Additional file 8: Supplemental file 3) though GNA13 is present in both species. Refer to Fig. 6 for altered exon border information.
GNAV was the most recently discovered member of the GNA- genes due to the widespread loss of this paralog. GNAV was lost independently twice within Vertebrata: at the base of Tetrapoda and at the base of Agnatha. Any preGNAV gene duplications were not retained after the 2R WGD. Prior to the 2R WGD, preGNAV gained an intron dividing exon7 into two (Fig. 5a). This gene structure is maintained in all species of Vertebrata where the paralog is present (ghostshark, coelacanth, gar and Teleostei).
Retrogenes in Primates
We find that members of four of the five Gα families have been subjected to repeated retrotransposition during very recent evolutionary history, specifically during the evolution of Primates and suborders within (Additional file 6: Figure S1). Eight of the 15 retrotranspositions are species-specific and limited to the marmoset and tarsier lineages (Additional file 15: Table S4). This might reflect the excess of retrocopies in Platyrrhini (New World monkey) in comparison to Cercopithecidae (Old World monkey). Additionally, the GNA11 retrogene GS1-124 K5.9 was tandemly duplicated twice as indicated by the location of these retrogenes in proximity to their parent retrogene. Surprisingly, the gorilla-specific copy of GS1-124 K5.9 conserves more than 80% of the full-length open reading frame (ORF) of the parent gene with 99.34% sequence identity to the protein sequence, although we did not detect any expression. In contrast, the Cercopithecidae-specific GS1-124 K5.9 copy is expressed in baboon frontal cortex.
Most of the Primates retrogenes degraded into pseudo-retrogenes conserving several short ORFs that are still similar to the parent genes. Those pseudo-retrogenes are transcribed only at low levels, in one species and in at most two of the independent RNA-seq experiments considered (Additional file 15: Table S4).
In contrast, GS1-124 K5.9 and GNAQP1 are interesting examples of retrogenes that are functional in several Primate species. We consider both genes to be functional as 1) they conserve a homologous region longer than 40 AA with high similarity to the parent protein across all Catarrhini; 2) promoters are annotated directly upstream on the same strand in human (Ensembl v87); 3) transcription of both genes in human is supported by the psiCube data as well as by six independent RNA-seq studies retrieved from the Expression atlas [44, 45] (three shown) and by at least one RNA-seq experiment for another Primate species, vervet-AGM and macaque, respectively (Additional file 16: Figure S5, Additional file 15: Table S4). GNAQP1 is expressed in a variety of tissues, while transcription of GS1-124 K5.9 was detected in only three tissues in human (testis, choroid plexus, and forebrain; Additional file 16: Figure S5 a, c & e). Five independent studies support the expression of both genes in human testis (see Additional file 16: Figure S5e for sixth study) in accordance with the tendency of retrogene expression in testis reported previously [104, 105]. Interestingly, macaque also expresses GNAQP1 in testis (Additional file 16: Figure S5b).
Two other retrogenes, AC010975.2 and RP11-100 N3.2, are transcribed in human and at least one other species, implying that those genes might also be functional, although we detected no conserved ORF or upstream promoter. The GNA13 pseudogene AC010975.2 is expressed in human, vervet-AGM and baboon with overlapping tissue expression in pituitary gland across both Cercopithecidae species, while RP11-100 N3.2 is expressed in human and macaque (not shown). We note that the expression levels of all (putative) functional GNA- retrogenes are in general lower than the expression of the parent genes.
The Gα subunit belongs to the fold clan of P-loop NTPases. This clan is one of the few examples of gene families that are consistently highly duplicated via retrotransposition in the different lineages of worm, human and fly. Our observation in this context is in accordance with findings that correlate retrotransposition with the expression level of the parent gene in germ line tissue [106, 107]. Most members of the Gα family are housekeeping proteins that are known to have widely distributed or ubiquitous expression patterns throughout the body. The excess of GNA- retrotransposition in Primates likely reflects the known high activity of retrotransposable elements in this clade. (Pseudo-)retrogenes are a potential source for the emergence of paralogs, (long) non-coding RNAs and ORFs encoding small peptides and are often lineage-specific. The latter two types do not necessarily have sequence similarity to the parent protein and can gain functions in a completely different cellular context. In this study about GNA- gene and protein evolution, we focused on retrogenes that still show sequence similarity to the parent protein and well annotated human GNA- retrogenes. Our retrogene counts thus represent a lower boundary of retrotransposition events. Instead of providing exact counts, we exemplified the high frequency of retrotranspositions in the evolutionary history of GNA- genes in the Primates lineage.
Individual exon duplications in GNAQ, GNA11, and preGNAI
We found additional duplications of exon4 in GNAQ and GNA11 in some species of Vertebrata. Surprisingly, the homologous sequence of preGNAI, encoded by exon5, can also be alternatively spliced in Urochordata. The sequence diversity in the alternatively spliced transcripts may have an important role in providing novel functionality as these sequence regions correspond to important interface regions within the protein tertiary structure. For further analysis of these exons, please see Appendix B.iii, Additional file 17: Figure S6, and Additional file 4: Supplemental file 5.
Non-canonical splice sites of GNAI1
We found conservation of canonical ‘GT-AG’ splicing patterns for all of the exon sequences annotated, with two exceptions. The first is the alternative upstream splice site (SS) of exon4 in GNAS in Placentalia, which has been discussed above. The second is the highly conserved 5′ non-canonical SS ‘GC’ in intron6 of GNAI1 in most species of Sauropsida and Mammalia (Additional file 9: Figure S7). This non-canonical splice site co-occurs with an extension of the consensus motif within the surrounding exonic and intronic regions. As the switch from canonical to non-canonical SS, and its subsequent systematic conservation, is surprising, we evaluated possible selective pressures within this region. Our analysis of motifs for DNA-/RNA-binding proteins (DBPs/RBPs) is detailed in Appendix B.iv, Additional file 9: Figure S7 and Additional file 18: Figure S8.
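As an illustration of how such exceptions can be flagged, the following minimal sketch labels an intron by its terminal dinucleotides. It is not the procedure used in this study; the function name and the toy intron sequences are hypothetical and serve only to show a ‘GC’ donor (as in GNAI1 intron6) and a ‘TG’ acceptor (as in the alternative GNAS exon4 site) next to a canonical ‘GT-AG’ intron.

```python
def classify_intron(intron_seq):
    """Label an intron by its first two (5' SS) and last two (3' SS) nucleotides."""
    seq = intron_seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice-site dinucleotides")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT-AG"
    return f"non-canonical {donor}-{acceptor}"


# Toy introns (hypothetical sequences, for illustration only).
TOY_INTRONS = {
    "typical": "GTAAGTCCTTTCAG",
    "GC donor": "GCAAGTCCTTTCAG",
    "TG acceptor": "GTAAGTCCTTTCTG",
}

if __name__ == "__main__":
    for name, seq in TOY_INTRONS.items():
        print(f"{name}: {classify_intron(seq)}")
```

A check of this kind only inspects the two terminal dinucleotides; it says nothing about the extended consensus motif or about whether a site is actually used.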
Gα paralogs after the 3R WGD in Teleostei
Paralog gains and losses
In addition to the Vertebrata 2R WGD [61, 62], a third round of whole genome duplication (3R WGD) occurred at the base of Teleostei [64, 109, 110]. It is estimated that over 75% of the genes which arose from the 3R WGD were subsequently lost [109, 110]. The paralog gains and losses obtained from the EMS are summarized in Table 1. We confirmed and updated the paralog counts reported by Oka et al. Briefly, we find two copies of GNAI1, GNAI2, GNAL, GNA11, and GNA14 in all Teleostei. GNAV, GNAS, GNAQ all have two copies present in Euteleostei, but only one copy remains in zebrafish. GNAO and GNA13 also have two copies, though there are lineage-specific deletions in pufferfish and Atlantic cod, respectively. Only one copy is maintained after the 3R WGD for GNAI3, GNAZ, GNAT1, and GNAT2. GNA12 also has one copy retained in Euteleostei, but two copies are present in zebrafish. It appears that zebrafish GNA15 underwent several duplications resulting in an arrangement of four GNA15 paralogs situated on the same chromosome next to each other with otherwise conserved synteny. At least three of the four copies are expressed as confirmed by EST and TSA data. GNAT3 is deleted in all Actinopterygii. Of the paralogs that are retained, we find variations in the positions of intron-exon borders (GNA12 and GNA13) and variations in alternative splicing patterns (GNAO, GNA11, GNAQ) as discussed in other sections.
GNAO alternative splicing in Teleostei
Two copies of GNAO were retained after the 3R WGD (except within Tetraodontidae, the pufferfish). In zebrafish, medaka and stickleback, both mutually exclusive exon pairs (exon7.2–8.2 and exon7.1–8.1) were retained in one copy (referred to as gene copy ‘a’: GNAOa.1 and GNAOa.2). The other gene copy (GNAOb) lost one pair of exons 7–8 immediately following the 3R WGD. In Tetraodontidae, we see a lineage-specific deletion of the complete GNAOa copy (Fig. 9a).
To determine which copies of the exon sequences were retained in these paralogs (either variant .1 or .2), we created an ML tree of the nucleotide sequences for GNAO’s exon7 and exon8 across all phylogenetic branches evaluated. We see that the alternatively spliced exons 7 and 8 of GNAOa possess both the .1 and the .2 transcript variants while all of the .1 sequence variants are conserved within GNAOb. Thus, we resolve that the .2 exon pair of GNAOb was lost at the base of Teleostei and that GNAOa.2 was lost independently in G. morhua (Atlantic cod). In our selection analysis, we did not detect any residues under positive selection in any of the ancestral branches tested (GNAOb.1, GNAOa.1 and GNAOa.2). While all residues of exons 7.1 and 8.1 are under strong purifying selection in both ‘a’ and ‘b’ copies (ω = 0.0075), the selection pressure is slightly released with about 6% of residues evolving under neutral selection in the ancestral branch leading to GNAOa.2. This might also reflect the released pressure that ultimately led to the loss of GNAOb.2 in all Teleostei.
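To make the selection vocabulary concrete, the sketch below interprets a dN/dS ratio and computes a proportion-weighted mean over site classes. It is a back-of-the-envelope illustration, not the codon-model analysis performed here: the function names, the tolerance used to call a ratio "neutral", and the assumption that the relaxed 6% of residues evolve at a ratio of exactly 1 are ours.

```python
def selection_regime(omega, tolerance=0.05):
    """Interpret a dN/dS ratio; `tolerance` around 1 is an arbitrary assumption."""
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if omega < 1.0 else "positive"


def mean_omega(site_classes):
    """Proportion-weighted mean dN/dS over (proportion, omega) site classes."""
    total = sum(proportion for proportion, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(proportion * omega for proportion, omega in site_classes)


if __name__ == "__main__":
    # Ratio reported in the text for exons 7.1 and 8.1.
    print(selection_regime(0.0075))
    # Assumed two-class mixture: 94% of residues at 0.0075, 6% at 1.0.
    print(round(mean_omega([(0.94, 0.0075), (0.06, 1.0)]), 5))
```

Under the assumed mixture the mean ratio stays far below 1, which is consistent with describing the pressure as only slightly released.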
The strength of this study comes from the inclusion and curation of genes from highly fragmented genome assemblies in addition to the genomes of well-studied model organisms. Despite improved long-read genome sequencing techniques, computational assembly of accurate whole genome sequences remains a challenge. High sequence similarity between genes due to homology remains challenging when assembling DNA-seq reads into larger scaffolds or when mapping RNA-seq reads to a genome. The ambiguity of these regions can result in chimeric gene annotations where two different genes are presumed to be one. Additional errors can be introduced via automated gene prediction tools which probe the assembly. For a more thorough examination of these hurdles please see [11, 24].
The ExonMatchSolver (EMS) algorithm was developed to assist in overcoming some of these challenges when curating highly fragmented genome assemblies. EMS differs from other methodologies by querying for the collective “match” of all paralogous genes of a protein family within an individual genome assembly. As the family of heterotrimeric G proteins contains many paralogs, we used the EMS technique to annotate and disambiguate paralogs of the Gα subunit across phylogeny. Despite its usefulness, it should be noted that the EMS pipeline does not resolve inversions of exons or significantly altered exon-intron structures. Instead, this tool provides context for manually resolving such ambiguities in the nucleotide sequences.
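The idea of a collective match can be illustrated with a toy example. The sketch below is not the EMS implementation; it is a brute-force stand-in, with hypothetical contig names and similarity scores, showing why choosing one assignment for all paralogs jointly can differ from taking the best hit for each paralog independently.

```python
from itertools import permutations


def best_joint_assignment(scores):
    """Return the one-to-one paralog -> contig assignment with the highest total score.

    `scores[paralog][contig]` is a similarity score; missing pairs count as 0.
    Brute force over permutations, so only suitable for small toy inputs.
    """
    paralogs = sorted(scores)
    contigs = sorted({contig for row in scores.values() for contig in row})
    if len(contigs) < len(paralogs):
        raise ValueError("need at least as many contigs as paralogs")
    best, best_total = None, float("-inf")
    for chosen in permutations(contigs, len(paralogs)):
        total = sum(scores[p].get(c, 0.0) for p, c in zip(paralogs, chosen))
        if total > best_total:
            best, best_total = dict(zip(paralogs, chosen)), total
    return best, best_total


# Hypothetical scores: contigA is the top hit for both GNAI1 and GNAI2.
TOY_SCORES = {
    "GNAI1": {"contigA": 95.0, "contigB": 90.0, "contigC": 40.0},
    "GNAI2": {"contigA": 93.0, "contigB": 60.0, "contigC": 45.0},
    "GNAI3": {"contigA": 50.0, "contigB": 55.0, "contigC": 80.0},
}

if __name__ == "__main__":
    assignment, total = best_joint_assignment(TOY_SCORES)
    print(assignment, total)
```

In this toy case the independent best hits would place GNAI1 and GNAI2 on the same contig, whereas the joint solution gives contigA to GNAI2 and moves GNAI1 to its second-best contig because that maximizes the total score.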
Through the use of the EMS pipeline to assist in the curation of the GNA- genes across a dense species sampling, we have identified dozens of sequence deviations and inconsistencies within the examined species and paralogs compared to previous works and genome annotations. In this work, we have uncovered many paralogs of GNA- not identified by previous methodologies; this is likely due to the use of coarse-grained approaches which misidentified the presence and absence of genes and/or due to the reliance on gene trees covering a limited range of species. Our updated report allows us to refine the theories surrounding Gα evolution.
In addition to the major findings of gain and loss events and paralog family assignments within this manuscript, we also uncovered previously unknown variance in gene duplications, the conservation of alternative splicing patterns, exon duplications/insertions, non-canonical SS, conserved DBP and RBP motifs, and traced back the emergence of Primate retrogenes. Each of these variants is expanded upon in the appendices. In addition, our curated sequences have been made available for use as the basis of future annotations, sequencing efforts, and as seed inputs for developing biological questions surrounding the Gα family.
Acknowledgements
We thank Axel Wintsche for insight into the investigation of DBP and RBP motifs.
Funding
HH and AL were supported by EY006062; EY010291. JM and AL were supported through NIH (R01 GM080403, R01 GM099842). HI was partially funded by the Volkswagen Foundation within the framework “Evolutionary Biology” and by the European Social Fund (ESF) of the European Union (EU) (100148833/22117017, 100227413). Traveling of HI and AL between Leipzig and Nashville was supported by microgrants from the Leipzig/Vanderbilt collaboration.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files.
Authors’ contributions
AL collected, analyzed, and interpreted the data and wrote the manuscript. HI significantly assisted in analyzing and interpreting the data and was a major contributor in writing the manuscript. JM provided assistance with G protein structure and modeling. HH provided assistance with G protein structure and function. PS assisted in data analysis and interpretation. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
- 16. O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016;44(D1):D733–45.
- 20. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, García Girón C, Hourlier T, et al. The Ensembl gene annotation system. Database (Oxford). 2016;2016.
- 27. Stamatakis A. Using RAxML to infer phylogenies. Curr Protoc Bioinformatics. 2015;51:6.14.11–4.
- 37. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42(Database issue):D142–7.
- 47. Bielawski JP, Baker JL, Mingrone J. Inference of episodic changes in natural selection acting on protein coding sequences via CODEML. Curr Protoc Bioinformatics. 2016;54:6.15.11–16.15.32.
- 58. The PyMOL Molecular Graphics System, Version 1.8 Schroedinger, LLC.
- 60. Holland PW, Garcia-Fernàndez J, Williams NA, Sidow A. Gene duplications and the origins of vertebrate development. Dev Suppl. 1994:125–33.
- 80. Julien Roux JL, Marc Robinson-Rechavi. Selective constraints on coding sequences of nervous system genes are a major determinant of duplicate gene retention in vertebrates. bioRxiv. 2017;2016(072959):PrePrint.
- 86. Oldham WM, Van Eps N, Preininger AM, Hubbell WL, Hamm HE. Mechanism of the receptor-catalyzed activation of heterotrimeric G proteins. Nat Struct Mol Biol. 2006;13(9):772–777.
- 100. Kaya AI, Lokits AD, Gilbert JA, Iverson TM, Meiler J, Hamm HE. A conserved phenylalanine as relay between the α5 helix and the GDP binding region of heterotrimeric Gi protein α subunit. J Biol Chem. 2014;
- 112. Lindsay SJ, Xu Y, Lisgo SN, Harkin LF, Copp AJ, Gerrelli D, Clowry GJ, Talbot A, Keogh MJ, Coxhead J, et al. HDBR expression: a unique resource for global and individual gene expression studies during early human brain development. Front Neuroanat. 2016;10:86.
- 115. Oldham WM, Hamm HE. How do receptors activate G proteins? Adv Protein Chem. 2007;74:67–93.
- 120. Pollard AJ, Krainer AR, Robson SC, Europe-Finner GN. Alternative splicing of the adenylyl cyclase stimulatory G-protein G alpha(s) is regulated by SF2/ASF and heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and involves the use of an unusual TG 3′-splice site. J Biol Chem. 2002;277(18):15241–51.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.