SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions
Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.
KeywordsDifferential splicing Alternative splicing RNA-seq Uncertainty Biological variability Differentiation
Induced pluripotent stem cell
Proportion spliced in
Reverse transcriptase polymerase chain reaction
transcripts per million
Transformer-2 protein homolog alpha/beta
Alternative splicing is related to a change in the relative abundance of transcript isoforms produced from the same gene . Multiple approaches have been proposed to study differential splicing from RNA sequencing (RNA-seq) data [2, 3]. These methods generally involve the analysis of either transcript isoforms [4, 5, 6, 7], clusters of splice junctions [8, 9], alternative splicing events [10, 11], or exonic regions . Relative abundances of the splicing events or transcript isoforms are generally described in terms of a percentage or proportion spliced-in (PSI) and differential splicing is given in terms of the difference of these relative abundances, or ΔPSI, between conditions [13, 14]. PSI values estimated from RNA-seq data have shown a good agreement with independent experimental measurements, and the magnitude of ΔPSI represents a good indicator of biological relevance [10, 15]. However, despite the multiple improvements achieved by recent RNA-seq analysis methods, many challenges remain unresolved. These include the limitations in processing time for current methods, the computational and storage capacity required, as well as the constraints in the number of sequencing reads needed to achieve high enough accuracy.
An additional challenge for RNA-seq analysis is the lack of robust methods to account for biological variability between replicates or to perform meaningful analyses of differential splicing across multiple conditions. Although many methods assess the estimation uncertainty of the splicing event or transcript isoforms [10, 11, 12], they generally do so on individual events rather than considering the genome-wide distribution. Additionally, most methods determine the significance of differential splicing by performing tests directly on read counts, leaving the selection of relevant ΔPSI values to an arbitrary cut-off. In other cases, fold changes instead of ΔPSI are given, which are even harder to interpret in terms of splicing changes.
We showed before that transcriptome quantification could be leveraged for the fast estimation of event PSI values with high accuracy compared with experimental and simulated datasets . We now present here a new method for analyzing differential splicing, SUPPA2, which builds upon these principles to address the current challenges in the study of differential splicing, and taking into account biological variability. Compared with other existing approaches for differential splicing analysis using RNA-seq data, SUPPA2 provides several advantages. SUPPA2 can work with multiple replicates per condition and with multiple conditions. Additionally, SUPPA2 estimates the uncertainty of ΔPSI values as a function of the expression of transcripts involved in the event, taking into account all events genome-wide to test the significance of an observed ΔPSI, thereby directly estimating the biological relevance of the splicing change without relying on arbitrary ΔPSI cut-offs. Moreover, SUPPA2 incorporates the possibility to perform clustering of differentially spliced events across multiple conditions to identify groups of events with similar splicing patterns and common regulatory mechanisms. In conclusion, SUPPA2 enables cost-effective use of RNA-seq for the robust and streamlined analysis of differential splicing across multiple biological conditions. The software described here is available at https://github.com/comprna/SUPPA.
SUPPA2 monitors uncertainty to determine differential splicing
We compared SUPPA2 results with three other methods that calculate differential splicing using multiple replicates per condition: rMATS  and MAJIQ , which describe changes in terms of ΔPSI, and DEXSeq , which uses fold changes. Importantly, we found that SUPPA2 was much faster than the other methods, devoting 24 s to the PSI quantification and about 32 min and 47 s for differential splicing analysis on the same datasets (Fig. 1c). Since SUPPA2 performs the significance test directly on the ΔPSI values without needing to go back to the read data, it hence provides unmatched speed for differential splicing analysis. Comparing the results obtained with each method (Additional file 1: Figure S1), we observed that rMATS and DEXSeq detect many apparently significant events with small inclusion changes that are not distinguishable from the variability between biological replicates, whereas SUPPA2 and MAJIQ separate well these two distributions. As SUPPA2 exploits the between-replicate variability to test for significance, it avoids the use of an arbitrary global |ΔPSI| threshold to identify biologically relevant events and detects significant events across a wide range of gene expression values (Additional file 1: Figure S1). This feature of SUPPA2 should hence better rationalize |ΔPSI| threshold cut-offs.
SUPPA2 provides high accuracy at low sequencing depth and with short read lengths
To test the accuracy of SUPPA2 with different sequencing settings and compare it with other methods, we simulated 277 exon-cassette (SE) events and 318 alternative splice site (A5/A3) events with |ΔPSI| > 0.2 between two conditions with three replicates per condition (Additional file 1: Figure S2a). To perform a balanced comparison, we considered the same number of negative controls, consisting of different SE and A5/A3 events with arbitrary PSI values but with no simulated change between conditions (Additional file 2: Table S1; “Methods”). We simulated genome-wide RNA-seq reads using RSEM  at different sequencing depths (120, 60, 25, 10, and 5 million (M) 100-nucleotide (nt) paired-end reads per sample) and for different read lengths (100, 75, 50, and 25 nt at a fixed depth of 25 M paired-end reads). Despite the differences in the numbers and length of the reads (Additional file 2: Table S2), the genes containing the positive and negative events used for benchmarking showed similar distributions of expression values at all depths and read lengths (Additional file 1: Figure S2b). We then calculated differentially spliced events with SUPPA2, rMATS, MAJIQ, and DEXSeq and evaluated the detection rate and accuracy on the simulated events (Additional file 2: Table S3).
We also considered an unbalanced configuration where one replicate had 120 M reads and the other two replicates had 10 M reads. In this hybrid configuration, SUPPA2 recovered a high number of events and a high number of true positives for SE events. On the other hand, for A5/A3 events we observed a slight drop in accuracy (Additional file 2: Table S3), probably due to a high proportion of short variable regions in the alternative sites events (79 events (25%) of the A5/A3 events involved a region of under 9 nt), which may be more problematic for correct transcript quantification than using direct mapping to splice junctions. Importantly, although MAJIQ showed a high detection rate and accuracy in the unbalanced configuration, it had to be run with specialized parameters (“Methods”), whereas SUPPA2 was run in the same way for all cases. Additionally, SUPPA2 also showed high correlation values between the predicted and simulated ΔPSI values (Additional file 2: Table S3), and similar to those obtained with rMATS and MAJIQ. In light of these results, we can conclude that SUPPA2 performs comparably to other methods under a wide spectrum of sequencing conditions and, in particular, it outperforms other methods at low sequencing depth and short read length.
SUPPA2 provides accurate splicing change quantification compared with experimental results
SUPPA2 identifies experimentally reproducible splicing changes not detected by other methods
The results described above suggest a general agreement between the different methods in the detection of significant differentially spliced events. To assess this question, we performed a direct comparison of the results obtained from the four methods, SUPPA2, rMATS, MAJIQ, and DEXSeq, using the same RNA-seq data for the knockdown of TRA2A and TRA2B compared with controls . Since exon-cassette (SE; 48.71%) and alternative splice site (A5/A3; 37.71%) events are the most frequent events in humans compared to mutual exclusion (6.22%) or intron-retention (7.36%), we decided to match SE and A5/A3 events across all four methods. We were able to identify 7116 SE events and 2924 A5/A3 events unambiguously detected by all four methods, i.e., they were measured and tested for significance by all methods (Additional file 1: Figure S4a; Additional file 2: Table S12; “Methods”).
For the 7116 SE events, each method found between 133 and 274 events to be significant, with 370 events predicted as significant by any one method, but only 22 events predicted by all four methods (Additional file 1: Figure S4a). Similarly, 352 A5/A3 events were predicted to be significant by at least one method, and only two predicted by all four methods (Additional file 1: Figure S4a). Events detected by more methods tended to have higher ΔPSI values (Additional file 1: Figure S4b) and covered a smaller range of gene expression values (Additional file 1: Figure S4c). Despite the low detection overlap, the significant events predicted by each method independently showed enrichment of TRA2B CLIP tags and of Tra2 binding motifs (Additional file 2: Table S13; Additional file 3: Supplementary methods); hence, each set independently had the expected properties related to the knockdown experiment. It is possible that each method describes a different subset of changes and generally misses others. To seek further support for this point, we selected for experimental validation 15 SE events and seven A3 events that had CLIP tags and Tra2 motifs nearby the regulated exon. The seven A3 events and six of the 15 SE events were predicted only by SUPPA2, whereas the remaining nine were not predicted by any of the four methods, but were significant according to SUPPA2 before multiple test correction (Additional file 2: Table S14). From these 15 SE events, five only showed one PCR band and could not be evaluated. However, for the rest, seven changed significantly according to the RT-PCR (two-tailed t-test p value < 0.05), with six of them changing in the same direction predicted by SUPPA2. Overall, nine events changed in the same direction as predicted (Fig. 3c; Additional file 2: Table S14). In particular, we validated a new event in EML4 (Fig. 3d), a gene involved in cancer through a fusion with ALK that is not present in MDA-MB-231 cells . In addition, we could measure six of the seven A3 events; all were measured to change in the same direction as predicted by SUPPA2 and four were significant (two-tailed t-test p value < 0.05; Additional file 2: Table S14). This analysis shows the value of using a suite of methods based on different algorithms, like SUPPA2, to reveal novel experimentally reproducible events that are missed by other methods.
SUPPA2 finds biologically relevant event clusters across multiple conditions
SUPPA2 is also able to analyze multiple conditions by computing the pairwise differential splicing between conditions, and can detect groups of events with similar splicing patterns across conditions using density-based clustering (“Methods”). To evaluate the ability of SUPPA2 to cluster events, we analyzed a 4-day time-course of differentiation of human induced pluripotent stem cells (iPSCs) into bipolar neurons , which had not been analyzed yet for alternative splicing. SUPPA2 identified 2780 regulated cassette events (p value < 0.05), out of which 207 (8.4%) were microexons (length < 28 nt), which represent an enrichment (Fisher’s exact test p value < 2.2e-16, odds ratio = 3.94) compared to a set of 20,452 non-regulated cassette events (p value > 0.1), with the majority of these microexons (69%) significantly more included in differentiated cells (ΔPSI > 0 and p value < 0.05 between the first and fourth day).
To further validate the findings with SUPPA2, we performed a motif enrichment analysis in regulated events compared to non-regulated events. Notably, compared to the non-regulated events, the 2780 regulated cassette events showed enrichment in binding motifs for the RNA binding protein (RBP) SFPQ (z-score > 4), which has been described before as a necessary factor for neuronal development . Additionally, the differentially spliced events in clusters were enriched in, among others, CELF, RBFOX, ESRP, MBNL, and SRRM4 motifs (Fig. 4d–f), in concordance with the described role of CELF, RBFOX, and SRRM4 genes in neuronal differentiation [23, 25, 26, 27]. Consistent with these findings, SRRM4 and members of the CELF and RBFOX families showed upregulation at the initial steps of iPSC differentiation into neurons (Additional file 1: Figure S5; Additional file 2: Table S17). On the other hand, CELF5 and ESRP1 were downregulated during differentiation. The MBNL3 gene showed initial upregulation at stage 1, followed by downregulation at later stages (Additional file 1: Figure S5; Additional file 2: Table S17). Notably, we found that only the cluster enriched in microexon splicing inclusion showed an enrichment of SRRM4 motifs upstream of the regulated exons, in agreement with the previous description of SRRM4 binding upstream of microexons to regulate their inclusion during neuronal differentiation , and further supports the specificity of SRRM4 to regulate microexons. Our results also suggest possible novel regulators of neuronal differentiation, such as the MBNL proteins in the regulation of events increasing exon inclusion and ESRP in events that decrease exon inclusion (Fig. 4d–f).
We also used SUPPA2 to analyze differential splicing across five stages of erythroblast differentiation . In this case we considered all event types for clustering. For the optimal value of maximum reachability distance (S = 0.1), we obtained two homogeneous and well-differentiated clusters (silhouette score = 0.91), one for events with low PSI that increased at the last differentiation stage with 149 events, and a second cluster with 86 events that showed the opposite behavior (Additional file 1: Figure S6). In agreement with previous results , we observed an enrichment of intron retention events in the cluster of events that increased inclusion at the late differentiation stage, as compared with the other cluster, which does not include any retained intron (Fisher’s exact test p value = 0.04958). We conclude that SUPPA2 provides a powerful approach to analyze splicing across multiple conditions, validated not only by intrinsic measures of clustering consistency, but also by recovering known biological results and new features.
Our extensive evaluations here indicate that SUPPA2 provides a broadly applicable solution to current challenges in the analysis of differential splicing from RNA sequencing data across multiple conditions, and has features that will make it attractive to many potential users. SUPPA2 is faster than other methods and maintains a high accuracy, especially at low sequencing depth and for short read length. Despite using less reads or shorter reads, SUPPA2 could detect the majority of the simulated events and maintained a high proportion of true positives and low proportion of false positives. SUPPA2 thus offers an unprecedented opportunity to study splicing in projects with limited budgets, or to reuse for splicing studies available sequencing datasets with lower depth than usually required by other methods. Additionally, the low computing and storage requirements of SUPPA2 makes it possible to perform fast differential splicing processing and clustering analysis on a laptop. Thus, coupled with fast methods for transcript quantification [30, 31, 32], SUPPA2 facilitates the study of alternative splicing across multiple conditions without the need for large computational resources. The simplicity and modular architecture of SUPPA2 also makes it a very convenient tool in multiple contexts, as PSI values from other methods and for other event types, like complex events, or data types, like transcripts, can be used in SUPPA2 for differential splicing analysis or for clustering across conditions.
According to our simulated benchmarking analysis, as well as others published before, it may seem that bioinformatics methods used to analyze RNA-seq data tend to coincide on a large number of events. However, using real experimental data we actually observed low agreement in targets between methods. These discrepancies in target selection can be explained by various factors, including the different ways in which a splicing change is represented by each method (e.g., an event, an exon, or a graph), how changes in splicing patterns are tested by each method, and how biological and experimental variability affects these tests. Intriguingly, the results from each method do make sense biologically, in that differentially spliced events were enriched in motifs and mapped protein–RNA interaction sites related to the depleted splicing factor. This makes it unlikely that any one method provides a clear advantage in terms of the results, and instead suggests that at least two or three methods should be used to identify all the possible significant splicing variants between different conditions. In particular, we chose for comparison three other methods with very different representations of the splicing and statistical approach. The results we obtained recommend use of two or more such tools to comprehensively monitor splicing complexity by picking out different sets of events that would not otherwise be discovered, rather than identifying largely overlapping groups of events. Supporting this point we could validate experimentally events not predicted by any other methods but predicted by SUPPA2. We further observed that although most methods had the power to identify small significant ΔPSI values, different methods tended to agree on events with large splicing changes. Importantly, a fraction of these significant events with small ΔPSI are indistinguishable from the variability observed between replicates and hence are not likely to be biologically relevant. SUPPA2 also performs a statistical test that can separate significant splicing changes from the biological variability, providing thus an advantage to identify biologically relevant changes across a wide range of expression values. By exploiting the biological variability, without having to go back to the read data, SUPPA2 provides a fast and accurate way to detect differential splicing without the need for arbitrary global ΔPSI thresholds.
Although SUPPA2 relies on genome annotation to define events, poorly annotated genomes can be improved and extended before analysis by SUPPA2. In fact, recent analyses have shown that improved annotations lead to significantly better PSI estimates from RNA-seq when benchmarked to high-resolution RT-PCR measurements [33, 34, 35]. Current technological trends predict an increase in the number of efforts to improve the transcriptome annotation in multiple species and conditions . In this direction, SUPPA2 could play a key role for the systematic and rapid genome-wide analysis of splicing following annotation and sample updates.
The speed, modularity, and accuracy of SUPPA2 enable cost-effective use of RNA sequencing for the robust and streamlined analysis of differential splicing across multiple biological conditions.
Here we implicitly assume that the background distribution is symmetric. SUPPA2 includes an option to correct for multiple testing using the Benjamini-Hochberg method across all events from the same gene, as they cannot be considered to be entirely independent of each other, for which the false discovery rate (FDR) cut-off can be given as input.
SUPPA2 currently implements two density-based clustering methods: DBSCAN  and OPTICS . Density-based clustering has the advantage that one does not need to specify the expected number of clusters, and the choice between the two methods depends mainly on the computational resources and the amount of data. Both methods use the vectors of mean PSI values per event and require as input the minimum number of events in a cluster (N), which could be interpreted as the minimum expected size of the regulatory modules. OPTICS also requires the maximum reachability distance (S), which represents the maximum distance in PSI space of an event to a cluster. On the other hand, DBSCAN requires as input the maximum distance to consider two events as cluster partners (D), which OPTICS calculates through an optimization procedure allowing any value below S. DBSCAN allows simple and fast data partitioning but has the drawback of being sensitive to the input parameters. On the other hand, OPTICS, which can be seen as a generalization of DBSCAN, explores the possible maximum values for D beyond which clustering quality drops. OPTICS can thus potentially produce better clustering results since it is not limited to a fixed radius of clustering, but it is penalized by a greater computational cost. Clustering is performed only with events that change significantly in at least one pair of adjacent conditions. Three different distance metrics can be currently used: Euclidean, Manhattan, and Cosine. Cluster qualities are reported using the silhouette score , which indicates how well the events are assigned to clusters, and the root mean square standard deviation (RMSSTD), which measures the homogeneity of each cluster. Additionally, the number and percentage of events in clusters are also reported. Motif enrichment analysis was performed as before  using MOSEA, available at https://github.com/comprna/MOSEA. Further details on the motif enrichment and analysis of differential expression are provided in Additional file 3: Supplementary material.
where TPM1 and TPM2 are the abundances for the two transcripts in the gene, given in TPM units. For the negative set, we took an equal number of events without exchanging their TPM values. These negative events had a gene expression distribution and a distribution of transcript relative abundance similar to the positive events, and an expected variability between conditions similar to the variability between biological replicates. We used RSEM  to simulate sequencing reads for the two conditions, three replicates each, at various depths (120, 60, 25, 10 and 5 M 100-nt paired-end reads per sample) and at various read lengths (100, 75, 50, and 25 nt, at a depth of 25 M paired-end reads) (Additional file 2: Tables S1–S3). Further details of the simulations are given in the Additional file 3:Supplementary material. Datasets and commands to reproduce these simulations are available at https://github.com/comprna/SUPPA_supplementary_data.
We analyzed RNA-seq data for the double knockdown of TRA2A and TRA2B in MDA-MB-231 cells and controls with three replicates per condition  (GSE59335). For benchmarking, we used 83 RT-PCR validated events for comparison (Additional file 2: Tables S4 and S5) and 44 RT-PCR negative events (Additional file 2: Tables S12 and S13). We also analyzed data from cerebellum and liver mouse tissues covering eight different time points from two full circadian cycles  (GSE54651) and performed a comparison with 50 events validated by RT-PCR  comparing samples CT28, CT40, and CT52 in cerebellum with the same circadian time points in liver (Additional file 2: Tables S8 and S9). We also analyzed RNA-seq data for stimulated and unstimulated Jurkat T cells and compared them with RT-PCR validated events (no tested replicates) [9, 41] (SRP059357; Additional file 2: Tables S10 and S11). From these 54 RT-PCR validated events, we only used the 30 events that had experimental value |ΔPSI| > 0.05. For the study of multiple conditions, we used RNA-seq samples from a 4-day time-course for the differentiation of human iPSCs into bipolar neurons  (GSE60548). Original data were for days 0, 1, 3, and 4 after initiation of differentiation. Additionally, we analyzed RNA-seq from five steps of differentiating human erythroblasts  (GSE53635), with three replicates per condition. RNA-seq reads from all experiments were used to quantify human and mouse transcripts from Ensembl (version 75, without pseudogenes) with Salmon . Reads were mapped to the human (hg19) or mouse (mm10) genomes using TopHat . All methods other than SUPPA2 were used with these mappings. Cassette events from SUPPA2 and rMATS were matched to the RT-PCR validated events in each dataset, considering only those cases where the middle exon matched exactly the validated exons and confirming the flanking exons with the RT-PCR primers when available. Ambiguous matches were discarded from the comparison. For MAJIQ we selected the inclusion junction compatible with the validated event that had the largest posterior probability for |ΔPSI| > 0.1. For DEXSeq we considered only exonic regions that matched exactly with the regulated exon of the experimentally validated cassette event. To select a set of cassette events common to all four methods, we selected the events measured by both SUPPA2 and rMATS such that the middle exon matched exactly a DEXSeq exonic region and did not appear in more than one event from SUPPA2 or rMATS. From this set, we selected those for which any of the two inclusion junctions was present in MAJIQ, and selected the junction with the largest posterior probability for |ΔPSI| > 0.1. Further details are provided in Additional file 3: Supplementary material.
Running time was measured using the Unix time command time. For SUPPA2 running time was measured independently of the transcript quantification step. Similarly, for all other methods the running time did not include the read-mapping step. Time was measured independently for PSI calculation and for differential splicing analysis. All methods were run on a Unix machine with 12 Gb of RAM and eight Intel Xeon 2-GHz CPU cores.
Details on the experimental validation are given in Additional file 3: Supplementary material.
Software and datasets
SUPPA2 is available at https://github.com/comprna/SUPPA.
Commands and datasets used in this work are available at https://github.com/comprna/SUPPA_supplementary_data.
Software for the motif enrichment analysis is available at https://github.com/comprna/MOSEA.
The authors thank C. Calixto, J. Brown, R. Zhang, M. Irimia, and N. Barbosa-Morais for useful discussions and J. Vaquero-Garcia and Y. Barash for comments on an earlier version of the manuscript.
This work was supported by the MINECO and FEDER with grants BIO2014–52566-R and BIO2017–85364-R, by AGAUR with grants SGR2014–1121 and SGR2017–1020, by BBSRC (BB/P006612/1), and by Breast Cancer Now (2014NovPR355). GH is a BBSRC-funded PhD student.
Availability of data and materials
Datasets used in this manuscript are accessible at the Gene Expression Omnibus (GEO) under accessions GSE59335 , GSE54651 , GSE60548 , and GSE53635 . SUPPA (version 2.2 ) is open source under the MIT license (OSI-compliant), is implemented in python3.4, and is freely available at https://github.com/comprna/SUPPA.
JCE, MS, JLT, and EE designed and implemented the method and algorithms, JCE, JLT, and EE devised the analyses, and JCE, JLT, and MS carried out the benchmarking analyses. GH and DJE produced the datasets related to the double knockdown of TRA2A and TRA2B and performed the validation experiments. BS developed the software for motif enrichment analysis. JLT, JCE, and EE wrote the manuscript with essential input from GH and DJE. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Ethics approval was not applicable for this study.
The authors declare that they have no competing interests.
- 1.Lee Y, Rio DC. Mechanisms and regulation of alternative pre-mRNA splicing. Annu Rev Biochem. 2015;84:291–323. https://www.ncbi.nlm.nih.gov/pubmed/25784052.
- 2.Alamancos GP, Agirre E, Eyras E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol Biol. 2014;1126:357–97. https://www.ncbi.nlm.nih.gov/pubmed/24549677.
- 3.Lahat A, Grellscheid SN. "Differential mRNA Alternative Splicing." In Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Cham: Springer; 2016. p. 105-119. https://link.springer.com/chapter/10.1007/978-3-319-31350-4_5.
- 7.Froussios K, Mourão K, Schurch NJ, Barton GJ. Identifying differential isoform abundance with RATs: a universal tool and a warning. bioRxiv. 2017. p.132761.Google Scholar
- 8.Hu Y, Huang Y, Du Y, Orellana CF, Singh D, Johnson AR, et al. DiffSplice: the genome-wide detection of differential splicing events with RNA-seq. Nucleic Acids Res. 2013;41(2):e39. https://www.ncbi.nlm.nih.gov/pubmed/23155066.
- 9.Vaquero-Garcia J, Barrera A, Gazzara MR, González-Vallinas J, Lahens NF, Hogenesch JB, et al. A new view of transcriptome complexity and regulation through the lens of local splicing variations. elife. 2016;5:e11752. http://www.ncbi.nlm.nih.gov/pubmed/26829591 CrossRefPubMedPubMedCentralGoogle Scholar
- 15.Venables JP, Brosseau J-P, Gadea G, Klinck R, Prinos P, Beaulieu J-F, et al. RBFOX2 is an important regulator of mesenchymal tissue-specific splicing in both normal and cancer tissues. Mol Cell Biol. 2013;33:396–405. https://www.ncbi.nlm.nih.gov/pubmed/23149937.
- 16.Alamancos GP, Pagés A, Trincado JL, Bellora N, Eyras E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA. 2015;21:1521–31. https://www.ncbi.nlm.nih.gov/pubmed/26179515.
- 20.Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31. https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf.Google Scholar
- 26.Raj B, Irimia M, Braunschweig U, Sterne-Weiler T, O’Hanlon D, Lin ZY, et al. A global regulatory mechanism for activating an exon network required for neurogenesis. Mol Cell. 2014;56:90–103. https://www.ncbi.nlm.nih.gov/pubmed/25219497.
- 31.Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017; http://www.ncbi.nlm.nih.gov/pubmed/28263959
- 33.Brown JWS, Calixto CPG, Zhang R. High-quality reference transcript datasets hold the key to transcript-specific RNA-sequencing analysis in plants. New Phytol. 2016; http://www.ncbi.nlm.nih.gov/pubmed/27659901
- 34.Zhang R, Calixto CPG, Tzioutziou NA, James AB, Simpson CG, Guo W, et al. AtRTD—a comprehensive reference transcript dataset resource for accurate quantification of transcript-specific expression in Arabidopsis thaliana. New Phytol. 2015;208:96–101. https://www.ncbi.nlm.nih.gov/pubmed/26111100.
- 35.Zhang R, Calixto CPG, Marquez Y, Venhuizen P, Tzioutziou NA, Guo W, et al. A high quality Arabidopsis transcriptome for accurate transcript-level analysis of alternative splicing. Nucleic Acids Res. 2017;45:5061–73. http://www.ncbi.nlm.nih.gov/pubmed/28402429 CrossRefPubMedPubMedCentralGoogle Scholar
- 36.Garalde DR, Snell EA, Jachimowicz D, Sipos B, Lloyd JH, Bruce M, et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat Methods. 2018;15(3):201-6. https://www.ncbi.nlm.nih.gov/pubmed/29334379.
- 38.Sebestyén E, Singh B, Miñana B, Pagès A, Mateo F, Pujana MA, et al. Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks. Genome Res. 2016;26:732–44. http://www.ncbi.nlm.nih.gov/pubmed/27197215 CrossRefPubMedPubMedCentralGoogle Scholar
- 41.Cole BS, Tapescu I, Allon SJ, Mallory MJ, Qiu J, Lake RJ, et al. Global analysis of physical and functional RNA targets of hnRNP L reveals distinct sequence and epigenetic features of repressed and enhanced exons. RNA. 2015;21:2053–66. http://www.ncbi.nlm.nih.gov/pubmed/26437669 CrossRefPubMedPubMedCentralGoogle Scholar
- 43.Best A, James K, Dalgliesh C, Hong E, Kheirolahi-Kouhestani M, Curk T, Xu Y, Danilenko M, Hussain R, Keavney B, Wipat A, Klinck R, Cowell I, Lee KC, Austin C, Venables JP, Chabot B SKM, Tyson-Capper A, et al. Investigation into human Tra2 protein-dependent splicing in MDA-MB-231 cells using iCLIP and RNA-seq. Gene Expression Ommibus (GEO). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59335.
- 44.Zhang R, Lahens NF, Ballance HI, Hughes ME HJ. A circadian gene expression atlas in mammals assayed by RNA-seq. Gene Expression Ommibus (GEO). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54651.
- 45.Buskamp V LN. Rapid neurogenesis through transcriptional activation in human stem cell. Gene Expression Ommibus (GEO). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60548.
- 46.Pimentel H, Parra M, Gee S, Ghanem D, An X, Li J, Mohandas N, Pachter L CJ. RNA-seq analysis of differentiating human erythroblasts. Gene Expression Ommibus (GEO). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53635.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.