Estimation of Alternative Splicing Isoform Frequencies from RNA-Seq Data

  • Marius Nicolae
  • Serghei Mangul
  • Ion Măndoiu
  • Alex Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)


In this paper we present a novel expectation-maximization algorithm for inference of alternative splicing isoform frequencies from high-throughput transcriptome sequencing (RNA-Seq) data. Our algorithm exploits disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information if available. Empirical experiments on synthetic datasets show that the algorithm significantly outperforms existing methods of isoform and gene expression level estimation from RNA-Seq data. The Java implementation of IsoEM is available at
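To make the expectation-maximization idea in the abstract concrete, the following is a minimal Python sketch of a generic EM loop for isoform frequencies. It is not the IsoEM Java implementation. The function name, the input layout, and the use of raw isoform length as the normalizer are assumptions made for illustration. Each read carries a weight per compatible isoform; in a full method that weight would encode the insert-size, base-quality, strand, and pairing evidence the abstract mentions, whereas here the weights are simply given as input.

```python
def estimate_isoform_frequencies(read_weights, isoform_lengths, iterations=200):
    """Generic EM sketch (hypothetical helper, not the IsoEM code).

    read_weights: list with one dict per read, mapping isoform index to a
        non-negative compatibility weight (assumed to be precomputed).
    isoform_lengths: list of isoform lengths, one per isoform.
    Returns a list of estimated isoform frequencies that sums to 1.
    """
    k = len(isoform_lengths)
    freqs = [1.0 / k] * k  # start from uniform frequencies

    for _ in range(iterations):
        # E-step: split each read among its compatible isoforms in
        # proportion to (current frequency) * (compatibility weight).
        expected_reads = [0.0] * k
        for weights in read_weights:
            total = sum(freqs[j] * w for j, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every expressed isoform
            for j, w in weights.items():
                expected_reads[j] += freqs[j] * w / total

        # M-step: convert expected read counts to per-base densities and
        # renormalize. Dividing by length is a simplifying assumption.
        density = [expected_reads[j] / isoform_lengths[j] for j in range(k)]
        norm = sum(density)
        if norm == 0.0:
            break
        freqs = [d / norm for d in density]

    return freqs


if __name__ == "__main__":
    # Toy input: 3 reads unique to isoform 0, 1 unique to isoform 1,
    # and 4 reads equally compatible with both.
    reads = [{0: 1.0}] * 3 + [{1: 1.0}] + [{0: 1.0, 1: 1.0}] * 4
    print(estimate_isoform_frequencies(reads, [1000, 1000]))
```

In the toy input the ambiguous reads are reassigned on every iteration according to the current estimates, so the evidence from the uniquely mapped reads gradually decides how the shared reads are split.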


Keywords (machine-generated, not supplied by the authors): Read Length; Error Fraction; Line Sweep; Base Quality Score; Sequencing Library Preparation





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marius Nicolae (1)
  • Serghei Mangul (2)
  • Ion Măndoiu (1)
  • Alex Zelikovsky (2)

  1. Computer Science & Engineering Department, University of Connecticut, Storrs, USA
  2. Computer Science Department, Georgia State University, University Plaza, Georgia
