CLIIQ: Accurate Comparative Detection and Quantification of Expressed Isoforms in a Population

  • Yen-Yi Lin
  • Phuong Dao
  • Faraz Hach
  • Marzieh Bakhshi
  • Fan Mo
  • Anna Lapuk
  • Colin Collins
  • S. Cenk Sahinalp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7534)

Abstract

The recently developed RNA-Seq technology provides a high-throughput and reasonably accurate way to analyze the transcriptomic landscape of a tissue. Unfortunately, from a computational perspective, identification and quantification of a gene’s isoforms from RNA-Seq data remains a non-trivial problem. We propose CLIIQ, a novel computational method for identification and quantification of expressed isoforms from multiple samples in a population. Motivated by ideas from the compressed sensing literature, CLIIQ is based on an integer linear programming formulation for identifying and quantifying “the most parsimonious” set of isoforms. We show through simulations that, on a single sample, CLIIQ provides better results in isoform identification and quantification than popular alternative tools. More importantly, CLIIQ has an option to jointly analyze multiple samples, and in this mode it significantly outperforms other tools in both isoform identification and quantification.
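The notion of a “most parsimonious” set of isoforms can be illustrated with a toy sketch. This is not CLIIQ's formulation, which the abstract describes only as an integer linear program; it is a brute-force search under simplifying assumptions: each candidate isoform is a 0/1 exon-inclusion vector, abundances are small positive integers, and observed per-exon coverage is the sum of the abundances of the isoforms containing that exon. The function name, parameters, and example data are all hypothetical.

```python
from itertools import combinations, product


def most_parsimonious(isoforms, coverage, max_abundance=10, tolerance=0):
    """Toy search (not CLIIQ's ILP): smallest isoform set explaining coverage.

    isoforms: dict mapping a name to a 0/1 exon-inclusion list (assumed input).
    coverage: observed per-exon coverage, one integer per exon.
    Returns {name: abundance} for the smallest explaining set, or None.
    """
    names = sorted(isoforms)
    for size in range(1, len(names) + 1):  # try smaller sets first
        best = None
        for subset in combinations(names, size):
            for levels in product(range(1, max_abundance + 1), repeat=size):
                # Predicted coverage of each exon under these abundances.
                predicted = [
                    sum(lvl * isoforms[n][e] for n, lvl in zip(subset, levels))
                    for e in range(len(coverage))
                ]
                error = sum(abs(p - c) for p, c in zip(predicted, coverage))
                if error <= tolerance and (best is None or error < best[0]):
                    best = (error, subset, levels)
        if best is not None:
            return dict(zip(best[1], best[2]))
    return None


# Hypothetical gene with three exons and four candidate isoforms.
candidates = {
    "full": [1, 1, 1],
    "skip2": [1, 0, 1],
    "only1": [1, 0, 0],
    "only3": [0, 0, 1],
}
observed = [5, 3, 5]
print(most_parsimonious(candidates, observed))  # {'full': 3, 'skip2': 2}
```

Here the coverage could also be explained by three isoforms (`full`, `only1`, `only3`), but the search returns the two-isoform explanation because it examines smaller sets first. An integer linear program expresses the same preference through its objective rather than by enumeration, which is what makes realistic gene models tractable.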

Keywords

Isoform Identification; Isoform Quantification; RNA-Seq; Transcriptomics; Integer Linear Programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yen-Yi Lin (1)
  • Phuong Dao (1)
  • Faraz Hach (1)
  • Marzieh Bakhshi (1)
  • Fan Mo (2)
  • Anna Lapuk (2)
  • Colin Collins (2)
  • S. Cenk Sahinalp (1)

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
  2. Vancouver Prostate Centre & Department of Urologic Sciences, University of British Columbia, Vancouver, Canada
