
A Novel Combinatorial Method for Estimating Transcript Expression with RNA-Seq: Bounding the Number of Paths

  • Conference paper
Algorithms in Bioinformatics (WABI 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8126)


Abstract

RNA-Seq technology offers new high-throughput ways to identify and quantify transcripts from short reads, and has recently attracted great interest. The problem is usually modeled by a weighted splicing graph whose nodes stand for exons and whose edges stand for split alignments connecting the exons. The task consists of finding a number of paths, together with their expression levels, that optimally explain the coverages of the graph under various fitness functions, such as the least sum of squares. In (Tomescu et al. RECOMB-seq 2013) we showed that under general fitness functions, if we allow a polynomially bounded number of paths in an optimal solution, this problem can be solved in polynomial time by a reduction to a min-cost flow program. In this paper we further refine the problem by asking for a bounded number k of paths that optimally explain the splicing graph. This refined problem is NP-hard in the strong sense, but we give a fast combinatorial algorithm for it based on dynamic programming. To obtain a practical tool, we implement three optimizations and heuristics; the resulting tool performs better than the state-of-the-art tools Cufflinks, IsoLasso and SLIDE on real data, and similarly or better on simulated data. Our tool, called Traph, is available at http://www.cs.helsinki.fi/gsa/traph/ .
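To make the fitness function concrete, here is a toy sketch of the least-sum-of-squares objective mentioned above. It assumes that every exon (node) and every split alignment (edge) carries an observed coverage value, and that each selected path adds its expression level to every node and edge it traverses. The graph encoding and the function names `enumerate_paths` and `squared_error` are hypothetical; this is not Traph's dynamic-programming algorithm, only a way of scoring a candidate set of weighted paths.

```python
from collections import defaultdict


def enumerate_paths(edges, source, sink):
    """List every source-to-sink path of a DAG (exponential in general; toy graphs only)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    paths, stack = [], [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for nxt in succ[node]:
            stack.append((nxt, path + [nxt]))
    return paths


def squared_error(node_cov, edge_cov, weighted_paths):
    """Sum of squared differences between observed coverages and the coverage
    predicted by adding up the expression levels of the paths through each
    node and each edge (assumed additive model)."""
    pred_node = defaultdict(float)
    pred_edge = defaultdict(float)
    for path, level in weighted_paths:
        for v in path:
            pred_node[v] += level
        for u, v in zip(path, path[1:]):
            pred_edge[(u, v)] += level
    err = sum((c - pred_node[v]) ** 2 for v, c in node_cov.items())
    err += sum((c - pred_edge[e]) ** 2 for e, c in edge_cov.items())
    return err


if __name__ == "__main__":
    # Hypothetical exon-skipping graph: exon 2 may be skipped via edge (1, 3).
    edges = [(1, 2), (2, 3), (1, 3)]
    node_cov = {1: 10, 2: 6, 3: 10}
    edge_cov = {(1, 2): 6, (2, 3): 6, (1, 3): 4}
    print(enumerate_paths(edges, 1, 3))
    # k = 2 paths explain the toy coverages exactly.
    print(squared_error(node_cov, edge_cov, [([1, 2, 3], 6.0), ([1, 3], 4.0)]))
    # k = 1 path leaves residual error on exon 2 and on all three edges.
    print(squared_error(node_cov, edge_cov, [([1, 2, 3], 10.0)]))
```

On this toy input the two-path solution scores 0.0 and the best-looking single path scores 64.0, which illustrates why the choice of the bound k matters. Enumerating all paths is only feasible for tiny graphs, since their number can grow exponentially with the graph size.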


References

  1. Alamancos, G.P., Agirre, E., Eyras, E.: Methods to study splicing from high-throughput RNA Sequencing data. CoRR abs/1304.5952 (2013)

  2. Bernard, E., et al.: Efficient RNA Isoform Identification and Quantification from RNA-Seq Data with Network Flows. SU2C-AACR-DT0409; SES-0835531; CCF-0939370

  3. Brett, D., et al.: Alternative splicing and genome complexity. Nature Genetics 30(1), 29–30 (2001)

  4. Feng, J., Li, W., Jiang, T.: Inference of isoforms from short sequence reads. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 138–157. Springer, Heidelberg (2010)

  5. Guttman, M., et al.: Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28(5), 503–510 (2010)

  6. Heber, S., et al.: Splicing graphs and EST assembly problem. Bioinformatics 18(suppl. 1), S181–S188 (2002)

  7. Heijden, V.D., et al.: Estimating the size of a criminal population from police records using the truncated Poisson regression model. Statistica Neerlandica 57(3), 289–304 (2003)

  8. Hiller, D., et al.: Simultaneous Isoform Discovery and Quantification from RNA-Seq, pp. 1–19 (2012)

  9. Li, J.J., et al.: Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc. of the National Academy of Sciences 108(50), 19867–19872 (2011)

  10. Li, T., Jiang, R., Zhang, X.: Isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard. CoRR abs/1305.0916 (2013)

  11. Li, W., et al.: IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly. J. Comput. Biol. 18(11), 1693–1707 (2011)

  12. Lin, Y.-Y., Dao, P., Hach, F., Bakhshi, M., Mo, F., Lapuk, A., Collins, C., Sahinalp, S.C.: CLIIQ: Accurate Comparative Detection and Quantification of Expressed Isoforms in a Population. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 178–189. Springer, Heidelberg (2012)

  13. Mangul, S., et al.: An integer programming approach to novel transcript reconstruction from paired-end RNA-Seq reads. In: Ranka, S., et al. (eds.) BCB, pp. 369–376. ACM (2012)

  14. Maniatis, T., Tasic, B.: Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418(6894), 236–243 (2002)

  15. McIntyre, L., et al.: RNA-seq: technical variability and sampling. BMC Genomics 12(1), 293 (2011)

  16. Mezlini, A.M., et al.: iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data. Genome Research 23(3), 519–529 (2012)

  17. Mortazavi, A., et al.: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621–628 (2008)

  18. Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)

  19. Ozsolak, F., Milos, P.M.: RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12(2), 87–98 (2011)

  20. Pepke, S., Wold, B., Mortazavi, A.: Computation for ChIP-seq and RNA-seq studies. Nature Methods 6(11), s22–s32 (2009)

  21. Tomescu, A.I., Kuosmanen, A., Rizzi, R., Mäkinen, V.: A Novel Min-Cost Flow Method for Estimating Transcript Expression with RNA-Seq. BMC Bioinformatics 14(suppl. 5), S15 (2013). Presented at RECOMB-Seq, Beijing, China (2013)

  22. Trapnell, C., et al.: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28, 511–515 (2010)

  23. Trapnell, C., Pachter, L., Salzberg, S.L.: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9), 1105–1111 (2009)

  24. Vatinlen, B., et al.: Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research 185(3), 1390–1401 (2008)

  25. Xia, Z., et al.: NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq. BMC Bioinformatics 12(1), 162 (2011)

  26. Xing, Y., et al.: The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res. 14(3), 426–441 (2004)


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tomescu, A.I., Kuosmanen, A., Rizzi, R., Mäkinen, V. (2013). A Novel Combinatorial Method for Estimating Transcript Expression with RNA-Seq: Bounding the Number of Paths. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol. 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_8


  • DOI: https://doi.org/10.1007/978-3-642-40453-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40452-8

  • Online ISBN: 978-3-642-40453-5

  • eBook Packages: Computer Science, Computer Science (R0)
