Abstract
RNA-Seq technology offers new high-throughput ways for transcript identification and quantification based on short reads, and has recently attracted great interest. The problem is usually modeled by a weighted splicing graph whose nodes stand for exons and whose edges stand for split alignments to the exons. The task consists of finding a number of paths, together with their expression levels, which optimally explain the coverages of the graph under various fitness functions, such as least sum of squares. In (Tomescu et al. RECOMB-seq 2013) we showed that under general fitness functions, if we allow a polynomially bounded number of paths in an optimal solution, this problem can be solved in polynomial time by a reduction to a min-cost flow program. In this paper we further refine this problem by asking for a bounded number k of paths that optimally explain the splicing graph. This problem becomes NP-hard in the strong sense, but we give a fast combinatorial algorithm for it based on dynamic programming. In order to obtain a practical tool, we implement three optimizations and heuristics, which achieve better performance on real data, and similar or better performance on simulated data, than the state-of-the-art tools Cufflinks, IsoLasso and SLIDE. Our tool, called Traph, is available at http://www.cs.helsinki.fi/gsa/traph/ .
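To make the least-sum-of-squares fitness function concrete, the following is a minimal sketch of how a candidate set of paths with expression levels could be scored against a small splicing graph. It is an illustration only, not the Traph implementation: the function name `least_squares_fitness`, the dictionary-based graph encoding, the toy coverages, and the choice to score both node and edge coverages are all assumptions made for this example.

```python
from collections import defaultdict


def least_squares_fitness(node_cov, edge_cov, paths, levels):
    """Sum of squared differences between the observed coverages and the
    coverage induced by the given paths and their expression levels.

    Hypothetical encoding: node_cov maps node -> coverage, edge_cov maps
    (u, v) -> coverage, each path is a list of nodes, and levels[i] is the
    expression level of paths[i]. Paths are assumed to use only graph edges.
    """
    node_expl = defaultdict(float)
    edge_expl = defaultdict(float)
    for path, level in zip(paths, levels):
        for v in path:
            node_expl[v] += level
        for u, v in zip(path, path[1:]):
            edge_expl[(u, v)] += level
    cost = sum((node_cov[v] - node_expl[v]) ** 2 for v in node_cov)
    cost += sum((edge_cov[e] - edge_expl[e]) ** 2 for e in edge_cov)
    return cost


# Toy splicing graph: exon "a" splices to "b" or "c", both of which join "d".
node_cov = {"a": 10, "b": 6, "c": 4, "d": 10}
edge_cov = {("a", "b"): 6, ("a", "c"): 4, ("b", "d"): 6, ("c", "d"): 4}

# Two paths explain the coverages exactly; a single path (k = 1) cannot.
two_paths = least_squares_fitness(
    node_cov, edge_cov, [["a", "b", "d"], ["a", "c", "d"]], [6, 4]
)
one_path = least_squares_fitness(node_cov, edge_cov, [["a", "b", "d"]], [6])
print(two_paths, one_path)
```

On this toy graph, restricting the solution to a single path leaves part of the coverage unexplained, which is the trade-off that bounding the number k of paths introduces.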
References
Alamancos, G.P., Agirre, E., Eyras, E.: Methods to study splicing from high-throughput RNA Sequencing data. CoRR abs/1304.5952 (2013)
Bernard, E., et al.: Efficient RNA Isoform Identification and Quantification from RNA-Seq Data with Network Flows. SU2C-AACR-DT0409; SES-0835531; CCF-0939370
Brett, D., et al.: Alternative splicing and genome complexity. Nature Genetics 30(1), 29–30 (2001)
Feng, J., Li, W., Jiang, T.: Inference of isoforms from short sequence reads. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 138–157. Springer, Heidelberg (2010)
Guttman, M., et al.: Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28(5), 503–510 (2010)
Heber, S., et al.: Splicing graphs and EST assembly problem. Bioinformatics 18(suppl. 1), S181–S188 (2002)
Heijden, V.D., et al.: Estimating the size of a criminal population from police records using the truncated poisson regression model. Statistica Neerlandica 57(3), 289–304 (2003)
Hiller, D., et al.: Simultaneous Isoform Discovery and Quantification from RNA-Seq, pp. 1–19 (2012)
Li, J.J., et al.: Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc. of the National Academy of Sciences 108(50), 19867–19872 (2011)
Li, T., Jiang, R., Zhang, X.: Isoform reconstruction using short RNA-Seq reads by maximum likelihood is NP-hard. CoRR abs/1305.0916 (2013)
Li, W., et al.: IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly. J. Comput. Biol. 18(11), 1693–1707 (2011)
Lin, Y.-Y., Dao, P., Hach, F., Bakhshi, M., Mo, F., Lapuk, A., Collins, C., Sahinalp, S.C.: CLIIQ: Accurate Comparative Detection and Quantification of Expressed Isoforms in a Population. In: Raphael, B., Tang, J. (eds.) WABI 2012. LNCS, vol. 7534, pp. 178–189. Springer, Heidelberg (2012)
Mangul, S., et al.: An integer programming approach to novel transcript reconstruction from paired-end RNA-Seq reads. In: Ranka, S., et al. (eds.) BCB, pp. 369–376. ACM (2012)
Maniatis, T., Tasic, B.: Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418(6894), 236–243 (2002)
McIntyre, L., et al.: RNA-seq: technical variability and sampling. BMC Genomics 12(1), 293 (2011)
Mezlini, A.M., et al.: iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data. Genome Research 23(3), 519–529 (2012)
Mortazavi, A., et al.: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621–628 (2008)
Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)
Ozsolak, F., Milos, P.M.: RNA sequencing: advances, challenges and opportunities. Nature Reviews. Genetics 12(2), 87–98 (2011)
Pepke, S., Wold, B., Mortazavi, A.: Computation for ChIP-seq and RNA-seq studies. Nature Methods 6(11), s22–s32 (2009)
Tomescu, A.I., Kuosmanen, A., Rizzi, R., Mäkinen, V.: A Novel Min-Cost Flow Method for Estimating Transcript Expression with RNA-Seq. BMC Bioinformatics 14(suppl. 5), S15 (2013), Presented at RECOMB-Seq, Beijing, China (2013)
Trapnell, C., et al.: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28, 511–515 (2010)
Trapnell, C., Pachter, L., Salzberg, S.L.: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9), 1105–1111 (2009)
Vatinlen, B., et al.: Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths. European Journal of Operational Research 185(3), 1390–1401 (2008)
Xia, Z., et al.: NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq. BMC Bioinformatics 12(1), 162 (2011)
Xing, Y., et al.: The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res. 14(3), 426–441 (2004)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tomescu, A.I., Kuosmanen, A., Rizzi, R., Mäkinen, V. (2013). A Novel Combinatorial Method for Estimating Transcript Expression with RNA-Seq: Bounding the Number of Paths. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_8
DOI: https://doi.org/10.1007/978-3-642-40453-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science, Computer Science (R0)