IsoTree: De Novo Transcriptome Assembly from RNA-Seq Reads

(Extended Abstract)

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10330)

Abstract

High-throughput sequencing of mRNA has made deep and efficient probing of transcriptomes more affordable. However, the vast number of short RNA-seq reads makes de novo transcriptome assembly an algorithmic challenge. In this work, we present IsoTree, a novel framework for transcript reconstruction in the absence of a reference genome. Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (sets of overlapping substrings generated from the reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative mixed integer linear programming scheme to build a prefix tree, called an isoform tree. Each path from the root of the isoform tree to a leaf represents a plausible transcript candidate, and the candidates are pruned using paired-end read information. Experiments showed that IsoTree achieves better recall on both paired-end and single-end reads, and better precision on paired-end reads, than other leading transcript assembly programs, including Cufflinks, StringTie and BinPacker.
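The ideas in the abstract can be illustrated with a toy Python sketch. It is not IsoTree's implementation: the abstract does not specify the overlap rule, the mixed integer linear program, or the pruning criterion, so the sketch makes loud assumptions. It connects reads by exact suffix/prefix overlap, replaces the iterative MILP with an exhaustive depth-first enumeration of root-to-leaf paths, and keeps a candidate only if both mates of some read pair occur in it as plain substrings (reverse complements and insert sizes are ignored). All function names and the toy reads are hypothetical.

```python
from collections import defaultdict

def kmers(read, k):
    """All overlapping length-k substrings of a read (what k-mer methods connect)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def overlap(a, b, min_len):
    """Longest suffix of a that is a proper prefix of b; 0 if shorter than min_len.
    Assumes min_len >= 1."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def read_graph(reads, min_len):
    """Assumed rule: add an edge a -> b when a's suffix overlaps b's prefix."""
    edges = defaultdict(list)
    for a in reads:
        for b in reads:
            if a != b and overlap(a, b, min_len):
                edges[a].append(b)
    return edges

def candidate_paths(edges, reads):
    """Enumerate root-to-leaf paths by DFS (a stand-in for the MILP-built tree).
    The recursion tree is a prefix tree: paths sharing a prefix share ancestors."""
    targets = {b for successors in edges.values() for b in successors}
    roots = [r for r in reads if r not in targets]
    paths = []

    def walk(path):
        # Skip reads already on the path so a cycle cannot loop forever.
        successors = [b for b in edges.get(path[-1], []) if b not in path]
        if not successors:
            paths.append(path)
        for b in successors:
            walk(path + [b])

    for r in roots:
        walk([r])
    return paths

def spell(path, min_len):
    """Merge the reads on a path into one sequence by collapsing overlaps."""
    seq = path[0]
    for a, b in zip(path, path[1:]):
        seq += b[overlap(a, b, min_len):]
    return seq

def prune(candidates, pairs):
    """Assumed criterion: keep a candidate if both mates of some pair occur in it."""
    return [c for c in candidates if any(m1 in c and m2 in c for m1, m2 in pairs)]

# Toy data: one shared start read and two alternative continuations.
reads = ["ATGGCA", "GCATTC", "GCACGT"]
graph = read_graph(reads, min_len=3)
candidates = [spell(p, 3) for p in candidate_paths(graph, reads)]
supported = prune(candidates, pairs=[("ATGG", "ATTC")])
print(candidates)
print(supported)
```

On this toy input the graph branches once, giving two candidates, and the single read pair supports only one of them. Exhaustive enumeration grows exponentially with the number of branch points, which is one reason an optimization-based construction such as the one the abstract describes is attractive on real data.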

This work was supported by the National Natural Science Foundation of China under Grants No. 61672325 and No. 61472222.

Corresponding author

Correspondence to Haodi Feng.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, J., Feng, H., Zhu, D., Zhang, C., Xu, Y. (2017). IsoTree: De Novo Transcriptome Assembly from RNA-Seq Reads. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_7

  • DOI: https://doi.org/10.1007/978-3-319-59575-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59574-0

  • Online ISBN: 978-3-319-59575-7

  • eBook Packages: Computer Science, Computer Science (R0)
