Exact Transcriptome Reconstruction from Short Sequence Reads

Lacroix, Vincent; Sammeth, Michael; Guigo, Roderic; Bergeron, Anne

doi:10.1007/978-3-540-87361-7_5

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5251)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

In this paper we address the problem of characterizing the RNA complement of a given cell type, that is, the set of RNA species and their relative copy numbers, from a large set of short sequence reads that have been randomly sampled from the cell’s RNA sequences in a sequencing experiment. We refer to this problem as the transcriptome reconstruction problem, and we specifically investigate, both theoretically and practically, the conditions under which it can be solved. We demonstrate that, even under the assumption of exact information, neither single reads nor paired-end reads theoretically guarantee that the reconstruction problem has a unique solution. However, by investigating the behavior of the best annotated human gene set, we also show that, in practice, paired-end reads – but not single reads – may be sufficient to resolve the species and abundances of the vast majority of transcript variants. We finally show that, when we assume that the RNA species existing in the cell are known, single reads can effectively be used to infer transcript variant abundances.
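
The non-uniqueness claim for single reads can be illustrated with a toy example. The sketch below is not the construction used in the paper; it is a minimal, idealized model written for this summary. It assumes a hypothetical gene with five exons in which exons 2 and 4 are independent cassette exons, that a single read covers at most one exon or one exon-exon junction, and that a read pair can link any two exons of the same transcript. The names `single_read_signature` and `paired_end_signature` are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transcripts, written as tuples of exon numbers.
BOTH = (1, 2, 3, 4, 5)         # includes cassette exons 2 and 4
NEITHER = (1, 3, 5)            # skips both cassette exons
FIRST_ONLY = (1, 2, 3, 5)      # includes exon 2 only
SECOND_ONLY = (1, 3, 4, 5)     # includes exon 4 only


def single_read_signature(mixture):
    """Expected exon and junction coverage, weighted by abundance.

    Assumption: a single read sees at most one exon or one junction.
    """
    signature = Counter()
    for exons, abundance in mixture.items():
        for exon in exons:
            signature[("exon", exon)] += abundance
        for left, right in zip(exons, exons[1:]):
            signature[("junction", left, right)] += abundance
    return signature


def paired_end_signature(mixture):
    """Expected coverage of exon pairs linked by a read pair.

    Assumption (idealized): a pair can link any two exons of a transcript.
    """
    signature = Counter()
    for exons, abundance in mixture.items():
        for pair in combinations(exons, 2):
            signature[pair] += abundance
    return signature


# Two different transcriptomes with equal abundances.
mixture_a = {BOTH: 1, NEITHER: 1}
mixture_b = {FIRST_ONLY: 1, SECOND_ONLY: 1}

same_single = single_read_signature(mixture_a) == single_read_signature(mixture_b)
same_paired = paired_end_signature(mixture_a) == paired_end_signature(mixture_b)

print("indistinguishable with single reads:", same_single)
print("indistinguishable with paired-end reads:", same_paired)
```

In this toy model the two mixtures produce identical single-read evidence, while a read pair linking exons 2 and 4 occurs only in the first mixture. This is consistent with the abstract's practical observation about paired-end reads, but it does not contradict the theoretical result that paired-end reads cannot guarantee uniqueness in general.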

Author information

Authors and Affiliations

Genome Bioinformatics Research Group - CRG, Barcelona, Spain
Vincent Lacroix, Michael Sammeth & Roderic Guigo
Comparative Genomics Laboratory, Université du Québec à Montréal, Canada
Anne Bergeron

Editor information

Keith A. Crandall, Jens Lagergren

About this paper

Cite this paper

Lacroix, V., Sammeth, M., Guigo, R., Bergeron, A. (2008). Exact Transcriptome Reconstruction from Short Sequence Reads. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_5

DOI: https://doi.org/10.1007/978-3-540-87361-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87360-0
Online ISBN: 978-3-540-87361-7
eBook Packages: Computer Science, Computer Science (R0)
