A Robust Method for Transcript Quantification with RNA-seq Data

Huang, Yan; Hu, Yin; Jones, Corbin D.; MacLeod, James N.; Chiang, Derek Y.; Liu, Yufeng; Prins, Jan F.; Liu, Jinze

doi:10.1007/978-3-642-29627-7_12

Yan Huang²⁰,
Yin Hu²⁰,
Corbin D. Jones²¹,
James N. MacLeod²²,
Derek Y. Chiang²³,
Yufeng Liu²⁴,
Jan F. Prins²⁰ &
…
Jinze Liu²⁰

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 7262))

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

1476 Accesses
1 Citations

Abstract

The advent of high throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g. healthy vs. diseased cells), but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e. lacks a unique transcript solution even if the read maps uniquely to the reference genome. In this paper, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO our approach can infer an accurate set of dominantly expressed transcripts while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy than existing methods. The application of our method on real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Cufflinks, http://cufflinks.cbcb.umd.edu
Ensembl Genome Browser, http://useast.ensembl.org/index.html
NCBI Reference Sequence (RefSeq), http://www.ncbi.nlm.nih.gov/RefSeq
Roberts, A., Trapnell, C., Donaghey, J., Rinn, J., Pachter, L.: Improving rna-seq expression estimates by correcting for fragment bias. Genome Biology 12(3), R22 (2011)
Article Google Scholar
Bejerano, G.: Algorithms for variable length markov chain modeling. Bioinformatics 20, 788–789 (2004)
Article Google Scholar
Bohnert, R., Gunnar, R.: rquant.web: a tool for rna-seq-based transcript quantitation. Nucleic Acids Research 38(suppl. 2), W348–W351 (2010)
Article Google Scholar
Brosseau, J.-P., Lucier, J.-F., Lapointe, E., Durand, M., Gendron, D., Gervais-Bird, J., Tremblay, K., Perreault, J.-P., Elela, S.A.: High-throughput quantification of splicing isoforms. RNA Society 16, 442–449 (2010)
Article Google Scholar
Feng, J., Li, W., Jiang, T.: Inference of Isoforms from Short Sequence Reads. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 138–157. Springer, Heidelberg (2010)
Chapter Google Scholar
Fox-Walsh, K.L., Dou, Y., Lam, B.J., Hung, S.-P., Baldi, P.F., Herte, K.J.: The architecture of pre-mrnas affects mechanisms of splice-site pairing. Proc. Natl. Acad. Sci. 102(45), 16176–16181 (2005)
Article Google Scholar
Guttman, M., Garber, M., Levin, J.Z., Donaghey, J., Robinson, J., Adiconis, X., Fan, L., Koziol, M.J., Gnirke, A., Nusbaum, C., Rinn, J.L., Lander, E.S., Regev, A.: Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincrnas. Nature Biotechnology 28, 503–510 (2010)
Article Google Scholar
Horn, R.A., Johnson, C.R.: Matrix analysis. Cambridge University Press (1990)
Google Scholar
Hu, Y., Wang, K., He, X., Chiang, D.Y., Prins, J.F., Liu, J.: A probabilistic framework for aligning paired-end rna-seq data. Bioinformatics 26, 1950–1957 (2010)
Article Google Scholar
Jiang, H., Wong, W.H.: Statistical inferences for isoform expression in rna-seq. Bioinformatics 25, 1026–1032 (2009)
Article Google Scholar
Kozarewa, I., Ning, Z., Quail, M.A., Sanders, M.J., Berriman, M., Turner, D.J.: Amplification-free illumina sequencing-library preparation facilitates improved mapping and assembly of (g+c)-biased genomes. Nuc. 6, 291–295 (2009)
Google Scholar
Shi, L., Reid, L.H., Jones, W.D., et al.: The microarray quality control (maqc) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology 24(9), 1151–1161 (2006)
Article Google Scholar
Lacroix, V., Sammeth, M., Guigo, R., Bergeron, A.: Exact Transcriptome Reconstruction from Short Sequence Reads. In: Crandall, K.A., Lagergren, J. (eds.) WABI 2008. LNCS (LNBI), vol. 5251, pp. 50–63. Springer, Heidelberg (2008)
Chapter Google Scholar
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A., Dewey, C.N.: Rna-seq gene expression estimation with read mapping uncertainty. Bioinformatics 26 (4), 493–500 (2010)
Article Google Scholar
Li, J., Jiang, H., Wong, W.H.: Modeling non-uniformity in short-read rates in rna-seq data. Genome Biology 11 (2010)
Google Scholar
Li, W., Feng, J., Jiang, T.: IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly. In: Bafna, V., Sahinalp, S.C. (eds.) RECOMB 2011. LNCS, vol. 6577, pp. 168–188. Springer, Heidelberg (2011)
Chapter Google Scholar
Lia, J.J., Jiangb, C.-R., Browna, J.B., Huanga, H., Bickela, P.J.: Sparse linear modeling of next-generation mrna sequencing (rna-seq) data for isoform discovery and abundance estimation. PNAS (2011)
Google Scholar
Olejniczak, M., Galka, P., Krzyzosiak, W.J.: Sequence-non-specific effects of rna interference triggers and microrna regulators. Nucl. Acids Res. 38(1), 1–16 (2010)
Article Google Scholar
Nicolae, M., Mangul, S., Mandoiu, I.I., Zelikovsky, A.: Estimation of alternative splicing isoform frequencies from rna-seq data. Algorithms for Molecular Biology 6, 9 (2011)
Article Google Scholar
Pan, Q., Shai, O., Lee, L.J., Frey, B.J., Blencowe, B.J.: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics 40, 1413–1415 (2008)
Article Google Scholar
Richard, H., Schulz, M.H., Sultan, M., Nrnberger, A., Schrinner, S., Balzereit, D., Dagand, E., Rasche, A., Lehrach, H., Vingron, M., Haas, S.A., Yaspo, M.-L.: Prediction of alternative isoforms from exon expression levels in rna-seq experiments. Nucleic Acids Research 38, e112 (2010)
Article Google Scholar
Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L., Pachter, L.: Improving rna-seq expression estimates by correcting for fragment bias. Genome Biology 12, R22 (2011)
Article Google Scholar
Russell, S., Norvig, P.: Artificial intelligence: A modern approach, R22 (2003)
Google Scholar
Srivastava, S., Chen, L.: A two-parameter generalized poisson model to improve the analysis of rna-seq data. Nucleic Acids Research, 1–15 (2010)
Google Scholar
Singh, D., Orellana, C.F., Hu, Y., Jones, C.D., Liu, Y., Chiang, D.Y., Liu, J., Prins, J.F.: Fdm: A graph-based statistical method to detect differential transcription using rna-seq data. Bioinformatics (2011)
Google Scholar
Srivastava, S., Chen, L.: A two-parameter generalized poisson model to improve the analysis of rna-seq data. Nucleic Acids Research 38, e112 (2010)
Article Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of Royal Statistical Society Series B. 58, 267–288 (1996)
MathSciNet MATH Google Scholar
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., Pachter, L.: Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28, 511–515 (2010)
Article Google Scholar
Turro, E., Su, S.-Y., Gonçalves, Â., Coin, L.J.M., Richardson, S., Lewin, A.: Haplotype and isoform specific expression estimation using multi-mapping rna-seq reads. Genome Biology 12, R13 (2011)
Article Google Scholar
Wang, E.T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., Kingsmore, S.F., Schroth, G.P., Burge, C.B.: Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008)
Article Google Scholar
Wang, K., Singh, D., Zeng, Z., Huang, Y., Coleman, S., Savich, G.L., He, X., Mieczkowski, P., Grimm, S.A., Perou, C.M., MacLeod, J.N., Chiang, D.Y., Prins, J.F., Liu, J.: Mapsplice: Accurate mapping of rna-seq reads for splice junction discovery. Nucleic Acid Research 38(18), 178 (2010)
Article Google Scholar
Wang, Z., Gerstein, M., Snyder, M.: Rna-seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57–63 (2009)
Article Google Scholar
Wu, J., Akerman, M., Sun, S., Richard McCombie, W., Krainer, A.R., Zhang, M.Q.: Splicetrap: a method to quantify alternative splicing under single cellular conditions. Bioinformatics (2011)
Google Scholar
Wu, Z., Wang, X., Zhang, X.: Using non-uniform read distribution models to improve isoform expression inference in rna-seq. Bioinformatics 27, 502–508 (2011)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of North Carolina, Chapel Hill, USA
Yan Huang, Yin Hu, Jan F. Prins & Jinze Liu
Department of Biology, University of North Carolina, Chapel Hill, USA
Corbin D. Jones
Department of Veterinary Science, University of Kentucky, USA
James N. MacLeod
Department of Genetics, University of North Carolina, Chapel Hill, USA
Derek Y. Chiang
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, USA
Yufeng Liu

Authors

Yan Huang
View author publications
You can also search for this author in PubMed Google Scholar
Yin Hu
View author publications
You can also search for this author in PubMed Google Scholar
Corbin D. Jones
View author publications
You can also search for this author in PubMed Google Scholar
James N. MacLeod
View author publications
You can also search for this author in PubMed Google Scholar
Derek Y. Chiang
View author publications
You can also search for this author in PubMed Google Scholar
Yufeng Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jan F. Prins
View author publications
You can also search for this author in PubMed Google Scholar
Jinze Liu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Computer Science Department, Tel-Aviv University, 69978, Tel-Aviv, Israel
Benny Chor

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Huang, Y. et al. (2012). A Robust Method for Transcript Quantification with RNA-seq Data. In: Chor, B. (eds) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science(), vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_12

Download citation

DOI: https://doi.org/10.1007/978-3-642-29627-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29626-0
Online ISBN: 978-3-642-29627-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics