Abstract
The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique transcript solution even when every read maps uniquely to the reference genome. In this paper, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to learn bias parameters simultaneously with transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method achieved better quantification accuracy than existing methods. Applying our method to real data demonstrated that transcript quantification is effective for differential analysis of transcriptomes.
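The two central ideas of the abstract, that reads spanning multiple splice junctions can restore identifiability and that LASSO can select a sparse set of expressed isoforms, can be illustrated on a toy gene. The sketch below is a minimal illustration under stated assumptions, not the authors' model: it assumes reads are summarized as counts per exon-path segment, with each isoform contributing one unit to every compatible segment (no transcript lengths, fragment-length distribution, or position- and sequence-specific bias parameters), and it uses a non-negative LASSO solved by cyclic coordinate descent, which is our solver choice rather than one stated in the paper. The gene structure and the names `ISOFORMS`, `JUNCTIONS`, `MULTI`, `design_matrix`, and `nonneg_lasso` are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy gene: exons 1..5, with exons 2 and 4 each optionally skipped.
ISOFORMS = {
    "T1": (1, 2, 3, 4, 5),
    "T2": (1, 2, 3, 5),
    "T3": (1, 3, 4, 5),
    "T4": (1, 3, 5),
}
NAMES = sorted(ISOFORMS)
# Segments observable from reads that span a single splice junction.
JUNCTIONS = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
# A segment observable only from reads covering junctions 2-3 and 3-4 together.
MULTI = [(2, 3, 4)]


def contains(path, segment):
    """True if `segment` occurs as consecutive exons in `path`."""
    k = len(segment)
    return any(path[i:i + k] == segment for i in range(len(path) - k + 1))


def design_matrix(segments):
    """One row per segment, one column per isoform: 1 if compatible, else 0.
    Assumption: no length or bias weighting of the entries."""
    return [[1 if contains(ISOFORMS[t], s) else 0 for t in NAMES] for s in segments]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rank(A):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in A]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


def nonneg_lasso(A, b, lam, n_iter=1000):
    """Minimize 0.5*||Ax - b||^2 + lam*sum(x) subject to x >= 0
    by cyclic coordinate descent."""
    n = len(A[0])
    x = [0.0] * n
    col_sq = [sum(row[j] ** 2 for row in A) for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0:
                continue
            # rho = A_j^T (b - A x + A_j x_j): correlation with the partial residual
            rho = sum(
                row[j] * (bi - sum(a * xk for a, xk in zip(row, x)) + row[j] * x[j])
                for row, bi in zip(A, b)
            )
            # One-sided soft threshold: abundances cannot be negative
            x[j] = max(0.0, (rho - lam) / col_sq[j])
    return x


if __name__ == "__main__":
    A_junc = design_matrix(JUNCTIONS)
    A_full = design_matrix(JUNCTIONS + MULTI)
    print("rank, junctions only:", rank(A_junc))
    print("rank, with multi-junction segment:", rank(A_full))
    # Two different abundance vectors that single junctions cannot tell apart
    print(matvec(A_junc, [1, 0, 0, 1]) == matvec(A_junc, [0, 1, 1, 0]))
    b = matvec(A_full, [10.0, 0.0, 0.0, 2.0])  # noise-free segment counts
    print(dict(zip(NAMES, nonneg_lasso(A_full, b, lam=1.0))))
```

With single-junction segments only, the design matrix has rank 3 for four isoforms, so the abundance vectors (1, 0, 0, 1) and (0, 1, 1, 0) yield identical expected counts; adding the one segment that spans junctions 2-3 and 3-4 raises the rank to 4 and makes the toy problem identifiable. On noise-free counts generated from (10, 0, 0, 2) with λ = 1, the fit is approximately T1 = 9.8 and T4 = 1.5, with T2 and T3 at zero: the expressed set is recovered, while the nonzero estimates are shrunk toward zero, the usual cost of the L1 penalty.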
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, Y. et al. (2012). A Robust Method for Transcript Quantification with RNA-seq Data. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol. 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_12
DOI: https://doi.org/10.1007/978-3-642-29627-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29626-0
Online ISBN: 978-3-642-29627-7
eBook Packages: Computer Science; Computer Science (R0)