A Robust Method for Transcript Quantification with RNA-seq Data

  • Yan Huang
  • Yin Hu
  • Corbin D. Jones
  • James N. MacLeod
  • Derek Y. Chiang
  • Yufeng Liu
  • Jan F. Prins
  • Jinze Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells), but it remains a challenging problem for several reasons. First, although various algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique solution for the transcript abundances even if every read maps uniquely to the reference genome. In this paper, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to improve identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to learn bias parameters simultaneously during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method demonstrated better quantification accuracy than existing methods. Applying our method to real data demonstrated experimentally that transcript quantification is effective for differential analysis of transcriptomes.
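The idea of solving a linear system with LASSO to select dominantly expressed isoforms can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the model's exact form, so the toy gene, the 0/1 design matrix, the penalty value `lam`, the non-negativity constraint, and the function name `quantify` are all assumptions made for illustration. Rows of `A` are observed features (exons and splice junctions), columns are candidate isoforms, and `y` holds the observed coverage of each feature.

```python
# Hypothetical sketch: non-negative LASSO by coordinate descent, minimizing
#   0.5 * ||y - A x||^2 + lam * sum(x)   subject to x >= 0.
# All names and numbers below are illustrative assumptions, not from the paper.

def quantify(A, y, lam, n_iter=200):
    """Estimate isoform abundances x from feature coverage y."""
    n_features, n_isoforms = len(A), len(A[0])
    x = [0.0] * n_isoforms
    # Squared norm of each column (isoform) of the design matrix.
    col_sq = [sum(A[i][j] ** 2 for i in range(n_features))
              for j in range(n_isoforms)]
    for _ in range(n_iter):
        for j in range(n_isoforms):
            if col_sq[j] == 0:
                continue  # isoform covers no observed feature
            # Correlation of column j with the residual that ignores isoform j.
            rho = 0.0
            for i in range(n_features):
                others = sum(A[i][k] * x[k]
                             for k in range(n_isoforms) if k != j)
                rho += A[i][j] * (y[i] - others)
            # Soft-threshold, then clip at zero (abundances cannot be negative).
            x[j] = max(0.0, (rho - lam) / col_sq[j])
    return x

# Toy gene with exons E1, E2, E3.
# Features (rows): E1, E2, E3, junction E1-E2, junction E2-E3, junction E1-E3.
# Candidate isoforms (columns): T1 = E1-E2-E3, T2 = E1-E3, T3 = E1-E2.
A = [
    [1, 1, 1],  # E1
    [1, 0, 1],  # E2
    [1, 1, 0],  # E3
    [1, 0, 1],  # junction E1-E2
    [1, 0, 0],  # junction E2-E3
    [0, 1, 0],  # junction E1-E3
]
# Noise-free coverage generated from abundances T1 = 10, T2 = 5, T3 = 0.
y = [15, 10, 15, 10, 10, 5]

estimate = quantify(A, y, lam=0.1)
print(estimate)
```

In this toy case the junction rows are what separate T1 from T3, and the L1 penalty drives the unexpressed candidate T3 to zero while shrinking the other two estimates only slightly. A real analysis would also need to handle coverage noise, the bias terms mentioned above, and a principled choice of the penalty.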


Keywords: Transcript quantification, Transcriptome, RNA-seq





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yan Huang (1)
  • Yin Hu (1)
  • Corbin D. Jones (2)
  • James N. MacLeod (3)
  • Derek Y. Chiang (4)
  • Yufeng Liu (5)
  • Jan F. Prins (1)
  • Jinze Liu (1)
  1. Department of Computer Science, University of North Carolina, Chapel Hill, USA
  2. Department of Biology, University of North Carolina, Chapel Hill, USA
  3. Department of Veterinary Science, University of Kentucky, USA
  4. Department of Genetics, University of North Carolina, Chapel Hill, USA
  5. Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, USA
