Abstract
The development of novel high-throughput DNA sequencing methods has provided a powerful approach for both mapping and quantifying transcriptomes. This approach, termed RNA-seq (RNA sequencing), has several advantages over microarray-based methods: a wider dynamic range of expression levels, less reliance on prior knowledge of the genome sequence, and lower background noise. After the reads are aligned to the reference genome, the first step of RNA-seq analysis is to infer relative transcript abundances. This can be done at the whole-transcript level, at the isoform level (estimating isoform-specific relative abundances given a known set of isoforms), or at a level where transcripts are identified and quantified simultaneously. We briefly review these methods and describe some recent developments for handling non-uniform read distributions within a transcript, focusing on methods for simultaneous transcript discovery and quantification.
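Isoform-level quantification with a known set of isoforms is commonly cast as a maximum-likelihood problem solved by expectation-maximization (EM). The sketch below is a minimal illustration under simplifying assumptions, not the chapter's own method: reads are assumed to be pre-grouped into compatibility classes (the set of known isoforms each read could have come from), each isoform is assumed to have a known effective length, and read start positions are assumed uniform along each isoform. That uniformity assumption is exactly what the non-uniform read distribution models mentioned above relax. The function name `em_isoform_abundance` and its input layout are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def em_isoform_abundance(
    class_counts: Dict[FrozenSet[int], int],
    eff_lengths: List[float],
    max_iter: int = 1000,
    tol: float = 1e-12,
) -> Tuple[List[float], List[float]]:
    """Hypothetical EM sketch for isoform abundances (uniform-read model).

    class_counts maps a set of isoform indices to the number of reads
    compatible with exactly that set; eff_lengths[k] is the (assumed known)
    effective length of isoform k. Returns (alpha, rho): alpha[k] is the
    estimated fraction of reads from isoform k, rho[k] the estimated
    fraction of transcript molecules (alpha rescaled by 1 / length).
    """
    n_iso = len(eff_lengths)
    n_reads = sum(class_counts.values())
    alpha = [1.0 / n_iso] * n_iso  # uniform starting point

    for _ in range(max_iter):
        expected = [0.0] * n_iso
        for isoforms, count in class_counts.items():
            # E-step: with uniform read positions, a read from isoform k has
            # likelihood alpha[k] / eff_lengths[k]; split this class's reads
            # among its compatible isoforms in proportion to that likelihood.
            weights = {k: alpha[k] / eff_lengths[k] for k in isoforms}
            total = sum(weights.values())
            if total == 0.0:
                continue  # no compatible isoform has positive abundance
            for k, w in weights.items():
                expected[k] += count * w / total
        # M-step: new read fractions are expected read counts divided by N.
        new_alpha = [e / n_reads for e in expected]
        delta = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if delta < tol:
            break

    # Convert read fractions to molecule fractions by dividing by length.
    per_length = [a / l for a, l in zip(alpha, eff_lengths)]
    norm = sum(per_length)
    rho = [p / norm for p in per_length]
    return alpha, rho
```

For instance, with 300 reads unique to isoform 0, 100 unique to isoform 1, 400 shared, and equal effective lengths, the shared reads carry no information about the split, so the estimate converges to approximately alpha = (0.75, 0.25). Grouping reads into compatibility classes keeps each EM iteration proportional to the number of distinct classes rather than the number of reads.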
Acknowledgements
This research is supported by NIH grants CA127334 and GM097505.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Li, H. (2014). Isoform Expression Analysis Based on RNA-seq Data. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_12
DOI: https://doi.org/10.1007/978-3-319-07212-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics (R0)