A Hierarchical Bayesian Model for RNA-Seq Data

Complex Models and Computational Methods in Statistics

Part of the book series: Contributions to Statistics ((CONTRIB.STAT.))

Abstract

In the last few years, RNA-Seq has become a popular choice for high-throughput studies of gene expression, showing its potential to supersede microarrays as the standard for transcriptional profiling. At the gene level, RNA-Seq yields counts rather than continuous measures of expression, which calls for new methods to handle count data in high-dimensional problems. We present a hierarchical Bayesian approach to modeling RNA-Seq data. The model accounts for differences in the total number of counts across samples (sequencing depth) as well as for overdispersion, with no need to transform the data before the analysis. Using an MCMC algorithm, we identify differentially expressed genes and obtain promising results on both simulated and real data when compared with edgeR and DESeq, two state-of-the-art algorithms for RNA-Seq data analysis.
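The two features named in the abstract, sequencing depth and overdispersion, can be illustrated with a small simulation. The sketch below is not the chapter's model, which is not specified in this preview; it assumes a generic gamma-Poisson (negative binomial) hierarchy, and the names `sample_poisson`, `simulate_counts`, `mu`, `depth`, and `phi` are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the modest rates used here; not meant for very large rates.
    """
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(mu, depth, phi, n, rng):
    """Simulate n gamma-Poisson counts for one gene (illustrative assumption).

    Latent expression: lam ~ Gamma(shape=1/phi, scale=phi*mu), so E[lam] = mu.
    Observed count:    y | lam ~ Poisson(depth * lam).
    Marginally, E[y] = depth*mu and Var[y] = depth*mu + phi*(depth*mu)**2.
    """
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(1.0 / phi, phi * mu)
        counts.append(sample_poisson(depth * lam, rng))
    return counts


rng = random.Random(2013)
mu, phi = 20.0, 0.3  # same underlying expression level in both samples

for depth in (1.0, 2.0):  # the second sample is sequenced twice as deeply
    y = simulate_counts(mu, depth, phi, 4000, rng)
    mean_y = statistics.fmean(y)
    var_y = statistics.variance(y)
    print(f"depth={depth}: raw mean={mean_y:.1f}, "
          f"depth-scaled mean={mean_y / depth:.1f}, "
          f"variance/mean={var_y / mean_y:.1f}")
```

Under these assumptions, the raw means differ between the two samples only because of depth, while the depth-scaled means agree. The variance-to-mean ratio is well above 1, the value a plain Poisson model would imply, which is why a model for such counts needs both a depth term and a dispersion term.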

References

  1. Anders, S., Huber, W.: Differential expression analysis for sequence count data. Genome Biol. 11(10), R106 (2010)

  2. Anders, S., Reyes, A., Huber, W.: Detecting differential usage of exons from RNA-seq data. Genome Res., online advance access (2012)

  3. Anscombe, F.: Sampling theory of the negative binomial and logarithmic series distributions. Biometrika 37(3/4), 358–382 (1950)

  4. Baldi, P., Long, A.: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17(6), 509–519 (2001)

  5. Bullard, J., Purdom, E., Hansen, K., Dudoit, S.: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformat. 11(1), 94 (2010)

  6. Bulmer, M.: On fitting the Poisson lognormal distribution to species-abundance data. Biometrics 30(1), 101–110 (1974)

  7. Hardcastle, T., Kelly, K.: baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformat. 11(1), 422 (2010)

  8. Ibrahim, J., Chen, M., Gray, R.: Bayesian models for gene expression with DNA microarray data. J. Am. Stat. Assoc. 97(457), 88–99 (2002)

  9. Kendziorski, C., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22, 3899–3914 (2003)

  10. Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)

  11. Lawless, J.: Negative binomial and mixed Poisson regression. Can. J. Stat. 15(3), 209–225 (1987)

  12. Lee, J., Ji, Y., Liang, S., Cai, G., Müller, P.: On differential gene expression using RNA-Seq data. Cancer Informat. 10, 205 (2011)

  13. Lönnstedt, I., Speed, T.: Replicated microarray data. Statistica Sinica 12(1), 31–46 (2002)

  14. Marioni, J., Mason, C., Mane, S., Stephens, M., Gilad, Y.: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18(9), 1509 (2008)

  15. McCarthy, D., Smyth, G.: Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6), 765 (2009)

  16. Mortazavi, A., Williams, B., McCue, K., Schaeffer, L., Wold, B.: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5(7), 621–628 (2008)

  17. Plummer, M.: JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing, pp. 20–22, March 2003

  18. R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2009). URL http://www.R-project.org

  19. Robinson, M., McCarthy, D., Smyth, G.: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1), 139 (2010)

  20. Robinson, M., Smyth, G.: Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23(21), 2881 (2007)

  21. Robinson, M., Smyth, G.: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9(2), 321 (2008)

  22. Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. Roy. Stat. Soc. Ser. B (Methodolog.) 71(2), 319–392 (2009)

  23. Shi, L., Reid, L., Jones, W., et al.: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24(9), 1151–1161 (2006)

  24. Smyth, G.: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3(1), 3 (2004)

  25. Tarazona, S., García-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A.: Differential expression in RNA-seq: A matter of depth. Genome Res. 21(12), 2213–2223 (2011)

  26. Wang, L., Feng, Z., Wang, X., Wang, X., Zhang, X.: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26(1), 136 (2010)

  27. Wang, Z., Gerstein, M., Snyder, M.: RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10(1), 57–63 (2009)

  28. Wu, Z., Jenkins, B., Rynearson, T., Dyhrman, S., Saito, M., Mercier, M., Whitney, L.: Empirical Bayes analysis of sequencing-based transcriptional profiling without replicates. BMC Bioinformat. 11, 564 (2010)


Corresponding author

Correspondence to Davide Risso.

Copyright information

© 2013 Springer-Verlag Italia

About this chapter

Cite this chapter

Risso, D., Sales, G., Romualdi, C., Chiogna, M. (2013). A Hierarchical Bayesian Model for RNA-Seq Data. In: Grigoletto, M., Lisi, F., Petrone, S. (eds) Complex Models and Computational Methods in Statistics. Contributions to Statistics. Springer, Milano. https://doi.org/10.1007/978-88-470-2871-5_17
