Abstract
In the last few years, RNA-Seq has become a popular choice for high-throughput studies of gene expression, showing its potential to supersede microarrays as the new standard for transcriptional profiling. At the gene level, RNA-Seq yields counts rather than continuous measures of expression, which calls for novel methods to handle count data in high-dimensional problems. We present a hierarchical Bayesian approach to modeling RNA-Seq data. The model accounts for differences in the total number of counts across samples (sequencing depth) as well as for overdispersion, with no need to transform the data prior to the analysis. Using an MCMC algorithm, we identify differentially expressed genes, with promising results on both simulated and real data compared with those of edgeR and DESeq, two state-of-the-art algorithms for RNA-Seq data analysis.
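The abstract describes the model only in words. As a hedged illustration of the two ingredients it names (sequencing depth and overdispersion) and of MCMC-based detection, the sketch below assumes a negative binomial likelihood whose mean for sample j is s_j * exp(mu + beta * x_j), with known size factors s_j standing in for sequencing depth, a dispersion phi treated as known, illustrative normal priors, and a plain random-walk Metropolis sampler for a single gene. All of these choices and every function name here are hypothetical; the chapter's actual hierarchy, priors and sampler (its references include JAGS) may differ.

```python
import math
import random


def nb_logpmf(y, mean, phi):
    """Log-pmf of a negative binomial with the given mean and dispersion phi
    (variance = mean + phi * mean**2)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mean)) + y * math.log(mean / (r + mean)))


def rpoisson(lam, rng):
    """Poisson draw: count unit-rate exponential arrivals in [0, lam]."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return k
        k += 1


def rnegbin(mean, phi, rng):
    """Negative binomial draw as a Poisson-gamma mixture (overdispersion)."""
    r = 1.0 / phi
    lam = rng.gammavariate(r, mean / r)  # shape r, scale mean/r -> E[lam] = mean
    return rpoisson(lam, rng)


def log_posterior(mu, beta, counts, depth, group, phi):
    """Unnormalized log posterior for one gene. beta is the depth-adjusted
    log fold change of group 1 vs group 0. Hypothetical priors:
    mu ~ N(0, 10^2), beta ~ N(0, 2^2)."""
    lp = -0.5 * (mu / 10.0) ** 2 - 0.5 * (beta / 2.0) ** 2
    for y, s, g in zip(counts, depth, group):
        lp += nb_logpmf(y, s * math.exp(mu + beta * g), phi)
    return lp


def metropolis(counts, depth, group, phi, n_iter=5000, step=0.1, seed=1):
    """Random-walk Metropolis over (mu, beta); returns the chain of beta."""
    rng = random.Random(seed)
    # Start mu at the log of the mean depth-normalized count (guarded against 0).
    norm_mean = sum(y / s for y, s in zip(counts, depth)) / len(counts)
    mu, beta = math.log(max(norm_mean, 0.5)), 0.0
    lp = log_posterior(mu, beta, counts, depth, group, phi)
    draws = []
    for _ in range(n_iter):
        mu_new = mu + rng.gauss(0.0, step)
        beta_new = beta + rng.gauss(0.0, step)
        lp_new = log_posterior(mu_new, beta_new, counts, depth, group, phi)
        # Accept with probability min(1, exp(lp_new - lp)); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < lp_new - lp:
            mu, beta, lp = mu_new, beta_new, lp_new
        draws.append(beta)
    return draws


def simulate_and_fit(true_lfc, phi=0.1, base_mean=100.0, seed=7):
    """Simulate one gene (5 vs 5 samples) and return the posterior mean of
    beta and the posterior probability that beta > 0."""
    rng = random.Random(seed)
    depth = [0.8, 1.0, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0, 0.9, 1.2]  # hypothetical size factors
    group = [0] * 5 + [1] * 5
    counts = [rnegbin(s * base_mean * math.exp(true_lfc * g), phi, rng)
              for s, g in zip(depth, group)]
    draws = metropolis(counts, depth, group, phi, seed=seed)
    kept = draws[len(draws) // 2:]  # discard the first half as burn-in
    post_mean = sum(kept) / len(kept)
    prob_up = sum(b > 0 for b in kept) / len(kept)
    return post_mean, prob_up
```

For example, `simulate_and_fit(math.log(3.0))` simulates a gene with a threefold change between conditions and summarizes the posterior of its log fold change. Because the counts enter the likelihood directly, with depth as a multiplicative offset, no transformation of the data is needed. In a genuinely hierarchical model the dispersion and prior parameters would themselves receive priors shared across genes, which this single-gene sketch omits.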
References
Anders, S., Huber, W.: Differential expression analysis for sequence count data. Genome Biol. 11(10), R106 (2010)
Anders, S., Reyes, A., Huber, W.: Detecting differential usage of exons from RNA-seq data. Genome Res., online advance access (2012)
Anscombe, F.: Sampling theory of the negative binomial and logarithmic series distributions. Biometrika 37(3/4), 358–382 (1950)
Baldi, P., Long, A.: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17(6), 509–519 (2001)
Bullard, J., Purdom, E., Hansen, K., Dudoit, S.: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformat. 11(1), 94 (2010)
Bulmer, M.: On fitting the Poisson lognormal distribution to species-abundance data. Biometrics 30(1), 101–110 (1974)
Hardcastle, T., Kelly, K.: baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformat. 11(1), 422 (2010)
Ibrahim, J., Chen, M., Gray, R.: Bayesian models for gene expression with DNA microarray data. J. Am. Stat. Assoc. 97(457), 88–99 (2002)
Kendziorski, C., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22, 3899–3914 (2003)
Langmead, B., Trapnell, C., Pop, M., Salzberg, S.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), R25 (2009)
Lawless, J.: Negative binomial and mixed Poisson regression. Can. J. Stat. 15(3), 209–225 (1987)
Lee, J., Ji, Y., Liang, S., Cai, G., Müller, P.: On differential gene expression using RNA-Seq data. Cancer Informat. 10, 205 (2011)
Lönnstedt, I., Speed, T.: Replicated microarray data. Statistica Sinica 12(1), 31–46 (2002)
Marioni, J., Mason, C., Mane, S., Stephens, M., Gilad, Y.: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18(9), 1509 (2008)
McCarthy, D., Smyth, G.: Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25(6), 765 (2009)
Mortazavi, A., Williams, B., McCue, K., Schaeffer, L., Wold, B.: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5(7), 621–628 (2008)
Plummer, M.: JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing, pp. 20–22, March 2003
R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2009). URL http://www.R-project.org
Robinson, M., McCarthy, D., Smyth, G.: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1), 139 (2010)
Robinson, M., Smyth, G.: Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23(21), 2881 (2007)
Robinson, M., Smyth, G.: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9(2), 321 (2008)
Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. Roy. Stat. Soc. Ser. B (Methodolog.) 71(2), 319–392 (2009)
Shi, L., Reid, L., Jones, W., et al.: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 24(9), 1151–1161 (2006)
Smyth, G.: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3(1), 3 (2004)
Tarazona, S., García-Alcalde, F., Dopazo, J., Ferrer, A., Conesa, A.: Differential expression in RNA-seq: A matter of depth. Genome Res. 21(12), 2213–2223 (2011)
Wang, L., Feng, Z., Wang, X., Wang, X., Zhang, X.: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26(1), 136 (2010)
Wang, Z., Gerstein, M., Snyder, M.: RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10(1), 57–63 (2009)
Wu, Z., Jenkins, B., Rynearson, T., Dyhrman, S., Saito, M., Mercier, M., Whitney, L.: Empirical Bayes analysis of sequencing-based transcriptional profiling without replicates. BMC Bioinformat. 11, 564 (2010)
Copyright information
© 2013 Springer-Verlag Italia
About this chapter
Cite this chapter
Risso, D., Sales, G., Romualdi, C., Chiogna, M. (2013). A Hierarchical Bayesian Model for RNA-Seq Data. In: Grigoletto, M., Lisi, F., Petrone, S. (eds) Complex Models and Computational Methods in Statistics. Contributions to Statistics. Springer, Milano. https://doi.org/10.1007/978-88-470-2871-5_17
DOI: https://doi.org/10.1007/978-88-470-2871-5_17
Publisher Name: Springer, Milano
Print ISBN: 978-88-470-2870-8
Online ISBN: 978-88-470-2871-5
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)