Differential Expression Analysis of Complex RNA-seq Experiments Using edgeR

Statistical Analysis of Next Generation Sequencing Data

Abstract

This article reviews the statistical theory underlying the edgeR software package for differential expression of RNA-seq data. Negative binomial models are used to capture the quadratic mean-variance relationship that can be observed in RNA-seq data. Conditional likelihood methods are used to avoid bias when estimating the level of variation. Empirical Bayes methods are used to allow gene-specific variation estimates even when the number of replicate samples is very small. Generalized linear models are used to accommodate arbitrarily complex designs. A key feature of the edgeR package is the use of weighted likelihood methods to implement a flexible empirical Bayes approach in the absence of easily tractable sampling distributions. The methodology is implemented in flexible software that is easy to use even for users who are not professional statisticians or bioinformaticians. The software is part of the Bioconductor project.

This article describes some recently implemented features. Loess-style weighting is used to improve the weighted likelihood approach, and an analogy with quasi-likelihood is used to estimate the optimal weight to be given to the empirical Bayes prior. The article includes a fully worked case study with complete code.
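
As a rough illustration of the quadratic mean-variance relationship mentioned in the abstract, the following Python sketch checks numerically that a negative binomial count with mean mu and dispersion phi has variance mu + phi * mu^2. The function names, the truncation point `upper`, and the example values are illustrative assumptions. edgeR itself is an R package, and this is not its implementation.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative binomial pmf parameterised by mean mu and dispersion phi."""
    r = 1.0 / phi          # "size" parameter of the negative binomial
    p = r / (r + mu)       # success probability implied by mean mu
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


def nb_variance(mu, phi):
    """Theoretical variance: a Poisson term mu plus a quadratic term phi * mu^2."""
    return mu + phi * mu ** 2


def numeric_moments(mu, phi, upper=5000):
    """Mean and variance obtained by summing the pmf over 0..upper-1.

    `upper` is an assumed truncation point; it must sit far in the right
    tail for the sums to be accurate.
    """
    probs = [nb_pmf(y, mu, phi) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, phi = 100.0, 0.1   # hypothetical mean count and dispersion
    mean, var = numeric_moments(mu, phi)
    print("numeric mean:", mean)
    print("numeric variance:", var)
    print("mu + phi * mu^2:", nb_variance(mu, phi))
    print("Poisson variance (phi = 0):", nb_variance(mu, 0.0))
```

For a fixed dispersion, the quadratic term dominates as the mean grows, so highly expressed genes show far more variation than a Poisson model with variance mu would predict.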

Acknowledgements

Thanks to Wei Shi for providing the fragment counts and alignment code for the IRF4 data, and to Davis McCarthy who programmed the original implementation of the loess local likelihood trend described in Sect. 3.3.3.

Author information

Corresponding author

Correspondence to Gordon K. Smyth.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Chen, Y., Lun, A.T.L., Smyth, G.K. (2014). Differential Expression Analysis of Complex RNA-seq Experiments Using edgeR. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_3
