RNA-Seq Experiment and Data Analysis

  • Hanquan Liang
  • Erliang Zeng
Part of the Methods in Molecular Biology book series (MIMB, volume 1366)


With the ability to obtain tens of millions of reads, high-throughput messenger RNA sequencing (RNA-Seq) makes it possible to estimate isoform abundance and to discover novel transcripts. In this chapter, we describe a protocol for constructing an RNA-Seq library for sequencing on Illumina NGS platforms, and a computational pipeline for RNA-Seq data analysis. These protocols can be applied to the analysis of differential gene expression between control and 17β-estradiol-treated in vivo or in vitro systems.

Key words

RNA-Seq; Next-generation sequencing; Data analysis; Bioconductor; Statistical analysis; Differentially expressed genes



We thank Dr. Thomas Girke at the University of California Riverside for sharing his R scripts.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, USA
  2. Department of Biology, University of South Dakota, Vermillion, USA
  3. Department of Computer Science, University of South Dakota, Vermillion, USA
