Information and Statistical Analysis Pipeline for High-Throughput RNA Sequencing Data

Nakaoka, Shinji; Matsuyama, Keita

doi:10.1007/7651_2019_245

Part of the book series: Methods in Molecular Biology (MIMB, volume 2109)

Abstract

Applications of RNA sequencing have spread widely across the subfields of life science. Constructing an information and statistical analysis pipeline is indispensable for processing the raw RNA sequencing (RNA-seq) data generated by next-generation sequencers and for extracting biological implications from them. In this chapter, we introduce a common pipeline for RNA-seq data. A collection of notes on related advanced topics will be useful when conducting information and statistical analysis in practice.

Acknowledgments

This work is supported by JST PRESTO Grant Number JPMJPR16E9 and by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid (C) JP16K05265 and (S) JP15H05707. The authors are grateful to Ms. Mai Suganami for typesetting the references and correcting typos.

Author information

Authors and Affiliations

Faculty of Advanced Life Science, Hokkaido University, Sapporo, Japan
Shinji Nakaoka & Keita Matsuyama

Corresponding author

Correspondence to Shinji Nakaoka.

Editor information

Editors and Affiliations

Ottawa, ON, Canada
Kursad Turksen

About this protocol

Cite this protocol

Nakaoka, S., Matsuyama, K. (2019). Information and Statistical Analysis Pipeline for High-Throughput RNA Sequencing Data. In: Turksen, K. (eds) Epidermal Cells. Methods in Molecular Biology, vol 2109. Humana, New York, NY. https://doi.org/10.1007/7651_2019_245

DOI: https://doi.org/10.1007/7651_2019_245
Published: 22 June 2019
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0250-8
Online ISBN: 978-1-0716-0251-5
eBook Packages: Springer Protocols
