Transcriptome Profiling Strategies

Khamis, Abdullah M.; Bajic, Vladimir B.; Harbers, Matthias

doi:10.1007/978-3-319-31350-4_4



Abstract

With the rapid development of high-speed DNA sequencing technologies, it became feasible to sequence deeply into cDNA libraries prepared from RNA samples. Such cDNA libraries can benefit from full-length cDNA cloning technologies, which provide the means to obtain sequence information on entire RNA transcripts or on their selected 5′ ends only. Combining these new sequencing technologies with large-scale cDNA library preparation now yields comprehensive overviews of transcriptomes and forms the basis of different approaches to transcriptome profiling.

In this chapter, we describe the use of full-length cDNA preparations in combination with shotgun sequencing for mRNA profiling (so-called RNA-Seq methods, for “RNA sequencing”), as well as RNA-Seq profiling that starts directly from RNA. We further describe the use of cap analysis gene expression (CAGE) for high-throughput mRNA detection and genome-wide determination of transcription start sites (TSSs). Here we applied “nanoCAGE”, which uses template switching in the 5′ end selection step, to obtain CAGE data from very small amounts of RNA. We compare the sequencing data obtained with the three library preparation methods and give directions for a bioinformatics pipeline to analyze them. Examples are taken from our studies on the transcriptional regulation of gene expression during the behavioral maturation of worker honey bees (Apis mellifera) to advise on transcriptome profiling strategies.
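Because a CAGE tag is read from the capped 5′ end of a transcript, the first aligned base of each read on its own strand marks a candidate TSS, and counting tags per position gives TSS-level expression. The sketch below illustrates only that principle and is not the authors' pipeline. It assumes alignments have already been parsed into simple records with 0-based, half-open coordinates; the `Alignment` record, the function names, and the MAPQ cutoff of 20 are all hypothetical choices.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned CAGE read (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    mapq: int


def tss_position(aln: Alignment) -> tuple[str, int, str]:
    """Return the read's 5' end, i.e. the candidate TSS, as (chrom, pos, strand)."""
    # Plus strand: the 5' end is the leftmost base.
    # Minus strand: the 5' end is the rightmost base, end - 1 in half-open coordinates.
    pos = aln.start if aln.strand == "+" else aln.end - 1
    return (aln.chrom, pos, aln.strand)


def count_tss(alignments: Iterable[Alignment], min_mapq: int = 20) -> Counter:
    """Count CAGE tags per single-base TSS, skipping low-MAPQ alignments."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln.mapq >= min_mapq:  # assumed cutoff; MAPQ scales differ between aligners
            counts[tss_position(aln)] += 1
    return counts


def to_tpm(counts: Counter) -> dict:
    """Scale raw tag counts to tags per million (TPM) of the retained tags."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {site: n * 1_000_000 / total for site, n in counts.items()}


if __name__ == "__main__":
    reads = [
        Alignment("chr1", 100, 125, "+", 40),
        Alignment("chr1", 100, 127, "+", 35),
        Alignment("chr1", 480, 505, "-", 42),
        Alignment("chr1", 300, 325, "+", 3),  # dropped: MAPQ below cutoff
    ]
    tss = count_tss(reads)
    print(tss)
    print(to_tpm(tss))
```

Two plus-strand reads of different lengths collapse onto the same TSS because only their 5′ ends matter, which is why CAGE counts need no transcript-length correction. Real pipelines typically also cluster neighboring TSSs into promoter-level tag clusters, which this sketch omits.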


Abbreviations

CAGE:: Cap analysis gene expression
CPCC:: Cophenetic correlation coefficient
DEGs:: Differentially expressed genes
EST:: Expressed sequence tag
FPKM:: Fragments per kilobase of transcript per million mapped reads
GO:: Gene Ontology
MAPQ:: Mapping quality
RNA-Seq:: RNA sequencing
RPKM:: Reads per kilobase of transcript per million mapped reads
TMM:: Trimmed mean of M-values
TPM:: Tags per million
TSSs:: Transcription start sites
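The normalized units listed above differ in whether they correct for feature length. RPKM (and FPKM, which counts fragments instead of reads in paired-end data) divides by both feature length and library size, whereas CAGE tags per million (TPM) divides only by library size, because each CAGE tag reports a single 5′ end rather than coverage along the transcript. The following is a minimal sketch with hypothetical function names, not the chapter's code. As an assumption, the library size N is taken as the sum of the supplied counts; many tools use all mapped reads instead. TMM is not shown; it is a between-sample scaling factor usually computed with edgeR (Robinson et al. 2010).

```python
def rpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of feature per million reads.

    RPKM_g = count_g * 1e9 / (length_g * N), with N = sum of all counts here.
    Counting fragments instead of reads gives FPKM with the same formula.
    """
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def tags_per_million(counts: dict[str, int]) -> dict[str, float]:
    """CAGE-style TPM: tag counts scaled to one million tags, with no length term."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: c * 1e6 / total for k, c in counts.items()}


if __name__ == "__main__":
    gene_counts = {"geneA": 500, "geneB": 1500}
    gene_lengths = {"geneA": 1000, "geneB": 3000}
    # geneB has three times the reads but is three times longer, so RPKM is equal.
    print(rpkm(gene_counts, gene_lengths))  # {'geneA': 250000.0, 'geneB': 250000.0}
    print(tags_per_million({"p1": 500, "p2": 1500}))  # {'p1': 250000.0, 'p2': 750000.0}
```

The example shows why length normalization matters for shotgun RNA-Seq (a longer transcript yields more reads at the same abundance) but not for 5′ end counting as in CAGE.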

References

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11(10):R106. doi:10.1186/gb-2010-11-10-r106
Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W, Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 8(9):1765–1786. doi:10.1038/nprot.2013.099
Anders S, Pyl PT, Huber W (2015) HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31(2):166–169. doi:10.1093/bioinformatics/btu638
Auer PL, Doerge RW (2010) Statistical design and analysis of RNA sequencing data. Genetics 185(2):405–416. doi:10.1534/genetics.110.114983
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A (2013) NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 41(Database issue):D991–D995. doi:10.1093/nar/gks1193
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120. doi:10.1093/bioinformatics/btu170
ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414):57–74. doi:10.1038/nature11247
FANTOM Consortium and the RIKEN PMI and CLST (DGT), Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, Haberle V, Lassmann T, Kulakovskiy IV, Lizio M, Itoh M, Andersson R, Mungall CJ, Meehan TF, Schmeier S, Bertin N, Jorgensen M, Dimont E, Arner E, Schmidl C, Schaefer U, Medvedeva YA, Plessy C, Vitezic M, Severin J, Semple C, Ishizu Y, Young RS, Francescatto M, Alam I, Albanese D, Altschuler GM, Arakawa T, Archer JA, Arner P, Babina M, Rennie S, Balwierz PJ, Beckhouse AG, Pradhan-Bhatt S, Blake JA, Blumenthal A, Bodega B, Bonetti A, Briggs J, Brombacher F, Burroughs AM, Califano A, Cannistraci CV, Carbajo D, Chen Y, Chierici M, Ciani Y, Clevers HC, Dalla E, Davis CA, Detmar M, Diehl AD, Dohi T, Drablos F, Edge AS, Edinger M, Ekwall K, Endoh M, Enomoto H, Fagiolini M, Fairbairn L, Fang H, Farach-Carson MC, Faulkner GJ, Favorov AV, Fisher ME, Frith MC, Fujita R, Fukuda S, Furlanello C, Furino M, Furusawa J, Geijtenbeek TB, Gibson AP, Gingeras T, Goldowitz D, Gough J, Guhl S, Guler R, Gustincich S, Ha TJ, Hamaguchi M, Hara M, Harbers M, Harshbarger J, Hasegawa A, Hasegawa Y, Hashimoto T, Herlyn M, Hitchens KJ, Ho Sui SJ, Hofmann OM, Hoof I, Hori F, Huminiecki L, Iida K, Ikawa T, Jankovic BR, Jia H, Joshi A, Jurman G, Kaczkowski B, Kai C, Kaida K, Kaiho A, Kajiyama K, Kanamori-Katayama M, Kasianov AS, Kasukawa T, Katayama S, Kato S, Kawaguchi S, Kawamoto H, Kawamura YI, Kawashima T, Kempfle JS, Kenna TJ, Kere J, Khachigian LM, Kitamura T, Klinken SP, Knox AJ, Kojima M, Kojima S, Kondo N, Koseki H, Koyasu S, Krampitz S, Kubosaki A, Kwon AT, Laros JF, Lee W, Lennartsson A, Li K, Lilje B, Lipovich L, Mackay-Sim A, Manabe R, Mar JC, Marchand B, Mathelier A, Mejhert N, Meynert A, Mizuno Y, de Lima Morais DA, Morikawa H, Morimoto M, Moro K, Motakis E, Motohashi H, Mummery CL, Murata M, Nagao-Sato S, Nakachi Y, Nakahara F, Nakamura T, Nakamura Y, Nakazato K, van Nimwegen E, Ninomiya N, Nishiyori H, Noma S, Noma S, Noazaki T, Ogishima S, Ohkura N, Ohimiya H, Ohno H, Ohshima M, Okada-Hatakeyama M, Okazaki Y, Orlando V, Ovchinnikov DA, Pain A, Passier R, Patrikakis M, Persson H, Piazza S, Prendergast JG, Rackham OJ, Ramilowski JA, Rashid M, Ravasi T, Rizzu P, Roncador M, Roy S, Rye MB, Saijyo E, Sajantila A, Saka A, Sakaguchi S, Sakai M, Sato H, Savvi S, Saxena A, Schneider C, Schultes EA, Schulze-Tanzil GG, Schwegmann A, Sengstag T, Sheng G, Shimoji H, Shimoni Y, Shin JW, Simon C, Sugiyama D, Sugiyama T, Suzuki M, Suzuki N, Swoboda RK, ‘t Hoen PA, Tagami M, Takahashi N, Takai J, Tanaka H, Tatsukawa H, Tatum Z, Thompson M, Toyodo H, Toyoda T, Valen E, van de Wetering M, van den Berg LM, Verado R, Vijayan D, Vorontsov IE, Wasserman WW, Watanabe S, Wells CA, Winteringham LN, Wolvetang E, Wood EJ, Yamaguchi Y, Yamamoto M, Yoneda M, Yonekura Y, Yoshida S, Zabierowski SE, Zhang PG, Zhao X, Zucchelli S, Summers KM, Suzuki H, Daub CO, Kawai J, Heutink P, Hide W, Freeman TC, Lenhard B, Bajic VB, Taylor MS, Makeev VJ, Sandelin A, Hume DA, Carninci P, Hayashizaki Y (2014) A promoter-level mammalian expression atlas. Nature 507(7493):462–470. doi:10.1038/nature13182
da Huang W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44–57. doi:10.1038/nprot.2008.211
de Klerk E, ‘t Hoen PA (2015) Alternative mRNA transcription, processing, and translation: insights from RNA sequencing. Trends Genet 31(3):128–139. doi:10.1016/j.tig.2015.01.001
de Klerk E, den Dunnen JT, ‘t Hoen PA (2014) RNA sequencing: from tag-based profiling to resolving complete transcript structure. Cell Mol Life Sci 71(18):3537–3551. doi:10.1007/s00018-014-1637-9
Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloe D, Le Gall C, Schaeffer B, Le Crom S, Guedj M, Jaffrezic F, French StatOmique Consortium (2013) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14(6):671–683. doi:10.1093/bib/bbs046
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. doi:10.1093/bioinformatics/bts635
Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10:48. doi:10.1186/1471-2105-10-48
Elsik CG, Worley KC, Bennett AK, Beye M, Camara F, Childers CP, de Graaf DC, Debyser G, Deng J, Devreese B, Elhaik E, Evans JD, Foster LJ, Graur D, Guigo R, HGSC production teams, Hoff KJ, Holder ME, Hudson ME, Hunt GJ, Jiang H, Joshi V, Khetani RS, Kosarev P, Kovar CL, Ma J, Maleszka R, Moritz RF, Munoz-Torres MC, Murphy TD, Muzny DM, Newsham IF, Reese JT, Robertson HM, Robinson GE, Rueppell O, Solovyev V, Stanke M, Stolle E, Tsuruda JM, Vaerenbergh MV, Waterhouse RM, Weaver DB, Whitfield CW, Wu Y, Zdobnov EM, Zhang L, Zhu D, Gibbs RA, Honey Bee Genome Sequencing Consortium (2014) Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15:86. doi:10.1186/1471-2164-15-86
Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin A (2008) A code for transcription initiation in mammalian genomes. Genome Res 18(1):1–12. doi:10.1101/gr.6831208
Harbers M (2008) The current status of cDNA cloning. Genomics 91(3):232–242. doi:10.1016/j.ygeno.2007.11.004
Huang S, Zhang J, Li R, Zhang W, He Z, Lam TW, Peng Z, Yiu SM (2011) SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Front Genet 2:46. doi:10.3389/fgene.2011.00046
Kawaji H, Lizio M, Itoh M, Kanamori-Katayama M, Kaiho A, Nishiyori-Sueki H, Shin JW, Kojima-Ishiyama M, Kawano M, Murata M, Ninomiya-Fukuda N, Ishikawa-Kato S, Nagao-Sato S, Noma S, Hayashizaki Y, Forrest AR, Carninci P, FANTOM Consortium (2014) Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing. Genome Res 24(4):708–717. doi:10.1101/gr.156232.113
Khamis AM, Hamilton AR, Medvedeva YA, Alam T, Alam I, Essack M, Umylny B, Jankovic BR, Naeger NL, Suzuki M, Harbers M, Robinson GE, Bajic VB (2015) Insights into the transcriptional architecture of behavioral plasticity in the honey bee Apis mellifera. Sci Rep 5:11136
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4):R36. doi:10.1186/gb-2013-14-4-r36
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359. doi:10.1038/nmeth.1923
Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. doi:10.1186/gb-2009-10-3-r25
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. doi:10.1093/bioinformatics/btp324
Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7):923–930. doi:10.1093/bioinformatics/btt656
Liu Y, Zhou J, White KP (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301–304. doi:10.1093/bioinformatics/btt688
Lizio M, Harshbarger J, Shimoji H, Severin J, Kasukawa T, Sahin S, Abugessaisa I, Fukuda S, Hori F, Ishikawa-Kato S, Mungall CJ, Arner E, Baillie JK, Bertin N, Bono H, de Hoon M, Diehl AD, Dimont E, Freeman TC, Fujieda K, Hide W, Kaliyaperumal R, Katayama T, Lassmann T, Meehan TF, Nishikata K, Ono H, Rehli M, Sandelin A, Schultes EA, ‘t Hoen PA, Tatum Z, Thompson M, Toyoda T, Wright DW, Daub CO, Itoh M, Carninci P, Hayashizaki Y, Forrest AR, Kawaji H, FANTOM Consortium (2015) Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol 16:22. doi:10.1186/s13059-014-0560-6
Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15(12):550. doi:10.1186/s13059-014-0550-8
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17(1):10–12. doi:10.14806/ej.17.1.200
Murata M, Nishiyori-Sueki H, Kojima-Ishiyama M, Carninci P, Hayashizaki Y, Itoh M (2014) Detecting expressed genes using CAGE. Methods Mol Biol 1164:67–85. doi:10.1007/978-1-4939-0805-9_7
Plessy C, Bertin N, Takahashi H, Simone R, Salimullah M, Lassmann T, Vitezic M, Severin J, Olivarius S, Lazarevic D, Hornig N, Orlando V, Bell I, Gao H, Dumais J, Kapranov P, Wang H, Davis CA, Gingeras TR, Kawai J, Daub CO, Hayashizaki Y, Gustincich S, Carninci P (2010) Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat Methods 7(7):528–534. doi:10.1038/nmeth.1470
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140. doi:10.1093/bioinformatics/btp616
Salimullah M, Sakai M, Plessy C, Carninci P (2011) NanoCAGE: a high-resolution technique to discover and interrogate cell transcriptomes. Cold Spring Harb Protoc 2011(1):pdb.prot5559. doi:10.1101/pdb.prot5559
Sims D, Sudbery I, Ilott NE, Heger A, Ponting CP (2014) Sequencing depth and coverage: key considerations in genomic analyses. Nat Rev Genet 15(2):121–132. doi:10.1038/nrg3642
Supek F, Bosnjak M, Skunca N, Smuc T (2011) REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6(7):e21800. doi:10.1371/journal.pone.0021800
Takahashi H, Kato S, Murata M, Carninci P (2012a) CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol Biol 786:181–200. doi:10.1007/978-1-61779-292-2_11
Takahashi H, Lassmann T, Murata M, Carninci P (2012b) 5′ end-centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat Protoc 7(3):542–561. doi:10.1038/nprot.2012.005
Thorvaldsdottir H, Robinson JT, Mesirov JP (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14(2):178–192. doi:10.1093/bib/bbs017
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578. doi:10.1038/nprot.2012.016
Wang L, Wang S, Li W (2012) RSeQC: quality control of RNA-seq experiments. Bioinformatics 28(16):2184–2185. doi:10.1093/bioinformatics/bts356
Yendrek CR, Ainsworth EA, Thimmapuram J (2012) The bench scientist’s guide to statistical analysis of RNA-Seq data. BMC Res Notes 5:506. doi:10.1186/1756-0500-5-506


Acknowledgment

We are deeply grateful to Adam R. Hamilton, Yulia A. Medvedeva, Tanvir Alam, Intikhab Alam, Magbubah Essack, Boris Umylny, Boris R. Jankovic, Nicholas L. Naeger, Makoto Suzuki, and Gene E. Robinson for their support of our honey bee project, which would not have been possible without working together with them. We also thank Charles Plessy and Piero Carninci for their support and encouragement in using CAGE.

Author information

Authors and Affiliations

Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
Abdullah M. Khamis Ph.D. & Vladimir B. Bajic Ph.D.
Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Matthias Harbers Ph.D.


Corresponding author

Correspondence to Matthias Harbers Ph.D.

Editor information

Editors and Affiliations

Genome Analysis Platform, CIC bioGUNE, Derio, Spain
Ana M. Aransay & José Luis Lavín Trueba

Annex: Quick Reference Guide

Table QG4.1 Experimental design considerations


Table QG4.2 Available software recommendations



About this chapter

Cite this chapter

Khamis, A.M., Bajic, V.B., Harbers, M. (2016). Transcriptome Profiling Strategies. In: Aransay, A., Lavín Trueba, J. (eds) Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Springer, Cham. https://doi.org/10.1007/978-3-319-31350-4_4


DOI: https://doi.org/10.1007/978-3-319-31350-4_4
Published: 03 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31348-1
Online ISBN: 978-3-319-31350-4
eBook Packages: Biomedical and Life Sciences (R0)
