Bioinformatics pp 191–207

Sequence Based Gene Expression Analysis



Life sciences in the twentieth century made major strides in unraveling basic biological phenomena common to all living systems, such as deciphering the genetic code and defining the central dogma (replication, transcription and translation), through observation and simple experimentation. Biological research in the twenty-first century, by contrast, is driven primarily by high-precision instrumentation that explores the complexity of biological systems in greater detail. These instruments generate very large datasets that require efficient computational tools for data mining and analysis. The term “high-throughput” has had to be redefined at regular intervals because the volume of data generated has grown exponentially with each technological advance. To address the need for modeling, simulation and visualization of large and diverse sequence, gene expression and proteomics datasets, “systems biology” (Hood 2003) approaches are being developed for constructing gene regulatory networks (Dojer et al. 2006; Imoto et al. 2002; Xiong 2006; Xiong et al. 2004) and for identifying key control nodes.


Digital Gene Expression; Massively Parallel Signature Sequencing; Alternative Splice Site; Digital Gene Expression Data; LongSAGE Library
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H et al (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252:1651–1656
  2. Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH et al (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377:3–174
  3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
  4. Ambros V (2001) microRNAs: tiny regulators with great potential. Cell 107:823–826
  5. Bachellerie JP, Cavaille J, Huttenhofer A (2002) The expanding snoRNA world. Biochimie 84:775–790
  6. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C et al (2007) NCBI GEO: mining tens of millions of expression profiles – database and tools update. Nucleic Acids Res 35:D760–D765
  7. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH et al (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799–816
  8. Blackshaw S, Harpavat S, Trimarchi J, Cai L, Huang H, Kuo WP et al (2004) Genomic analysis of mouse retinal development. PLoS Biol 2:E247
  9. Boguski MS, Schuler GD (1995) Establishing a human transcript map. Nat Genet 10:369–371
  10. Boguski MS, Lowe TM, Tolstoshev CM (1993) dbEST – database for “expressed sequence tags”. Nat Genet 4:332–333
  11. Calarco JA, Saltzman AL, Ip JY, Blencowe BJ (2007) Technologies for the global discovery and analysis of alternative splicing. Adv Exp Med Biol 623:64–84
  12. Camargo AA, Samaia HP, Dias-Neto E, Simao DF, Migotto IA, Briones MR et al (2001) The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome. Proc Natl Acad Sci USA 98:12103–12108
  13. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
  14. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK et al (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5:613–619
  15. Damerval C, Maurice A, Josse JM, de Vienne D (1994) Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics 137:289–301
  16. De Bona F, Ossowski S, Schneeberger K, Ratsch G (2008) Optimal spliced alignments of short sequence reads. Bioinformatics 24:i174–i180
  17. de Hoon M, Hayashizaki Y (2008) Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network inference. Biotechniques 44:627–628, 630, 632
  18. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL (1999) Nucleic Acids Res 27(11):2369–2376
  19. Delcher AL, Salzberg SL, Phillippy AM (2003) Using MUMmer to identify similar regions in large sequence sets. Current Protocols in Bioinformatics, Chapter 10:3
  20. Dinel S, Bolduc C, Belleau P, Boivin A, Yoshioka M, Calvo E et al (2005) Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome. Nucleic Acids Res 33:e26
  21. Dojer N, Gambin A, Mizera A, Wilczynski B, Tiuryn J (2006) Applying dynamic Bayesian networks to perturbed gene expression data. BMC Bioinform 7:249
  22. Gorodkin J, Cirera S, Hedegaard J, Gilchrist MJ, Panitz F, Jorgensen C et al (2007) Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags. Genome Biol 8:R45
  23. Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
  24. Hene L, Sreenu VB, Vuong MT, Abidi SH, Sutton JK, Rowland-Jones SL et al (2007) Deep analysis of cellular transcriptomes – LongSAGE versus classic MPSS. BMC Genomics 8:333
  25. Hood L (2003) Systems biology: integrating technology, biology, and computation. Mech Ageing Dev 124:9–16
  26. Hou J, Charters AM, Lee SC, Zhao Y, Wu MK, Jones SJ et al (2007) A systematic screen for genes expressed in definitive endoderm by serial analysis of gene expression (SAGE). BMC Dev Biol 7:92
  27. Iandolino A, Nobuta K, da Silva FG, Cook DR, Meyers BC (2008) Comparative expression profiling in grape (Vitis vinifera) berries derived from frequency analysis of ESTs and MPSS signatures. BMC Plant Biol 8:53
  28. Imoto S, Goto T, Miyano S (2002) Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. Pac Symp Biocomput 175–186
  29. Jiang H, Wong WH (2008) SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics 24:2395–2396
  30. Jongeneel CV, Delorenzi M, Iseli C, Zhou D, Haudenschild CD, Khrebtukova I et al (2005) An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res 15:1007–1014
  31. Kawaji H, Kasukawa T, Fukuda S, Katayama S, Kai C, Kawai J et al (2006) CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis. Nucleic Acids Res 34:D632–D636
  32. Kent WJ (2002) BLAT – the BLAST-like alignment tool. Genome Res 12:656–664
  33. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM et al (2002) The human genome browser at UCSC. Genome Res 12:996–1006
  34. Kim JB, Porreca GJ, Song L, Greenway SC, Gorham JM, Church GM et al (2007) Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic cardiomyopathy. Science 316:1481–1484
  35. Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP et al (2006) Characterization of the piRNA complex from rat testes. Science 313:363–367
  36. Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18:1851–1858
  37. Liu ET, Karuturi KR (2004) Microarrays and clinical investigations. N Engl J Med 350:1595–1597
  38. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18:1509–1517
  39. Meyers BC, Lee DK, Vu TH, Tej SS, Edberg SB, Matvienko M et al (2004a) Arabidopsis MPSS. An online resource for quantitative expression analysis. Plant Physiol 135:801–813
  40. Meyers BC, Tej SS, Vu TH, Haudenschild CD, Agrawal V, Edberg SB et al (2004b) The use of MPSS for whole-genome transcriptional analysis in Arabidopsis. Genome Res 14:1641–1653
  41. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T (2008) Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 45:81–94
  42. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628
  43. Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC (2006) Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34:D731–D735
  44. Pan Q, Shai O, Misquitta C, Zhang W, Saltzman AL, Mohammad N et al (2004) Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Mol Cell 16:929–941
  45. Peiffer JA, Kaushik S, Sakai H, Arteaga-Vazquez M, Sanchez-Leon N, Ghazal H et al (2008) A spatial dissection of the Arabidopsis floral transcriptome by MPSS. BMC Plant Biol 8:43
  46. Reimers M, Carey VJ (2006) Bioconductor: an open source framework for bioinformatics and computational biology. Methods Enzymol 411:119–134
  47. Reinartz J, Bruyns E, Lin JZ, Burcham T, Brenner S, Bowen B et al (2002) Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative gene expression profiling in all organisms. Brief Funct Genomic Proteomic 1:95–104
  48. Robinson MD, Smyth GK (2007) Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23:2881–2887
  49. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B et al (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20:508–512
  50. Schug J, Schuller WP, Kappen C, Salbaum JM, Bucan M, Stoeckert CJ Jr (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol 6:R33
  51. Shannon C (1949) The mathematical theory of communication
  52. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H et al (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci USA 100:15776–15781
  53. Siddiqui AS, Delaney AD, Schnerch A, Griffith OL, Jones SJM, Marra MA (2006) Sequence biases in large scale gene expression profiling data. Nucleic Acids Res 34:e83
  54. Silva AP, De Souza JE, Galante PA, Riggins GJ, de Souza SJ, Camargo AA (2004) The impact of SNPs on the interpretation of SAGE and MPSS experimental data. Nucleic Acids Res 32:6104–6110
  55. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A et al (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12:1599–1610
  56. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N et al (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315:848–853
  57. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D et al (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101:6062–6067
  58. Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M et al (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–960
  59. Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM (2004) SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 5:1
  60. Torres TT, Metta M, Ottenwalder B, Schlotterer C (2008) Gene expression profiling by massively parallel sequencing. Genome Res 18:172–177
  61. Velculescu VE, Kinzler KW (2007) Gene expression analysis goes digital. Nat Biotechnol 25:878–880
  62. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270:484–487
  63. Vencio RZ, Varuzza L, de BPC, Brentani H, Shmulevich I (2007) Simcluster: clustering enumeration gene expression data on the simplex space. BMC Bioinform 8:246
  64. Wang M, Master SR, Chodosh LA (2006) Computational expression deconvolution in a complex mammalian organ. BMC Bioinform 7:328
  65. Weber AP, Weber KL, Carr K, Wilkerson C, Ohlrogge JB (2007) Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiol 144:32–42
  66. Wu C, Delano DL, Mitro N, Su SV, Janes J, McClurg P et al (2008a) Gene set enrichment in eQTL data identifies novel annotations and pathway regulators. PLoS Genet 4:e1000070
  67. Wu JQ, Du J, Rozowsky J, Zhang Z, Urban AE, Euskirchen G et al (2008b) Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome. Genome Biol 9:R3
  68. Xiong H (2006) Non-linear tests for identifying differentially expressed genes or genetic networks. Bioinformatics 22:919–923
  69. Xiong M, Li J, Fang X (2004) Identification of genetic networks. Genetics 166:1037–1052

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Bioinformatics and Computational Biology, George Mason University, Manassas, USA
