Build a Dictionary, Learn a Grammar, Decipher Stegoscripts, and Discover Genomic Regulatory Elements

  • Conference paper
Systems Biology and Regulatory Genomics (RSB 2005, RRG 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4023)

Abstract

It has been a challenge to discover transcription factor (TF) binding motifs (TFBMs), which are short cis-regulatory DNA sequences playing essential roles in transcriptional regulation. We approach the problem of discovering TFBMs from a steganographic perspective. We view the regulatory regions of a genome as if they constituted a stegoscript with conserved words (i.e., TFBMs) being embedded in a covertext, and model the stegoscript with a statistical model consisting of a dictionary and a grammar. We develop an efficient algorithm, WordSpy, to learn such a model from a stegoscript and to recover conserved motifs. Subsequently, we select biologically meaningful motifs based on a motif’s specificity to the set of genes of interest and/or the expression coherence of the genes whose promoters contain the motif. From the promoters of 645 distinct cell-cycle related genes of S. cerevisiae, our method is able to identify all known cell-cycle related TFBMs among its top ranking motifs. Our method can also be directly applied to discriminative motif finding. By utilizing the ChIP-chip data of Lee et al., we predicted potential binding motifs of 113 known transcription factors of budding yeast.
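
The abstract does not say which statistic WordSpy uses to score a motif's specificity to a gene set. As a minimal sketch only, assuming a hypergeometric enrichment test (a common choice, not confirmed by the source), the selection step could look like the following. The function names and the toy promoter sequences are hypothetical.

```python
from math import comb

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def promoters_with_motif(promoters, motif):
    """Return the genes whose promoter contains the motif on either strand."""
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    return {
        gene
        for gene, seq in promoters.items()
        if motif in seq or reverse_complement in seq
    }


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n items from N, of which K are successes."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def motif_specificity(promoters, target_genes, motif):
    """Enrichment p-value of a motif in target_genes versus all promoters.

    Assumption: specificity is scored with a hypergeometric tail test;
    the paper may use a different statistic.
    """
    hits = promoters_with_motif(promoters, motif)
    targets = target_genes & promoters.keys()
    return hypergeom_tail(
        N=len(promoters), K=len(hits), n=len(targets), k=len(hits & targets)
    )


# Toy data (hypothetical): three target promoters carry ACGCGT, three do not.
toy_promoters = {
    "g1": "TTACGCGTAA",
    "g2": "CCACGCGTGG",
    "g3": "ACGCGTTTTT",
    "g4": "TTTTTTTTTT",
    "g5": "GGGGCCCCAA",
    "g6": "ATATATATAT",
}
p_value = motif_specificity(toy_promoters, {"g1", "g2", "g3"}, "ACGCGT")
```

A smaller p-value means the motif is concentrated in the genes of interest. Ranking candidate motifs by this value is one way to realize the selection step the abstract describes.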

References

  1. Lemon, B., Tjian, R.: Orchestrated response: A symphony of transcription factors for gene control. Genes Dev. 14(20), 2551–2569 (2000)

  2. Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C.: Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)

  3. Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21(1-2), 51–80 (1995)

  4. Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7-8), 563–577 (1999)

  5. Hughes, J.D., Estep, P.W., Tavazoie, S., Church, G.M.: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Molecular Biology 296(5), 1205–1214 (2000)

  6. van Helden, J., Andre, B., Collado-Vides, J.: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Molecular Biology 281(5), 827–842 (1998)

  7. Sinha, S., Tompa, M.: A statistical method for finding transcription factor binding sites. In: 8th Intern. Conf. on Intelligent Systems for Molecular Biology (2000)

  8. Zhang, M.Q.: Large scale gene expression data analysis: A new challenge to computational biologists. Genome Research 9(8), 681–688 (1999)

  9. Segal, E., Yelensky, R., Koller, D.: Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics 19, 273–282 (2003)

  10. Tamada, Y., et al.: Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection. Bioinformatics 19, 227–236 (2003)

  11. Wayner, P.: Disappearing Cryptography, 2nd edn. Morgan Kaufmann, San Francisco (2002)

  12. Bussemaker, H.J., Li, H., Siggia, E.D.: Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis. Proc. Natl. Acad. Sci. USA 97(18), 10096–10100 (2000)

  13. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (2001)

  14. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

  15. Regnier, M.: A unified approach to word statistics. In: RECOMB, pp. 207–213 (1998)

  16. Reinert, G., Schbath, S., Waterman, M.S.: Probabilistic and statistical properties of words: An overview. J. Computational Biology 7(1-2), 1–46 (2000)

  17. Pilpel, Y., Sudarsanam, P., Church, G.M.: Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics 29(2), 153–159 (2001)

  18. Spellman, P.T., et al.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9, 3273–3297 (1998)

  19. Dohrmann, P., Voth, W., Stillman, D.: Role of negative regulation in promoter specificity of the homologous transcriptional activators Ace2p and Swi5p. Mol. Cell Biol. 16(4), 1746–1758 (1996)

  20. Zhu, J., Zhang, M.Q.: SCPD: A Promoter Database of Yeast Saccharomyces cerevisiae. Bioinformatics 15, 607–611 (1999)

  21. Kato, M., Hata, N., Banerjee, N., Futcher, B., Zhang, M.Q.: Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biology 5, R56 (2004)

  22. Dolan, J.W., Kirkman, C., Fields, S.: The yeast STE12 protein binds to the DNA sequence mediating pheromone induction. Proc. Natl. Acad. Sci. USA 86(15), 5703–5707 (1989)

  23. Blaiseau, P.L., Thomas, D.: Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. EMBO J. 17, 6327–6336 (1998)

  24. van Helden, J., Andre, B., Collado-Vides, J.: A web site for the computational analysis of yeast regulatory sequences. Yeast 16(2), 177–187 (2000)

  25. Stuart, J.M., Segal, E., Koller, D., Kim, S.K.: A gene coexpression network for global discovery of conserved genetic modules. Science 302(5643), 249–255 (2003)

  26. Koch, C., Moll, T., Neuberg, M., Ahorn, H., Nasmyth, K.: A role for the transcription factors Mbp1 and Swi4 in progression from G1 to S phase. Science 261, 1551–1557 (1993)

  27. Hollenhorst, P.C., Bose, M.E., Mielke, M.R., Müller, U., Fox, C.A.: Forkhead genes in transcriptional silencing, cell morphology and the cell cycle: Overlapping and distinct functions for FKH1 and FKH2 in Saccharomyces cerevisiae. Genetics 154, 1533–1548 (2000)

  28. Lee, T.I., et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)

  29. Gupta, M., Liu, J.: Discovery of conserved sequence patterns using a stochastic dictionary model. J. Amer. Statist. Assoc. 98, 55–66 (2003)

  30. Sinha, S., van Nimwegen, E., Siggia, E.D.: A probabilistic method to detect regulatory modules. Bioinformatics 19, 292–301 (2003)

  31. Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.S.: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937), 241–254 (2003)

  32. Wasserman, W.W., Palumbo, M., Thompson, W., Fickett, J.W., Lawrence, C.E.: Human-mouse genome comparisons to locate regulatory sites. Nature Genetics 26(2), 225–228 (2000)

  33. Siggia, E.D.: Computational methods for transcriptional regulation. Curr. Opin. Genet. Dev. 15, 214–221 (2005)

Editor information

Eleazar Eskin, Trey Ideker, Ben Raphael, Christopher Workman

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Wang, G., Zhang, W. (2007). Build a Dictionary, Learn a Grammar, Decipher Stegoscripts, and Discover Genomic Regulatory Elements. In: Eskin, E., Ideker, T., Raphael, B., Workman, C. (eds) Systems Biology and Regulatory Genomics. RSB 2005, RRG 2005. Lecture Notes in Computer Science, vol 4023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48540-7_8

  • DOI: https://doi.org/10.1007/978-3-540-48540-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-48293-2

  • Online ISBN: 978-3-540-48540-7

  • eBook Packages: Computer Science, Computer Science (R0)