Parameter Landscape Analysis for Common Motif Discovery Programs

  • Conference paper
Regulatory Genomics (RRG 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3318)

Abstract

The identification of regulatory elements as over-represented motifs in the promoters of potentially co-regulated genes is an important and challenging problem in computational biology. Although many motif detection programs have been developed, they remain immature in practice. In particular, the choice of tunable parameters is often critical to success. Knowledge of which parameter settings are most appropriate for various types of target motifs is therefore invaluable, but has unfortunately been scarce. In this paper, we report our parameter landscape analysis of two widely used programs, the Gibbs Sampler (GS) and MEME. Our results show that GS is relatively sensitive to changes in some parameter values, while MEME is more stable. We present recommended parameter settings for GS optimized for four different motif lengths. Running GS four times, once per setting, should significantly decrease the risk of overlooking subtle motifs.
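As a rough illustration of what a parameter landscape analysis involves, the sketch below scores every combination in a small parameter grid and reports the best one. It is not the authors' procedure: `sweep`, `best_setting`, and `toy_run` are hypothetical names, the two parameters labelled `c` and `t` and their grid values are invented for illustration, and `toy_run` is a made-up scoring function standing in for an actual run of a motif finder on sequences with a known motif.

```python
import itertools
import random


def sweep(run_once, grid, repeats=3, seed=0):
    """Score every parameter combination in `grid`.

    run_once(params, rng) -> float is a hypothetical callback that would run a
    motif finder with `params` and return a quality score (higher is better).
    Returns {((name, value), ...): mean score over `repeats` runs}.
    """
    rng = random.Random(seed)
    names = sorted(grid)
    landscape = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [run_once(params, rng) for _ in range(repeats)]
        landscape[tuple(params.items())] = sum(scores) / repeats
    return landscape


def best_setting(landscape):
    """Return the parameter combination with the highest mean score."""
    return dict(max(landscape, key=landscape.get))


def toy_run(params, rng):
    # Made-up response surface with a little noise; NOT real GS or MEME behaviour.
    return -abs(params["c"] - 20) - abs(params["t"] - 10) + rng.gauss(0, 0.1)


if __name__ == "__main__":
    grid = {"c": [10, 20, 40], "t": [5, 10, 20]}  # illustrative values only
    result = sweep(toy_run, grid)
    print(best_setting(result))
```

Averaging over several repeats per grid point matters for a stochastic program such as a sampler, since a single run may land on an unrepresentative score.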

Notes on tunable parameters

  • n: an initial “guesstimate” of the total number of motifs in all of the sequences.
  • c: the plateau period, i.e. the number of iterations between successive local maxima. For each seed, the program samples until a full plateau period of iterations has been performed without an increase in the MAP value. The MAP (maximum a posteriori) value is a measure of the statistical significance of the motif alignment compared with a “null” alignment.
  • t: the number of times the sampler restarts with different seeds.
  • m: the maximum number of iterations per seed.
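The way the plateau period (c), the number of restarts (t), and the per-seed iteration cap (m) interact can be sketched as a generic restart loop. This is only an illustration built from the parameter descriptions, not the Gibbs Sampler's actual code: `sample_with_restarts` and its callbacks `init`, `step`, and `score` are hypothetical, and `score` merely stands in for the MAP value.

```python
import random


def sample_with_restarts(step, init, score, plateau, restarts, max_iter, seed=0):
    """Generic stochastic search with restarts and a plateau stopping rule.

    init(rng) -> state          : hypothetical random starting point (a "seed")
    step(state, rng) -> state   : hypothetical single sampling iteration
    score(state) -> float       : stand-in for the MAP value (higher is better)
    """
    rng = random.Random(seed)
    best_state, best_score = None, float("-inf")
    for _ in range(restarts):                  # t: restarts with different seeds
        state = init(rng)
        local_best, local_score = state, score(state)
        since_improved = 0
        for _ in range(max_iter):              # m: cap on iterations per seed
            state = step(state, rng)
            current = score(state)
            if current > local_score:
                local_best, local_score = state, current
                since_improved = 0
            else:
                since_improved += 1
                if since_improved >= plateau:  # c: plateau period exhausted
                    break
        if local_score > best_score:
            best_state, best_score = local_best, local_score
    return best_state, best_score


if __name__ == "__main__":
    # Toy problem: a random walk on the integers, scored by closeness to 42.
    best, value = sample_with_restarts(
        step=lambda s, rng: s + rng.choice([-1, 1]),
        init=lambda rng: rng.randint(0, 100),
        score=lambda s: -(s - 42) ** 2,
        plateau=20, restarts=10, max_iter=500,
    )
    print(best, value)
```

In this sketch a short plateau period ends each seed early, while a long one lets the iteration cap decide, which hints at why the outcome can be sensitive to these settings.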

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Polouliakh, N., Konno, M., Horton, P., Nakai, K. (2005). Parameter Landscape Analysis for Common Motif Discovery Programs. In: Eskin, E., Workman, C. (eds) Regulatory Genomics. RRG 2004. Lecture Notes in Computer Science, vol 3318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32280-1_8

  • DOI: https://doi.org/10.1007/978-3-540-32280-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24456-1

  • Online ISBN: 978-3-540-32280-1

  • eBook Packages: Computer Science, Computer Science (R0)
