Parameter Landscape Analysis for Common Motif Discovery Programs

Polouliakh, Natalia; Konno, Michiko; Horton, Paul; Nakai, Kenta

doi:10.1007/978-3-540-32280-1_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3318)

Included in the following conference series:

RECOMB Workshop on Regulatory Genomics

Abstract

The identification of regulatory elements as over-represented motifs in the promoters of potentially co-regulated genes is an important and challenging problem in computational biology. Although many motif detection programs have been developed, they still seem immature in practice. In particular, the choice of tunable parameters is often critical to success. Knowledge of which parameter settings are most appropriate for various types of target motifs is therefore invaluable, but unfortunately has been scarce. In this paper, we report our parameter landscape analysis of two widely used programs, the Gibbs Sampler (GS) and MEME. Our results show that GS is relatively sensitive to changes in some parameter values, while MEME is more stable. We present recommended parameter settings for GS optimized for four different motif lengths. Running GS four times with these settings should therefore significantly decrease the risk of overlooking subtle motifs.

Notes on the tunable parameters:
n - An initial “guesstimate” of the total number of motifs in all of the sequences.
c - The plateau period: the number of iterations between successive local maxima. For each seed, the program samples until the plateau period has passed without an increase in the MAP value. The MAP (maximum a posteriori) value is a measure of the statistical significance of the motif alignment compared to a “null” alignment.
t - The number of times the sampler restarts with different seeds.
m - The maximum number of iterations per seed.
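
The way the parameters c, t and m interact can be illustrated with a short control-flow sketch. This is not the Gibbs Sampler program itself: the function name `run_with_restarts`, the argument names, and the `propose_score` callback (a stand-in for one sampling iteration that returns a MAP-like score) are all assumptions made for illustration. The parameter n is not modelled here.

```python
import random
from typing import Callable, Tuple


def run_with_restarts(
    propose_score: Callable[[random.Random], float],
    seeds: int,      # t: number of restarts with different seeds
    max_iter: int,   # m: maximum number of iterations per seed
    plateau: int,    # c: iterations allowed without an increase in the score
    base_seed: int = 0,
) -> Tuple[float, int]:
    """Return (best score over all seeds, total iterations performed).

    Hypothetical sketch: `propose_score` stands in for one sampling
    iteration of a real motif sampler and returns a MAP-like score.
    """
    best_overall = float("-inf")
    total_iters = 0
    for s in range(seeds):
        rng = random.Random(base_seed + s)  # a different seed per restart
        best_this_seed = float("-inf")
        since_improvement = 0
        for _ in range(max_iter):
            total_iters += 1
            score = propose_score(rng)
            if score > best_this_seed:
                # The score increased, so the plateau counter starts over.
                best_this_seed = score
                since_improvement = 0
            else:
                since_improvement += 1
            # Stop this seed once `plateau` iterations pass with no increase.
            if since_improvement >= plateau:
                break
        best_overall = max(best_overall, best_this_seed)
    return best_overall, total_iters
```

Under these assumptions, a small plateau period ends each seed early, while m caps the work per seed regardless of the plateau setting; for example, `run_with_restarts(lambda rng: rng.random(), seeds=3, max_iter=50, plateau=5)` performs at most 150 iterations in total.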

Author information

Authors and Affiliations

Human Genome Center, University of Tokyo, 4-6-1 Shirokanedai, Tokyo, Japan
Natalia Polouliakh & Kenta Nakai
Graduate School of Humanity and Science, Ochanomizu University, 2-1-1 Otsuka, Tokyo, Japan
Natalia Polouliakh & Michiko Konno
National Institute of Advanced Industrial Science and Technology, 2-43 Aomi, Tokyo, Japan
Paul Horton

Editor information

Editors and Affiliations

Department of Human Genetics, University of California Los Angeles, 90095, Los Angeles, CA
Eleazar Eskin
Department of Bioengineering, University of California, San Diego, CA, USA
Christopher Workman

About this paper

Cite this paper

Polouliakh, N., Konno, M., Horton, P., Nakai, K. (2005). Parameter Landscape Analysis for Common Motif Discovery Programs. In: Eskin, E., Workman, C. (eds) Regulatory Genomics. RRG 2004. Lecture Notes in Computer Science, vol 3318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32280-1_8

DOI: https://doi.org/10.1007/978-3-540-32280-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24456-1
Online ISBN: 978-3-540-32280-1
eBook Packages: Computer Science, Computer Science (R0)