Abstract
The identification of regulatory elements as over-represented motifs in the promoters of potentially co-regulated genes is an important and challenging problem in computational biology. Although many motif detection programs have been developed, they remain immature in practical use. In particular, the choice of tunable parameters is often critical to success. Knowledge of which parameter settings are most appropriate for various types of target motifs is therefore invaluable, but unfortunately scarce. In this paper, we report a parameter landscape analysis of two widely used programs, the Gibbs Sampler (GS) and MEME. Our results show that GS is relatively sensitive to changes in some parameter values, while MEME is more stable. We present recommended parameter settings for GS, optimized for four different motif lengths. Running GS four times with these settings should therefore significantly decrease the risk of overlooking subtle motifs.
n - An initial “guesstimate” of the total number of motifs in all of the sequences
c - The plateau period: the number of iterations between successive local maxima. For each seed, the program samples until this many iterations have been performed without an increase in the MAP value. The MAP (maximum a posteriori) value is a measure of the statistical significance of the motif alignment compared to a “null” alignment.
t - The number of times the sampler restarts with different seeds
m - Maximum number of iterations per seed
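The way c, t, and m interact can be sketched as a generic stochastic-search control loop. This is a minimal illustration under stated assumptions, not the Gibbs Sampler's actual implementation: the function name `run_sampler`, its argument names, and the `score_step` callback (a stand-in for one sampling iteration that returns a MAP-like score) are all hypothetical. The parameter n is not modelled here.

```python
import random


def run_sampler(score_step, num_seeds, plateau_period, max_iterations, rng_seed=0):
    """Toy control loop (hypothetical, for illustration only).

    num_seeds      ~ t: number of restarts with different seeds
    plateau_period ~ c: iterations allowed without a score increase
    max_iterations ~ m: hard cap on iterations per seed
    score_step(rng) stands in for one sampling iteration and returns a score.
    """
    rng = random.Random(rng_seed)
    best_overall = float("-inf")
    iterations_used = []

    for _ in range(num_seeds):
        best_for_seed = float("-inf")
        since_improvement = 0
        iterations = 0

        # Stop this seed when the plateau period elapses with no improvement,
        # or when the per-seed iteration cap is reached, whichever comes first.
        while iterations < max_iterations and since_improvement < plateau_period:
            iterations += 1
            score = score_step(rng)
            if score > best_for_seed:
                best_for_seed = score
                since_improvement = 0
            else:
                since_improvement += 1

        iterations_used.append(iterations)
        best_overall = max(best_overall, best_for_seed)

    return best_overall, iterations_used


if __name__ == "__main__":
    # Placeholder scoring: a uniform random number instead of a real MAP value.
    best, used = run_sampler(lambda rng: rng.random(),
                             num_seeds=5, plateau_period=20, max_iterations=500)
    print(best, used)
```

In this sketch a larger plateau period or more seeds increases the amount of search at the cost of run time, while the iteration cap bounds the work per seed regardless of the other two settings.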
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Polouliakh, N., Konno, M., Horton, P., Nakai, K. (2005). Parameter Landscape Analysis for Common Motif Discovery Programs. In: Eskin, E., Workman, C. (eds) Regulatory Genomics. RRG 2004. Lecture Notes in Computer Science(), vol 3318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32280-1_8
DOI: https://doi.org/10.1007/978-3-540-32280-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24456-1
Online ISBN: 978-3-540-32280-1
eBook Packages: Computer Science (R0)