Abstract
Accurate recognition of motifs in biological sequences has become a central problem in computational biology. Although previous approaches have shown reasonable performance in detecting motifs with a clear consensus, they are inapplicable to the recognition of weak motifs in noisy datasets, where only a fraction of the sequences may contain motif instances. This paper presents a graphical approach that handles real biological sequences, which are inherently noisy, and finds potential weak motifs in higher eukaryotic datasets. We evaluate the approach on synthetic datasets embedded with degenerate motifs and show that it outperforms earlier techniques. Moreover, the approach finds wet-lab-proven motifs, as well as other significant consensus sequences not previously reported, in real biological datasets.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ho, L.S., Rajapakse, J.C. (2006). Graphical Approach to Weak Motif Recognition in Noisy Data Sets. In: Rajapakse, J.C., Wong, L., Acharya, R. (eds) Pattern Recognition in Bioinformatics. PRIB 2006. Lecture Notes in Computer Science, vol 4146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11818564_4
DOI: https://doi.org/10.1007/11818564_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37446-6
Online ISBN: 978-3-540-37447-3
eBook Packages: Computer Science, Computer Science (R0)