Generalized Planted (l,d)-Motif Problem with Negative Set
Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze defined the planted (l,d)-motif problem as finding a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information about the motif. In this paper, we propose a generalized planted (l,d)-motif problem that takes as additional input a set of sequences containing no substring similar to the motif (the negative set). We analyze the effect of this negative set on motif finding, and define a set of unsolvable problems and a set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were previously unsolvable, as well as challenging instances of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
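To make the problem statement concrete, the following is a minimal sketch of the generalized planted (l,d)-motif condition, not the VANS algorithm itself. It assumes that "similar to the motif" means within d substitutions (Hamming distance at most d), and all function names (`hamming`, `occurs`, `is_generalized_motif`, `brute_force_motifs`) are hypothetical names chosen for illustration. The exhaustive search enumerates all 4^l candidates and is only practical for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(pattern, seq, d):
    """True if some length-l substring of seq is within d substitutions of pattern."""
    l = len(pattern)
    return any(
        hamming(pattern, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def is_generalized_motif(pattern, positives, negatives, d):
    """Assumed condition: pattern occurs (within d substitutions) in every
    positive sequence and in no negative sequence."""
    in_all_positives = all(occurs(pattern, s, d) for s in positives)
    in_some_negative = any(occurs(pattern, s, d) for s in negatives)
    return in_all_positives and not in_some_negative


def brute_force_motifs(positives, negatives, l, d, alphabet="ACGT"):
    """Exhaustively test every length-l string over the alphabet (4^l candidates)."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [c for c in candidates
            if is_generalized_motif(c, positives, negatives, d)]


if __name__ == "__main__":
    positives = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]  # toy data, not from the paper
    negatives = ["AAAAAAAA"]
    print(is_generalized_motif("ACGT", positives, negatives, d=1))
```

In this toy setting, a negative sequence that contains a substring within d substitutions of a candidate rules that candidate out, which is how the negative set supplies extra information beyond the positive sequences.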
Keywords: Local Search, Input Sequence, Extra Information, Find Motif, Motif Problem
- 1. Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
- 4. Buhler, J., Tompa, M.: Finding motifs using random projections. Research in Computational Molecular Biology RECOMB 1, 69–76 (2001)
- 6. Chin, F., Leung, H., Yiu, S.M., Lam, T.W., Rosenfeld, R., Tsang, W.W., Smith, D., Jiang, Y.: Finding Motifs for Insufficient Number of Sequences with Strong Binding to Transcription Factor. Research in Computational Molecular Biology RECOMB 4, 125–132 (2004)
- 7. Chin, F., Leung, H., Yiu, S.M., Rosenfeld, R., Tsang, W.W.: Finding Motifs with Insufficient Number of Strong Binding Sites. Jour. Comp. Biol. (to appear)
- 11. Hertz, G.Z., Stormo, G.D.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. International Conference on Bioinformatics and Genome Research 3, 201–216 (1995)
- 14. Leung, H., Chin, F.: Finding Exact Optimal Motif in Matrix Representation by Partitioning. In: European Conference on Computational Biology ECCB (2005) (to appear)
- 15. Liang, S.: cWINNOWER Algorithm for Finding Fuzzy DNA Motifs. Computer Society Bioinformatics Conference 2, 260–265 (2003)
- 18. Pevzner, P., Sze, S.H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: International Conference on Intelligent Systems for Molecular Biology, vol. 8, pp. 269–278 (2000)