Generalized Planted (l,d)-Motif Problem with Negative Set

  • Henry C. M. Leung
  • Francis Y. L. Chin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)

Abstract

Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)-motif problem as finding a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information about the motif. In this paper, we propose a generalized planted (l,d)-motif problem that takes as additional input a set of sequences containing no substring similar to the motif (the negative set). We analyze the effect of this negative set on motif finding, and define a set of unsolvable problems and a set of most difficult problems, which we call “challenging generalized problems”. We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were previously unsolvable, as well as challenging instances of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
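
To make the problem statement concrete, the following is a minimal brute-force sketch of the generalized problem, not the VANS algorithm. It assumes that "similar" for the negative set means "within d substitutions", the same threshold used for the input sequences; the abstract does not fix this. The function names (hamming, occurs, brute_force_motifs) and the DNA alphabet are illustrative choices, and the enumeration is exponential in l, so it is only usable for very small instances.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, seq, d):
    """True if some length-l substring of seq is within d substitutions of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_motifs(positives, negatives, l, d, alphabet="ACGT"):
    """Enumerate every length-l string over the alphabet and keep those that
    occur (with at most d substitutions) in every positive sequence and in
    no negative sequence. Assumption: the negative set uses the same
    threshold d as the positive set."""
    found = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        in_all_positives = all(occurs(candidate, s, d) for s in positives)
        in_any_negative = any(occurs(candidate, s, d) for s in negatives)
        if in_all_positives and not in_any_negative:
            found.append(candidate)
    return found
```

For example, with the two input sequences "ACGT" and "ACCT", l = 4 and d = 1, four candidates satisfy the original problem; adding the negative sequence "ACAA" rules out "ACAT", which illustrates how the negative set narrows the set of candidate motifs.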

Keywords

Local Search; Input Sequence; Extra Information; Find Motif; Motif Problem

References

  1. Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
  2. Barash, Y., Bejerano, G., Friedman, N.: A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites. Workshop on Algorithms in Bioinformatics WABI 1, 278–293 (2001)
  3. Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. Jour. Comp. Biol. 5, 279–305 (1998)
  4. Buhler, J., Tompa, M.: Finding motifs using random projections. Research in Computational Molecular Biology RECOMB 1, 69–76 (2001)
  5. Chin, F., Leung, H.: Voting Algorithms for Discovering Long Motifs. Asia-Pacific Bioinformatics Conference APBC 3, 261–271 (2005)
  6. Chin, F., Leung, H., Yiu, S.M., Lam, T.W., Rosenfeld, R., Tsang, W.W., Smith, D., Jiang, Y.: Finding Motifs for Insufficient Number of Sequences with Strong Binding to Transcription Factor. Research in Computational Molecular Biology RECOMB 4, 125–132 (2004)
  7. Chin, F., Leung, H., Yiu, S.M., Rosenfeld, R., Tsang, W.W.: Finding Motifs with Insufficient Number of Strong Binding Sites. Jour. Comp. Biol. (to appear)
  8. Fraenkel, Y., Mandel, Y., Friedberg, D., Margalit, H.: Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon. Bioinformatics 11, 379–387 (1995)
  9. Gelfand, M., Koonin, E., Mironov, A.: Prediction of transcription regulatory sites in archaea by a comparative genomic approach. Nucl. Acids Res. 28, 695–705 (2000)
  10. van Helden, J., Andre, B., Collado-Vides, J.: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology 281(5), 827–842 (1998)
  11. Hertz, G.Z., Stormo, G.D.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. International Conference on Bioinformatics and Genome Research 3, 201–216 (1995)
  12. Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)
  13. Lawrence, C., Reilly, A.: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function and Genetics 7, 41–51 (1990)
  14. Leung, H., Chin, F.: Finding Exact Optimal Motif in Matrix Representation by Partitioning. In: European Conference on Computational Biology ECCB (2005) (to appear)
  15. Liang, S.: cWINNOWER Algorithm for Finding Fuzzy DNA Motifs. Computer Society Bioinformatics Conference 2, 260–265 (2003)
  16. Marsan, L., Sagot, M.F.: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Jour. Comp. Biol. 7(3-4), 345–362 (2000)
  17. Pesole, G., Prunella, N., Liuni, S., Attimonelli, M., Saccone, C.: Wordup: an efficient algorithm for discovering statistically significant patterns in DNA sequences. Nucl. Acids Res. 20(11), 2871–2875 (1992)
  18. Pevzner, P., Sze, S.H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: International Conference on Intelligent Systems for Molecular Biology, vol. 8, pp. 269–278 (2000)
  19. Sagot, M.F.: Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi, C.L., Moura, A.V. (eds.) LATIN 1998. LNCS, vol. 1380, pp. 111–127. Springer, Heidelberg (1998)
  20. Sinha, S.: Discriminative motifs. Jour. Comp. Biol. 10, 599–616 (2003)
  21. Zhu, J., Zhang, M.: SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15, 563–577 (1999), http://cgsigma.cshl.org/jian/

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Henry C. M. Leung (1)
  • Francis Y. L. Chin (1)
  1. Department of Computer Science, The University of Hong Kong, Hong Kong