Extracting best consensus motifs from positive and negative examples

  • Conference paper
STACS 96 (STACS 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1046)

Abstract

We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $\Sigma$ is a family $\Omega$ of subsets of $\Sigma^*$. A motif $\pi$ of type $\Omega$ is a string $\pi = \pi_1 \cdots \pi_n$ of motif components, each of which stands for an element of $\Omega$. The BCM problem for $\Omega$ is, given a yes-no sample $S = \{(\alpha^{(1)}, \beta^{(1)}), \ldots, (\alpha^{(m)}, \beta^{(m)})\}$ of pairs of strings in $\Sigma^*$ with $\alpha^{(i)} \neq \beta^{(i)}$ for $1 \le i \le m$, to find a motif $\pi$ of type $\Omega$ that maximizes the number of good pairs in $S$, where $(\alpha^{(i)}, \beta^{(i)})$ is good for $\pi$ if $\pi$ accepts $\alpha^{(i)}$ and rejects $\beta^{(i)}$. We prove that the BCM problem is NP-complete even for a very simple type $\Omega_1 = 2^{\Sigma} - \{\emptyset\}$, which is used in practice for describing protein motifs in the PROSITE database. We also show that the problem remains NP-complete for the type $\Omega_1 \cup \{\Sigma^{+}\} \cup \{\Sigma^{[i,j]} \mid 1 \le i \le j\}$, where $\Sigma^{[i,j]}$ is the set of strings over $\Sigma$ of length between $i$ and $j$. Furthermore, for the BCM problem for $\Omega_1$ we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis yields an explicit approximation ratio for the algorithm.
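
To make the objective concrete, the following is a minimal Python sketch of how good pairs might be counted for a motif of type $\Omega_1$, where each component is a nonempty set of letters. It is an illustration only, not the authors' algorithm. The abstract does not spell out what "accepts" means, so the sketch assumes PROSITE-style occurrence semantics: a motif accepts a string if it matches some contiguous substring position by position. The names `accepts` and `count_good_pairs` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A motif of type Omega_1: each component is a nonempty subset of the alphabet.
Motif = Sequence[FrozenSet[str]]


def accepts(motif: Motif, text: str) -> bool:
    """Assumed semantics: the motif matches some contiguous substring of text."""
    n = len(motif)
    return any(
        all(text[start + k] in motif[k] for k in range(n))
        for start in range(len(text) - n + 1)
    )


def count_good_pairs(motif: Motif, sample: List[Tuple[str, str]]) -> int:
    """Count pairs (alpha, beta) where the motif accepts alpha and rejects beta."""
    return sum(
        1
        for alpha, beta in sample
        if accepts(motif, alpha) and not accepts(motif, beta)
    )


# Hypothetical example: the motif [AC][V] over a small yes-no sample.
motif = [frozenset("AC"), frozenset("V")]
sample = [("GAVL", "GALV"), ("CV", "VC"), ("AAA", "AV")]
print(count_good_pairs(motif, sample))  # prints 2
```

Under these assumptions the first two pairs are good, while the third is not, because the motif rejects "AAA". The BCM problem asks for the motif that maximizes this count over all motifs of the given type.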

Author information

Corresponding author

Correspondence to Erika Tateishi.

Editor information

Claude Puech, Rüdiger Reischuk

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tateishi, E., Maruyama, O., Miyano, S. (1996). Extracting best consensus motifs from positive and negative examples. In: Puech, C., Reischuk, R. (eds) STACS 96. STACS 1996. Lecture Notes in Computer Science, vol 1046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60922-9_19

  • DOI: https://doi.org/10.1007/3-540-60922-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60922-3

  • Online ISBN: 978-3-540-49723-3

  • eBook Packages: Springer Book Archive
