Extracting best consensus motifs from positive and negative examples

Tateishi, Erika; Maruyama, Osamu; Miyano, Satoru

doi:10.1007/3-540-60922-9_19


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1046)

Included in the following conference series:

Annual Symposium on Theoretical Aspects of Computer Science

Abstract

We define the best consensus motif (BCM) problem, motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet $\Sigma$ is a family $\Omega$ of subsets of $\Sigma^*$. A motif $\pi$ of type $\Omega$ is a string $\pi = \pi_1 \cdots \pi_n$ of motif components, each of which stands for an element of $\Omega$. The BCM problem for $\Omega$ is, given a yes-no sample $S = \{(\alpha^{(1)}, \beta^{(1)}), \ldots, (\alpha^{(m)}, \beta^{(m)})\}$ of pairs of strings in $\Sigma^*$ with $\alpha^{(i)} \neq \beta^{(i)}$ for $1 \le i \le m$, to find a motif $\pi$ of type $\Omega$ that maximizes the number of good pairs in $S$, where $(\alpha^{(i)}, \beta^{(i)})$ is good for $\pi$ if $\pi$ accepts $\alpha^{(i)}$ and rejects $\beta^{(i)}$. We prove that the BCM problem is NP-complete even for a very simple type $\Omega_1 = 2^{\Sigma} - \{\emptyset\}$, which is used in practice for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type $\Omega_\infty = \Omega_1 \cup \{\Sigma^+\} \cup \{\Sigma^{[i,j]} \mid 1 \le i \le j\}$, where $\Sigma^{[i,j]}$ is the set of strings over $\Sigma$ of length between $i$ and $j$. Furthermore, for the BCM problem for $\Omega_1$ we provide a polynomial-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.
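
To make the objective concrete, the following is a minimal sketch of how good pairs could be counted for the simplest type, where every motif component is a nonempty subset of the alphabet. It is not the authors' algorithm. The abstract does not spell out when a motif accepts a string, so the sketch assumes whole-string matching with one component per character. The names `accepts` and `count_good_pairs` and the toy sample are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A motif is a sequence of components; each component is a nonempty
# subset of the alphabet (the simplest type described in the abstract).
Motif = Sequence[FrozenSet[str]]


def accepts(motif: Motif, s: str) -> bool:
    """Assumed semantics: the whole string must match, with the i-th
    character belonging to the i-th component."""
    return len(s) == len(motif) and all(
        ch in comp for ch, comp in zip(s, motif)
    )


def count_good_pairs(motif: Motif, sample: List[Tuple[str, str]]) -> int:
    """Count pairs (alpha, beta) where the motif accepts alpha and rejects beta."""
    return sum(
        1
        for alpha, beta in sample
        if accepts(motif, alpha) and not accepts(motif, beta)
    )


# Hypothetical toy sample over the alphabet {A, C, G, T}.
sample = [("ACG", "ACT"), ("GCG", "TTT"), ("AAA", "ACG")]
motif = [frozenset("AG"), frozenset("C"), frozenset("G")]
good = count_good_pairs(motif, sample)
```

Under these assumptions the motif accepts "ACG" and "GCG" and rejects "ACT" and "TTT", so the first two pairs are good, while the third is not because "AAA" is rejected. The BCM problem asks for a motif maximizing this count over all motifs of the given type.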

Author information

Authors and Affiliations

Department of Information Systems, Kyushu University 39, 816, Kasuga, Japan
Erika Tateishi & Osamu Maruyama
Research Institute of Fundamental Information Science, Kyushu University 33, 812, Fukuoka, Japan
Satoru Miyano

Corresponding author

Correspondence to Erika Tateishi.

Editor information

Claude Puech, Rüdiger Reischuk

About this paper

Cite this paper

Tateishi, E., Maruyama, O., Miyano, S. (1996). Extracting best consensus motifs from positive and negative examples. In: Puech, C., Reischuk, R. (eds) STACS 96. STACS 1996. Lecture Notes in Computer Science, vol 1046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60922-9_19

DOI: https://doi.org/10.1007/3-540-60922-9_19
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60922-3
Online ISBN: 978-3-540-49723-3
eBook Packages: Springer Book Archive
