New Generation Computing, Volume 11, Issue 3–4, pp 361–375

A machine discovery from amino acid sequences by decision trees over regular patterns

  • Setsuo Arikawa
  • Satoru Miyano
  • Ayumi Shinohara
  • Satoru Kuhara
  • Yasuhito Mukouchi
  • Takeshi Shinohara
Special Issue Invited


Abstract

This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports its experiments on protein data from the PIR database. We introduce a decision tree whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree from a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
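The hypothesis representation described above can be illustrated with a minimal sketch (this is not the authors' implementation; the patterns and class labels below are invented for illustration). Each internal node of the tree tests whether a sequence matches a regular pattern; a regular pattern such as `*LLL*`, where `*` stands for a variable, can be simulated by a regular expression in which each variable becomes `.*`:

```python
import re

class PatternNode:
    """A decision-tree node: internal nodes test a regular pattern,
    leaves carry a class label."""
    def __init__(self, pattern=None, yes=None, no=None, label=None):
        self.pattern = pattern   # e.g. "*LLL*", with '*' marking variables
        self.yes, self.no, self.label = yes, no, label

    def classify(self, seq):
        if self.pattern is None:             # leaf node
            return self.label
        # Turn the regular pattern into a regex: each variable -> ".*"
        regex = ".*".join(re.escape(part) for part in self.pattern.split("*"))
        branch = self.yes if re.fullmatch(regex, seq) else self.no
        return branch.classify(seq)

# A toy depth-2 tree over hypothetical patterns (labels are illustrative):
tree = PatternNode(
    pattern="*LLL*",
    yes=PatternNode(label="transmembrane"),
    no=PatternNode(pattern="*R*R*",
                   yes=PatternNode(label="non-transmembrane"),
                   no=PatternNode(label="transmembrane")),
)
print(tree.classify("MKLLLAVA"))   # matches *LLL* -> "transmembrane"
```

Classification simply walks the tree, branching on whether the sequence matches each node's pattern; learning such a tree from positive and negative examples is the task the paper's system addresses.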


Keywords: Machine Discovery; PAC-Learning; Decision Tree; Pattern Language; Protein Structure Prediction




References

  1. Arikawa, S., Kuhara, S., Miyano, S., Shinohara, A. and Shinohara, T., “A Learning Algorithm for Elementary Formal Systems and Its Experiments on Identification of Transmembrane Domains,” in Proc. 25th Hawaii Int. Conf. on Sys. Sci., pp. 675–684, IEEE, 1992.
  2. Bairoch, A., “PROSITE: A Dictionary of Sites and Patterns in Proteins,” Nucleic Acids Res., 19, pp. 2241–2245, 1991.
  3. Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K., “Learnability and the Vapnik-Chervonenkis Dimension,” JACM, 36, pp. 929–965, 1989.
  4. Ehrenfeucht, A. and Haussler, D., “Learning Decision Trees from Random Examples,” Inform. Comput., 82, pp. 231–246, 1989.
  5. Engelman, D. M., Steitz, T. A. and Goldman, A., “Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins,” Ann. Rev. Biophys. Biophys. Chem., 15, pp. 321–353, 1986.
  6. Gusev, V. and Chuzhanova, N., “The Algorithms for Recognition of the Functional Sites in Genetic Texts,” in Proc. 1st Workshop on Algorithmic Learning Theory, Tokyo, pp. 109–119, 1990.
  7. Hartmann, E., Rapoport, T. A. and Lodish, H. F., “Predicting the Orientation of Eukaryotic Membrane-Spanning Proteins,” Proc. Natl. Acad. Sci. U.S.A., 86, pp. 5786–5790, 1989.
  8. Holley, L. H. and Karplus, M., “Protein Secondary Structure Prediction with a Neural Network,” Proc. Natl. Acad. Sci. U.S.A., 86, pp. 152–156, 1989.
  9. Kyte, J. and Doolittle, R. F., “A Simple Method for Displaying the Hydropathic Character of a Protein,” J. Mol. Biol., 157, pp. 105–132, 1982.
  10. Lipp, J., Flint, N., Haeuptle, M. T. and Dobberstein, B., “Structural Requirements for Membrane Assembly of Proteins Spanning the Membrane Several Times,” J. Cell Biol., 109, pp. 2013–2022, 1989.
  11. Miyano, S., Shinohara, A. and Shinohara, T., “Which Classes of Elementary Formal Systems are Polynomial-Time Learnable?” in Proc. 2nd Workshop on Algorithmic Learning Theory, Tokyo, pp. 139–150, 1991.
  12. Natarajan, B. K., “On Learning Sets and Functions,” Machine Learning, 4, pp. 67–97, 1989.
  13. Protein Identification Resource, National Biomedical Research Foundation.
  14. Quinlan, J. R., “Induction of Decision Trees,” Machine Learning, 1, pp. 81–106, 1986.
  15. Quinlan, J. R. and Rivest, R. L., “Inferring Decision Trees Using the Minimum Description Length Principle,” Inform. Comput., 80, pp. 227–248, 1989.
  16. Rao, J. K. M. and Argos, P., “A Conformational Preference Parameter to Predict Helices in Integral Membrane Proteins,” Biochim. Biophys. Acta, 869, pp. 197–214, 1986.
  17. Shinohara, T., “Polynomial Time Inference of Pattern Languages and Its Applications,” in Proc. 7th IBM Symp. Mathematical Foundations of Computer Science, pp. 191–209, 1982.
  18. Shinohara, T., “Polynomial Time Inference of Regular Pattern Languages,” in Proc. RIMS Symp. Software Science and Engineering (Lecture Notes in Computer Science, 147), pp. 115–127, 1983.
  19. Utgoff, P. E., “Incremental Induction of Decision Trees,” Machine Learning, 4, pp. 161–186, 1989.
  20. Valiant, L., “A Theory of the Learnable,” Commun. ACM, 27, pp. 1134–1142, 1984.
  21. von Heijne, G., “Transcending the Impenetrable: How Proteins Come to Terms with Membranes,” Biochim. Biophys. Acta, 947, pp. 307–333, 1988.
  22. Wu, C. H., Whitson, G. M. and Montllor, G. J., “PROCANS: A Protein Classification System Using a Neural Network,” in Proc. IJCNN Int. Joint Conf. Neural Networks, 2, pp. 91–96, 1990.

Copyright information

© Ohmsha, Ltd. and Springer 1993

Authors and Affiliations

  • Setsuo Arikawa (1)
  • Satoru Miyano (1)
  • Ayumi Shinohara (1)
  • Satoru Kuhara (2)
  • Yasuhito Mukouchi (3)
  • Takeshi Shinohara (4)

  1. Research Institute of Fundamental Information Science, Kyushu University 33, Fukuoka, Japan
  2. Graduate School of Genetic Resources Technology, Kyushu University 46, Fukuoka, Japan
  3. Department of Information Systems, Kyushu University 39, Kasuga, Japan
  4. Department of Artificial Intelligence, Kyushu Institute of Technology, Iizuka, Japan
