Protein secondary structure prediction based on stochastic-rule learning

Mamitsuka, Hiroshi; Yamanishi, Kenji

doi:10.1007/3-540-57369-0_43

Technical Papers
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 743)

Abstract

This paper proposes a new strategy for predicting α-helix regions in a given protein sequence, based on the theory of learning stochastic rules. We confine our study to the problem of predicting where α-helix regions are located in a given protein sequence, rather than the conventional three-state prediction problem, i.e., that of predicting which of the three states (α-helix, β-sheet, or coil) each amino acid in the sequence corresponds to.

Our strategy consists of three steps: generation of training examples, learning, and prediction.

In the learning phase, we construct a rule for secondary-structure prediction from training examples. Here a rule is represented not as a deterministic rule but as a stochastic rule, i.e., a probability distribution that assigns, to each region in a sequence, a probability that it corresponds to α-helix. Each stochastic rule used here is further represented as the product of a number of stochastic rules with finite partitioning, developed by Yamanishi. Optimal stochastic rules with finite partitioning are obtained from training examples by Laplace estimation of real-valued parameters and by model selection based on the minimum description length (MDL) principle. We allow our stochastic rules to use not only the amino acid symbols themselves but also their physico-chemical properties (i.e., numerical attributes such as hydrophobicity and molecular weight).
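The two estimation ingredients named above, Laplace estimation and MDL-based model selection, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numerical attribute (e.g. hydrophobicity), binary labels (1 = helix, 0 = non-helix), a finite partition given by hypothetical cut points, and a common two-part code length (negative log-likelihood plus half a log-term per cell). The function names are illustrative only.

```python
import math

def laplace_estimate(positives, total):
    """Laplace (add-one) estimate of a Bernoulli parameter."""
    return (positives + 1) / (total + 2)

def cell_index(value, boundaries):
    """Index of the partition cell that contains `value`."""
    return sum(value >= b for b in boundaries)

def description_length(examples, boundaries):
    """Two-part code length in bits for one candidate partition (assumed form).

    examples:   list of (attribute_value, label) pairs, label in {0, 1}
    boundaries: sorted cut points; they define len(boundaries) + 1 cells
    """
    counts = [[0, 0] for _ in range(len(boundaries) + 1)]  # [positives, total]
    for value, label in examples:
        cell = counts[cell_index(value, boundaries)]
        cell[0] += label
        cell[1] += 1
    probs = [laplace_estimate(pos, tot) for pos, tot in counts]

    # Cost of encoding the labels given the estimated per-cell probabilities.
    data_cost = 0.0
    for value, label in examples:
        p = probs[cell_index(value, boundaries)]
        data_cost -= math.log2(p if label == 1 else 1.0 - p)

    # Cost of encoding the model: half a log-term per real-valued parameter.
    model_cost = 0.5 * len(counts) * math.log2(len(examples))
    return data_cost + model_cost

def select_partition(examples, candidates):
    """Pick the candidate partition with the shortest description length."""
    return min(candidates, key=lambda b: description_length(examples, b))
```

With toy data in which the label flips at attribute value 0, the single cut point is preferred over both the trivial partition (poor fit) and a finer one (higher model cost).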

In the prediction phase, when given a test sequence, the likelihood that any given region (i.e., any subsequence of amino acids) in the test sequence corresponds to α-helix is calculated with the stochastic rules constructed in the learning phase.
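The prediction step described above can be sketched as a window scan. This sketch rests on assumptions the abstract does not confirm: regions have a fixed width, each position in the window has its own rule mapping an amino-acid letter to a helix probability, and the region score is the log of the product of those per-position probabilities. The names `region_log_likelihood` and `scan`, the fallback probability 0.5, and the threshold are hypothetical.

```python
import math

def region_log_likelihood(region, rules):
    """Log of the product of per-position probabilities for one region.

    rules[i] is a dict mapping an amino-acid letter to an assumed helix
    probability at offset i; unseen letters fall back to 0.5.
    """
    return sum(math.log(rules[i].get(aa, 0.5)) for i, aa in enumerate(region))

def scan(sequence, rules, threshold):
    """Return (start, score) for every window whose score reaches threshold."""
    width = len(rules)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = region_log_likelihood(sequence[start:start + width], rules)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Working in log space keeps the product of many small probabilities numerically stable for longer regions.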

We evaluate the predictive performance of our strategy experimentally. In generating training examples, examples of α-helix regions are drawn from hemoglobin sequences alone. Experimental results show that the prediction accuracy of our strategy was 94.8% for hemoglobin α-chain (1HBSα), 68.5% for parvalbumin β (1CDP), and 73.6% for lysozyme c (1LYM), significantly higher than the rates achieved with the Garnier-Osguthorpe-Robson (GOR) method.
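The abstract does not define how the accuracy rate is computed. One plausible reading, shown here purely as an assumption, is per-residue two-state accuracy: the percentage of residues whose predicted helix/non-helix label agrees with the observed one. The label alphabet (`H` for helix, `-` otherwise) and the function name are hypothetical.

```python
def two_state_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed labels agree."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, `two_state_accuracy("HH--H", "HH---")` compares five residues, four of which agree.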

References

P.Y. Chou and G.D. Fasman. Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry, 13(2):211–221, 1974.
P.Y. Chou and G.D. Fasman. Prediction of protein conformation. Biochemistry, 13(2):222–245, 1974.
J. Garnier, D.J. Osguthorpe, and B. Robson. Analysis of the accuracy and implication of simple methods for predicting the secondary structure of globular proteins. J.Mol.Biol., 120:97–120, 1978.
J.F. Gibrat, J. Garnier, and B. Robson. Further developments of protein secondary structure prediction using information theory. J.Mol.Biol., 198:425–443, 1987.
N. Qian and T.J. Sejnowski. Predicting the secondary structure of globular proteins using neural network models. J.Mol.Biol., 202:865–884, 1988.
H. Bohr, J. Bohr, S. Brunak, M.J.R. Cotterill, B. Lautrup, L. Norskov, H.O. Olsen, and B.S. Petersen. Protein secondary structure and homology by neural networks. FEBS Letters, 241(1,2):223–228, 1988.
R.D. King and M.J.E. Sternberg. Machine learning approach for the prediction of protein secondary structure. J.Mol.Biol., 216:441–457, 1990.
K. Yamanishi. A learning criterion for stochastic rules. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 67–81. Morgan Kaufmann, 1990. To appear in Machine Learning.
J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.
J. Rissanen. Stochastic complexity in statistical inquiry, volume 15 of Comp. Sci. World Scientific, 1989.
F. Schreiber. The Bayes Laplace statistics of the multinomial distributions. AEU, 39(5):293–298, 1985.
J.L. Fauchere and V. Pliska. Hydrophobic parameters of amino acid side chains from the partitioning of N-acetyl-amino acid amides. Eur.J.Med.Chem.Chim.Ther., 18:369–375, 1983.

Author information

Authors and Affiliations

C&C Information Technology Research Labs., NEC Corporation, 1-1, Miyazaki 4-chome, Miyamaeku, Kawasaki, 216, Kanagawa, Japan
Hiroshi Mamitsuka & Kenji Yamanishi

Editor information

Shuji Doshita, Koichi Furukawa, Klaus P. Jantke, Toyaki Nishida

Cite this paper

Mamitsuka, H., Yamanishi, K. (1993). Protein secondary structure prediction based on stochastic-rule learning. In: Doshita, S., Furukawa, K., Jantke, K.P., Nishida, T. (eds) Algorithmic Learning Theory. ALT 1992. Lecture Notes in Computer Science, vol 743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57369-0_43

DOI: https://doi.org/10.1007/3-540-57369-0_43
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57369-2
Online ISBN: 978-3-540-48093-8
eBook Packages: Springer Book Archive