Estimating the Class Posterior Probabilities in Protein Secondary Structure Prediction

  • Yann Guermeur
  • Fabienne Thomarat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)

Abstract

Support vector machines, whether bi-class or multi-class, have proved efficient for protein secondary structure prediction. They can be used as sequence-to-structure classifiers, structure-to-structure classifiers, or both. Compared with the multi-layer perceptron, the classifier most commonly found in the main prediction methods, they exhibit a single drawback: their outputs are not class posterior probability estimates. This paper addresses the problem of post-processing the outputs of multi-class support vector machines used as sequence-to-structure classifiers with a structure-to-structure classifier that estimates the class posterior probabilities. The aim of this comparative study is to improve performance with respect to two criteria: prediction accuracy and quality of the probability estimates.
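The post-processing idea can be illustrated with a minimal sketch. It is not the method evaluated in the paper; it only shows one common way to turn real-valued multi-class classifier outputs into class posterior estimates: a multinomial logistic regression (softmax) applied to a sliding window of per-residue score vectors. The score values, the three-state labelling (H, E, C), the window half-width, the zero padding at the sequence ends, the gradient-descent settings, and the use of cross-entropy as a measure of estimate quality are all assumptions made for illustration.

```python
import math

CLASSES = ("H", "E", "C")  # helix, strand, coil (assumed three-state labelling)

def softmax(z):
    """Map real-valued scores to a probability vector (numerically stable)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def window_features(scores, half_width):
    """Concatenate the score vectors of residues i-half_width..i+half_width.

    Positions outside the sequence are zero-padded (an assumption).
    A trailing 1.0 is appended to act as the bias input.
    """
    n, q = len(scores), len(scores[0])
    feats = []
    for i in range(n):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(scores[j] if 0 <= j < n else [0.0] * q)
        feats.append(row + [1.0])
    return feats

def predict_proba(weights, x):
    """Posterior estimates for one residue: softmax of linear scores."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

def cross_entropy(probs, labels):
    """Average negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def fit(feats, labels, epochs=200, lr=0.1):
    """Fit multinomial logistic regression by batch gradient descent
    on the average cross-entropy."""
    q, d = len(CLASSES), len(feats[0])
    weights = [[0.0] * d for _ in range(q)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(q)]
        for x, y in zip(feats, labels):
            p = predict_proba(weights, x)
            for k in range(q):
                err = p[k] - (1.0 if k == y else 0.0)
                for t in range(d):
                    grad[k][t] += err * x[t]
        for k in range(q):
            for t in range(d):
                weights[k][t] -= lr * grad[k][t] / len(feats)
    return weights

# Hypothetical first-stage outputs for a six-residue sequence:
# one real-valued score per class and per residue (toy values).
svm_scores = [[1.2, -0.7, -0.5], [0.9, -0.2, -0.7], [-0.4, 0.8, -0.4],
              [-0.6, 1.1, -0.5], [-0.5, -0.3, 0.8], [-0.2, -0.6, 0.8]]
labels = [0, 0, 1, 1, 2, 2]  # indices into CLASSES

feats = window_features(svm_scores, half_width=1)
weights = fit(feats, labels)
probs = [predict_proba(weights, x) for x in feats]
predicted = [CLASSES[max(range(len(CLASSES)), key=p.__getitem__)] for p in probs]
```

Each entry of `probs` is a vector of three non-negative values summing to one, so it can be read as an estimate of the class posterior probabilities for that residue, and `cross_entropy(probs, labels)` gives one possible score for the quality of those estimates alongside plain prediction accuracy.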

Keywords

  • protein secondary structure prediction
  • multi-class support vector machines
  • class membership probabilities

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yann Guermeur (1)
  • Fabienne Thomarat (1)

  1. LORIA – Equipe ABC, Vandœuvre-lès-Nancy, France
