Understanding Protein Structure Prediction Using SVM_DT

  • Jieyue He
  • Hae-Jin Hu
  • Robert Harrison
  • Phang C. Tai
  • Yisheng Dong
  • Yi Pan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3759)


Explaining the decisions a model makes is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, an SVM is a black-box model. A decision tree, on the other hand, is easy to comprehend. This paper presents a novel approach to rule generation for understanding protein secondary structure prediction that integrates the merits of both the support vector machine and the decision tree. The approach combines the SVM with a decision tree into a new algorithm called SVM_DT. Experiments on protein secondary structure prediction with the RS126 data set show that SVM_DT is much more comprehensible than the SVM. Moreover, the generalization ability of SVM_DT is better than that of the decision tree and similar to that of the SVM. Hence, SVM_DT can be used not only for prediction but also for guiding biological experiments.
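The abstract does not say how SVM_DT couples the two models, so the following is only a minimal sketch of one common way to obtain readable rules from a black-box classifier: relabel the inputs with the black box's predictions and fit a decision tree to those labels. Everything here is an assumption made for illustration: the fixed linear function `black_box_predict` merely stands in for a trained SVM, the three binary features are hypothetical, and the tiny tree learner is not the authors' implementation.

```python
from itertools import product

# Hypothetical stand-in for a trained SVM: a fixed linear decision function.
# A real pipeline would train an SVM on encoded sequence windows instead.
WEIGHTS = (2.0, -1.0, 1.5)
BIAS = -0.75


def black_box_predict(x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1 if score > 0 else -1


def gini(labels):
    """Gini impurity of a list of +1/-1 labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Majority label, with ties resolved to +1."""
    return 1 if sum(labels) >= 0 else -1


def build_tree(rows, labels, features):
    """Grow a tree over binary features; a leaf is a label, a node a tuple."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def split_cost(f):
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

    best = min(features, key=split_cost)
    rest = [f for f in features if f != best]
    branches = {}
    for value in (0, 1):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        if subset:
            branches[value] = build_tree(
                [r for r, _ in subset], [y for _, y in subset], rest
            )
        else:
            branches[value] = majority(labels)
    return (best, branches)


def tree_predict(tree, x):
    """Follow the branches matching x until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree


def extract_rules(tree, conditions=()):
    """Flatten the tree into (conditions, label) rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    feature, branches = tree
    rules = []
    for value in (0, 1):
        rules += extract_rules(branches[value], conditions + ((feature, value),))
    return rules


# Label every input with the black box, then fit the tree to those labels.
rows = list(product((0, 1), repeat=3))
svm_labels = [black_box_predict(r) for r in rows]
tree = build_tree(rows, svm_labels, [0, 1, 2])
rules = extract_rules(tree)

# Fidelity: fraction of inputs on which the tree agrees with the black box.
fidelity = sum(tree_predict(tree, r) == y for r, y in zip(rows, svm_labels)) / len(rows)

for conditions, label in rules:
    clause = " and ".join(f"x{f} == {v}" for f, v in conditions)
    print(f"if {clause} then class {label:+d}")
```

Each printed rule is one root-to-leaf path, which is what makes a tree easier to inspect than the weights of the black box. On real data the tree would usually only approximate the black box, so the fidelity score is worth reporting alongside accuracy.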


Support Vector Machine; Decision Tree; Structure Prediction; Binary Classifier; Protein Secondary Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Jieyue He (1, 2)
  • Hae-Jin Hu (2)
  • Robert Harrison (2, 3, 4)
  • Phang C. Tai (3)
  • Yisheng Dong (1)
  • Yi Pan (2)

  1. Department of Computer Science, Southeast University, Nanjing, China
  2. Department of Computer Science
  3. Department of Biology, Georgia State University, Atlanta, USA
  4. GCC Distinguished Cancer Scholar
