Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3759)

Abstract

Explaining how a decision was reached is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, an SVM is a black-box model. A decision tree, on the other hand, is easy to comprehend. This paper presents a novel approach to rule generation for understanding protein secondary structure prediction that integrates the merits of both support vector machines and decision trees. The approach combines an SVM with a decision tree into a new algorithm called SVM_DT. Experiments on protein secondary structure prediction with the RS126 data sets show that the comprehensibility of SVM_DT is much better than that of an SVM. Moreover, the generalization ability of SVM_DT is better than that of a decision tree and similar to that of an SVM. Hence, SVM_DT can be used not only for prediction but also for guiding biological experiments.
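
The abstract does not spell out how the SVM and the decision tree are combined. One common pattern for getting readable rules out of a black-box classifier is to train the black box first, relabel the training data with its predictions, and fit a shallow decision tree to those labels. The sketch below illustrates that pattern only, and it should not be read as the authors' SVM_DT procedure. Everything in it is an assumption made for illustration: the toy two-feature data (not RS126), the Pegasos-style linear SVM, the tiny Gini-based tree, and the helper names `train_linear_svm`, `build_tree`, `tree_predict` and `tree_rules`.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:  # point violates the margin: move toward it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def gini(labels):
    p = sum(1 for v in labels if v == 1) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth=0, max_depth=2):
    """Greedy axis-aligned tree; a leaf is a class label, a node is a tuple."""
    majority = 1 if sum(y) >= 0 else -1
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[f] <= thr]
            right = [y[i] for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return majority
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def tree_predict(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def tree_rules(node, conds=()):
    """Flatten the tree into one IF ... THEN rule per leaf."""
    if not isinstance(node, tuple):
        head = "IF " + " AND ".join(conds) if conds else "ALWAYS"
        return [head + f" THEN class {node}"]
    f, thr, left, right = node
    return (tree_rules(left, conds + (f"x{f} <= {thr:.2f}",)) +
            tree_rules(right, conds + (f"x{f} > {thr:.2f}",)))

# Toy stand-in data (an assumption, not RS126): two features in [0, 1].
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if x0 + x1 > 1.0 else -1 for x0, x1 in X]

w, b = train_linear_svm(X, y)
svm_labels = [svm_predict(w, b, x) for x in X]  # black-box predictions
tree = build_tree(X, svm_labels, max_depth=2)   # tree mimics the SVM
rules = tree_rules(tree)
# Fidelity: fraction of points where the tree agrees with the SVM.
fidelity = sum(tree_predict(tree, x) == s for x, s in zip(X, svm_labels)) / len(X)

for rule in rules:
    print(rule)
print("fidelity to SVM:", fidelity)
```

The fidelity value reports how closely the readable rules track the black-box model on the training points. It measures agreement with the SVM only, so accuracy against the true labels would need a separate check on held-out data.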

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, J., Hu, HJ., Harrison, R., Tai, P.C., Dong, Y., Pan, Y. (2005). Understanding Protein Structure Prediction Using SVM_DT. In: Chen, G., Pan, Y., Guo, M., Lu, J. (eds) Parallel and Distributed Processing and Applications - ISPA 2005 Workshops. ISPA 2005. Lecture Notes in Computer Science, vol 3759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576259_23

  • DOI: https://doi.org/10.1007/11576259_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29770-3

  • Online ISBN: 978-3-540-32115-6

  • eBook Packages: Computer Science, Computer Science (R0)
