Understanding Protein Structure Prediction Using SVM_DT

He, Jieyue; Hu, Hae-Jin; Harrison, Robert; Tai, Phang C.; Dong, Yisheng; Pan, Yi

doi:10.1007/11576259_23


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3759)

Included in the following conference series:

International Symposium on Parallel and Distributed Processing and Applications


Abstract

Being able to explain a decision is important for the acceptance of machine learning technology, especially in applications such as bioinformatics. Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, an SVM is a black-box model. A decision tree, on the other hand, has good comprehensibility. This paper presents a novel approach to rule generation for understanding protein secondary structure prediction that integrates the merits of both the support vector machine and the decision tree. The approach combines an SVM with a decision tree in a new algorithm called SVM_DT. Experiments on protein secondary structure prediction with the RS126 data set show that the comprehensibility of SVM_DT is much better than that of the SVM. Moreover, the generalization ability of SVM_DT is better than that of the decision tree and similar to that of the SVM. Hence, SVM_DT can be used not only for prediction but also for guiding biological experiments.
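The abstract does not say how SVM_DT couples the two models, so the following is only an illustrative sketch of one generic way to pair an SVM with a decision tree, not the authors' algorithm. It assumes scikit-learn is available, uses synthetic data rather than RS126, and trains the tree on the SVM's own predictions so that the tree's rules approximate the SVM. The helper name `fit_svm_then_tree`, the RBF kernel, and the depth limit are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text


def fit_svm_then_tree(X_train, y_train, max_depth=3, seed=0):
    """Hypothetical helper: fit an SVM, then fit a tree on the SVM's labels."""
    svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    # Relabel the training set with the SVM's predictions, so the tree
    # learns to imitate the SVM rather than the raw labels.
    svm_labels = svm.predict(X_train)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(X_train, svm_labels)
    return svm, tree


# Synthetic stand-in data (assumption): NOT the RS126 data set from the paper.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

svm, tree = fit_svm_then_tree(X_train, y_train)

# Generalization of each model on held-out data.
svm_acc = svm.score(X_test, y_test)
tree_acc = tree.score(X_test, y_test)

# Fidelity: how often the tree agrees with the SVM on unseen examples.
fidelity = float((tree.predict(X_test) == svm.predict(X_test)).mean())

# Human-readable if/then rules read off the tree.
feature_names = [f"f{i}" for i in range(X.shape[1])]
rules = export_text(tree, feature_names=feature_names)

print(f"SVM accuracy:  {svm_acc:.3f}")
print(f"Tree accuracy: {tree_acc:.3f}")
print(f"Fidelity:      {fidelity:.3f}")
print(rules)
```

The printed rules are the comprehensible part of the pairing, and comparing the two accuracies gives a rough view of the trade-off the abstract describes. The numbers from this toy setup say nothing about the results reported in the paper.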



Author information

Authors and Affiliations

Department of Computer Science, Southeast University, Nanjing, 210096, China
Jieyue He & Yisheng Dong
Department of Computer Science,
Jieyue He, Hae-Jin Hu, Robert Harrison & Yi Pan
Department of Biology, Georgia State University, Atlanta, GA, 30303-4110, USA
Robert Harrison & Phang C. Tai
GCC Distinguished Cancer Scholar,
Robert Harrison


Editor information

Editors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, 210093, Nanjing, China
Guihai Chen
Dept. of CS, Georgia State University, 30302, Atlanta, GA, USA
Yi Pan
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Minyi Guo
Department of Computer Science, University of Virginia,
Jian Lu


About this paper

Cite this paper

He, J., Hu, HJ., Harrison, R., Tai, P.C., Dong, Y., Pan, Y. (2005). Understanding Protein Structure Prediction Using SVM_DT. In: Chen, G., Pan, Y., Guo, M., Lu, J. (eds) Parallel and Distributed Processing and Applications - ISPA 2005 Workshops. ISPA 2005. Lecture Notes in Computer Science, vol 3759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576259_23


DOI: https://doi.org/10.1007/11576259_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29770-3
Online ISBN: 978-3-540-32115-6
eBook Packages: Computer Science (R0)
