TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM

Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun

doi:10.1007/s00726-016-2274-4

Original Article
Published: 14 June 2016

Amino Acids, Volume 48, pages 2533–2547 (2016)

Abstract

The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the “intermediate” decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys.
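
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: Appendix A refers to LIBSVM, whereas the code below trains small linear SVMs by full-batch sub-gradient descent on the hinge loss so that it runs with the standard library only. The function names, the toy data and the hyperparameters (lam, lr, epochs) are assumptions made for illustration, and the second layer is trained on in-sample first-layer outputs purely for brevity.

```python
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def decision_value(model: Model, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear model for one sample."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Model:
    """Linear SVM stand-in: sub-gradient descent on the regularised hinge loss.

    Labels must be +1 or -1. Hyperparameter defaults are illustrative only.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value((w, b), xi) < 1.0:  # margin violated
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_two_layer(views: Sequence[Sequence[Sequence[float]]],
                    y: Sequence[int]) -> Tuple[List[Model], Model]:
    """Layer 1: one SVM per feature view. Layer 2: one SVM on their outputs."""
    first = [train_linear_svm(X, y) for X in views]
    # Each sample is re-described by its first-layer decision values.
    stacked = [[decision_value(m, X[i]) for m, X in zip(first, views)]
               for i in range(len(y))]
    second = train_linear_svm(stacked, y)
    return first, second


def predict_two_layer(model: Tuple[List[Model], Model],
                      sample_views: Sequence[Sequence[float]]) -> int:
    """Predict +1 or -1 for one sample given its feature vector in each view."""
    first, second = model
    z = [decision_value(m, x) for m, x in zip(first, sample_views)]
    return 1 if decision_value(second, z) >= 0.0 else -1


# Hypothetical toy data: two views of four proteins (+1 crystallizable, -1 not).
view_a = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]]
view_b = [[1.0], [0.8], [-1.2], [-0.9]]
labels = [1, 1, -1, -1]

model = train_two_layer([view_a, view_b], labels)
train_pred = [predict_two_layer(model, [a, b]) for a, b in zip(view_a, view_b)]
new_pos = predict_two_layer(model, [[1.8, 1.2], [0.9]])
new_neg = predict_two_layer(model, [[-1.5, -1.0], [-1.0]])
print(train_pred, new_pos, new_neg)
```

The point of the design is that the views are never concatenated: each first-layer model sees only its own feature space, and the second layer learns how much to trust each intermediate decision. In practice the second layer would usually be trained on held-out (for example cross-validated) first-layer outputs rather than in-sample ones, to limit overfitting.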

Notes

In this study, multi-view features are features extracted from different sources, such as amino acid composition, the protein evolutionary profile, and so on.
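
As an illustration of one such view, amino acid composition can be computed as the relative frequency of each of the 20 standard residues in a sequence. The sketch below is a generic version of that descriptor, not the exact feature extraction used by TargetCrys; the function name, the residue ordering and the choice to ignore non-standard letters are assumptions.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return a 20-dimensional vector of residue frequencies.

    Letters outside the 20 standard residues (for example X) are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


comp = amino_acid_composition("AACD")
print(dict(zip(AMINO_ACIDS, comp)))
```

A second view, such as a profile-derived descriptor, would yield a separate vector for the same protein; the two vectors together form the multi-view representation.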

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61373062, 61175024, 61222306, and 61233011), the Natural Science Foundation of Jiangsu (No. BK20141403), the Jiangsu University Graduate Research and Innovation Project (No. KYZZ_0123), Jiangsu Postdoctoral Science Foundation (No. 1201027C), the Science and Technology Commission of Shanghai Municipality (No. 16JC1404300), “The Six Top Talents” of Jiangsu Province (No. 2013-XXRJ-022), and the Fundamental Research Funds for the Central Universities (No. 30916011327). D. J. Yu is the corresponding author for this paper.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
Jun Hu, Ke Han, Yang Li, Jing-Yu Yang & Dong-Jun Yu
Department of Automation, Shanghai Jiao Tong University, Dongchuan Road 800, Shanghai, 200240, China
Hong-Bin Shen

Corresponding author

Correspondence to Dong-Jun Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Handling Editor: S. C. E. Tosatto.

Appendix A: Parameters of TargetCrys

See Table 8.

Table 8 Parameters of SVM models identified with the grid search program of LIBSVM software on TRAIN3587 and TRAIN1500.

About this article

Cite this article

Hu, J., Han, K., Li, Y. et al. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM. Amino Acids 48, 2533–2547 (2016). https://doi.org/10.1007/s00726-016-2274-4

Received: 30 July 2015
Accepted: 07 June 2016
Published: 14 June 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s00726-016-2274-4
