TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM

  • Original Article
  • Amino Acids

Abstract

Accurately predicting whether a protein will crystallize is crucial for improving the success rate of protein crystallization projects. A common and critical problem in developing machine-learning-based protein crystallization predictors is how to make effective use of protein features extracted from different views. In this study, we aim to fuse multi-view protein features more effectively by proposing a new two-layered SVM (2L-SVM), which converts the feature-level fusion problem into a decision-level fusion problem: each SVM in the first layer of the 2L-SVM is trained on one of the multi-view feature sets; the outputs of the first-layer SVMs, which are the “intermediate” decisions made from the respective feature sets, are then combined by a second-layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrate the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors: it outperforms most of them and is competitive with the state-of-the-art predictors. The TargetCrys webserver and the datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys.
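The decision-level fusion described above can be illustrated with a minimal sketch. It is not the TargetCrys implementation: it assumes linear SVMs trained by batch sub-gradient descent (the paper uses LIBSVM), it passes raw decision values to the second layer, and every name and the mirrored toy data are hypothetical.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on the L2-regularised hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}. Returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Samples inside the margin (or misclassified) drive the update.
        viol = [i for i in range(n) if y[i] * (dot(w, X[i]) + b) < 1]
        grad_w = [lam * w[j] - sum(y[i] * X[i][j] for i in viol) / n
                  for j in range(d)]
        grad_b = -sum(y[i] for i in viol) / n
        w = [w[j] - lr * grad_w[j] for j in range(d)]
        b -= lr * grad_b
    return w, b

def decision(model, x):
    w, b = model
    return dot(w, x) + b

def train_two_layer(views, y):
    """views: one feature matrix per view, all in the same sample order."""
    # Layer 1: one SVM per feature view.
    first = [train_linear_svm(X, y) for X in views]
    # Intermediate decisions: one score per view for every sample.
    Z = [[decision(m, X[i]) for m, X in zip(first, views)]
         for i in range(len(y))]
    # Layer 2: an SVM trained on the layer-1 scores.
    second = train_linear_svm(Z, y)
    return first, second

def predict_two_layer(model, sample_views):
    first, second = model
    z = [decision(m, x) for m, x in zip(first, sample_views)]
    return 1 if decision(second, z) >= 0 else -1

# Toy data (hypothetical): two centred views, negatives mirror the positives.
pos_a = [[0.9, 0.4], [0.7, 0.6], [0.8, 0.3]]
pos_b = [[1.2], [0.5], [0.9]]
view_a = pos_a + [[-v for v in x] for x in pos_a]
view_b = pos_b + [[-v for v in x] for x in pos_b]
labels = [1, 1, 1, -1, -1, -1]

model = train_two_layer([view_a, view_b], labels)
train_pred = [predict_two_layer(model, [view_a[i], view_b[i]])
              for i in range(len(labels))]
```

In this sketch the second layer is fitted on scores for the same samples the first layer was trained on; a practical stacking setup would more likely use held-out (for example cross-validated) first-layer outputs to limit overfitting.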

Notes

  1. In this study, multi-view features are features extracted from different sources, such as amino acid composition and the protein evolutionary profile.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61373062, 61175024, 61222306, and 61233011), the Natural Science Foundation of Jiangsu (No. BK20141403), the Jiangsu University Graduate Research and Innovation Project (No. KYZZ_0123), the Jiangsu Postdoctoral Science Foundation (No. 1201027C), the Science and Technology Commission of Shanghai Municipality (No. 16JC1404300), “The Six Top Talents” of Jiangsu Province (No. 2013-XXRJ-022), and the Fundamental Research Funds for the Central Universities (No. 30916011327). D. J. Yu is the corresponding author for this paper.

Author information

Corresponding author

Correspondence to Dong-Jun Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Handling Editor: S. C. E. Tosatto.

Appendix A: Parameters of TargetCrys

See Table 8.

Table 8. Parameters of the SVM models, identified with the grid search program of the LIBSVM software on TRAIN3587 and TRAIN1500.
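For readers unfamiliar with this kind of parameter search, the sketch below enumerates the exponential (C, gamma) grid that LIBSVM's grid.py uses by default and picks the best pair. The ranges actually searched for TargetCrys and the values in Table 8 are not given in this text, so the grid is an assumption, and `cv_accuracy` and `toy_accuracy` are hypothetical stand-ins for a real cross-validation run.

```python
import math

def libsvm_default_grid():
    """(C, gamma) pairs on the default exponential grid of LIBSVM's grid.py."""
    log2c = range(-5, 16, 2)    # -5, -3, ..., 15
    log2g = range(3, -16, -2)   # 3, 1, ..., -15
    return [(2.0 ** c, 2.0 ** g) for c in log2c for g in log2g]

def grid_search(cv_accuracy, grid):
    """Return the (C, gamma) pair with the highest cross-validation score.
    cv_accuracy is a caller-supplied function (hypothetical here)."""
    return max(grid, key=lambda pair: cv_accuracy(*pair))

def toy_accuracy(C, gamma):
    # Stand-in score that peaks at C = 2**3 and gamma = 2**-3;
    # a real run would train and cross-validate an SVM here.
    return -(abs(math.log2(C) - 3) + abs(math.log2(gamma) + 3))

grid = libsvm_default_grid()
best_C, best_gamma = grid_search(toy_accuracy, grid)
```

Searching on a log scale keeps the number of candidate models small (110 here) while still covering several orders of magnitude for both parameters.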

About this article

Cite this article

Hu, J., Han, K., Li, Y. et al. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM. Amino Acids 48, 2533–2547 (2016). https://doi.org/10.1007/s00726-016-2274-4

  • DOI: https://doi.org/10.1007/s00726-016-2274-4
