Selecting near-native protein structures from ab initio models using ensemble clustering
Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. As an important topic in bioinformatics, it has attracted considerable effort in the design of ab initio methods. Unfortunately, because no perfect energy function is available, selecting a good near-native structure from the predicted decoy structures in the final step remains a difficult task.
Here we propose an ensemble clustering method based on k-medoids to address this problem. The k-medoids method is run many times to generate a clustering ensemble, and a voting method then combines the clustering results. A confidence score that considers both cluster size and cluster similarity is defined to select the final near-native model.
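The pipeline described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a precomputed pairwise distance matrix `D` between decoys (for example, RMSD), uses random medoid initialization to diversify the ensemble, implements voting as a co-association matrix with a majority threshold of 0.5, and defines the confidence score as (cluster size / number of decoys) × mean pairwise similarity 1/(1 + d). The function names and these formulas are assumptions; the abstract does not specify them.

```python
import numpy as np

def k_medoids(D, k, rng, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix; returns cluster labels."""
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random init diversifies the ensemble
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)    # assign each decoy to its nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # the member with the smallest total distance to its cluster becomes the medoid
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def co_association(D, k, n_runs, seed=0):
    """Voting: fraction of runs in which each pair of decoys shares a cluster."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        labels = k_medoids(D, k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def consensus_clusters(C, threshold=0.5):
    """Link decoys co-clustered in a majority of runs; connected components are clusters."""
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def confidence_scores(D, labels):
    """Hypothetical score: relative cluster size times mean pairwise similarity 1/(1+d)."""
    n = D.shape[0]
    scores = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        if members.size > 1:
            sub = D[np.ix_(members, members)]
            iu = np.triu_indices(members.size, k=1)
            mean_sim = float(np.mean(1.0 / (1.0 + sub[iu])))
        else:
            mean_sim = 1.0
        scores[int(c)] = (members.size / n) * mean_sim
    return scores

def select_near_native(D, k, n_runs=50, seed=0):
    """Return the medoid index of the highest-confidence consensus cluster, plus all scores."""
    C = co_association(D, k, n_runs, seed)
    labels = consensus_clusters(C)
    scores = confidence_scores(D, labels)
    best = max(scores, key=scores.get)
    members = np.flatnonzero(labels == best)
    within = D[np.ix_(members, members)].sum(axis=1)
    return int(members[np.argmin(within)]), scores
```

For example, with eight toy "decoys" on a line, a tight group near 0 and a looser group near 10, the tight group receives the higher confidence score and one of its central members is selected. Real decoy sets would need a structural distance such as RMSD in place of this toy metric.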
We applied the method to 54 single-domain targets from CASP-11. For about 70.4% of these targets, the proposed method selects a better near-native structure than SPICKER, the clustering method used by the I-TASSER server.
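The per-target comparison behind this figure can be expressed as a simple win rate. The sketch below is hypothetical: it assumes one similarity-to-native score per target for each method (for example, a TM-score) and counts a target as a win only when the proposed method scores strictly higher. Assuming the percentage is rounded to one decimal place, 70.4% of 54 targets corresponds to 38 wins, since 38/54 ≈ 0.7037.

```python
def win_rate(selected_scores, baseline_scores):
    """Fraction of targets where the selected model is strictly closer to native than the baseline."""
    if len(selected_scores) != len(baseline_scores) or not selected_scores:
        raise ValueError("need equal-length, non-empty score lists")
    wins = sum(s > b for s, b in zip(selected_scores, baseline_scores))
    return wins / len(selected_scores)

# Illustrative scores only: 38 wins and 16 losses out of 54 targets.
rate = win_rate([1.0] * 38 + [0.0] * 16, [0.5] * 54)
print(f"{rate:.1%}")
```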
The experiments show that the proposed method is effective at selecting near-native structures from the decoy sets of different targets, as measured by the similarity between the selected structure and the native structure.
Keywords: near-native structure, protein structure prediction, ab initio, decoy, ensemble clustering, k-medoids
This work is supported by the National Key R&D Program of China (No. 2017YFE0111900), and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).