Finding Largest Well-Predicted Subset of Protein Structure Models
Evaluating the quality of models is a basic problem in protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive requires finding a largest well-predicted subset: a maximum subset of the model that matches the native structure. The problem is solvable in O(n^7) time, which is too slow for practical use. We present a (1 + ε)d distance approximation algorithm that runs in O(n^3 log n / ε^5) time for general protein structures. For globular proteins, this result can be improved to a randomized algorithm that runs in O(n log^2 n) time and succeeds with probability at least 1 − O(1/n). In addition, we propose a (1 + ε)-approximation algorithm that computes the minimum distance needed to fit all the points of a model to its native structure in O(n (log log n + log(1/ε)) / ε^5) time. We have implemented our algorithms, and the results indicate that our program finds many more matched pairs in less running time than TMScore, one of the most popular tools for assessing the quality of predicted models.
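To make the criterion concrete, the following minimal sketch evaluates it for a single, already chosen rigid transformation: it lists the residues whose transformed model position lies within a threshold d of the corresponding native position. It is not the authors' algorithm, which searches over transformations. The function names, the index-by-index residue correspondence, and the toy coordinates are all assumptions made for illustration.

```python
import math

def apply_rigid(rotation, translation, point):
    """Apply the rigid map x -> R x + t to a 3D point (hypothetical helper)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def well_predicted_subset(model, native, rotation, translation, d):
    """Return indices i where the transformed model[i] is within d of native[i].

    Assumes model[i] and native[i] describe the same residue.
    """
    matched = []
    for i, (p, q) in enumerate(zip(model, native)):
        if math.dist(apply_rigid(rotation, translation, p), q) <= d:
            matched.append(i)
    return matched

# Toy data: four points on a line, with the third model point displaced.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.1, 0.0, 0.0), (3.8, 0.2, 0.0), (7.6, 0.0, 5.0), (11.4, 0.0, 0.0)]

matched = well_predicted_subset(model, native, IDENTITY, (0.0, 0.0, 0.0), d=1.0)
print(matched)
```

The largest well-predicted subset is the biggest such index set over all rigid transformations, so the hard part of the problem is the search over transformations rather than this per-transformation count.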