Abstract
Evaluating the quality of models is a basic problem in protein structure prediction. Numerous evaluation criteria have been proposed; one of the most intuitive asks for a largest well-predicted subset, that is, a maximum subset of the model that matches the native structure [12]. The problem is solvable in O(n^7) time, which is too slow for practical use. We present a (1 + ε)d distance approximation algorithm that runs in O(n^3 log n / ε^5) time for general protein structures. For globular proteins, this result can be improved to a randomized O(n log^2 n) time algorithm with probability at least 1 − O(1/n). In addition, we propose a (1 + ε)-approximation algorithm that computes the minimum distance needed to fit all the points of a model to its native structure in O(n (log log n + log 1/ε) / ε^5) time. We have implemented our algorithms, and the results indicate that our program finds many more matched pairs in less running time than TMScore, one of the most popular tools for assessing the quality of predicted models.
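To make the two objectives in the abstract concrete, the sketch below evaluates them for one fixed rigid transformation. It is not the paper's algorithm: searching over transformations is the hard part and is omitted here. The sketch assumes that residue i of the model corresponds to residue i of the native structure and that a pair "matches" when its distance after transformation is at most d (or at most (1 + ε)d in the relaxed version). The names `apply_rigid`, `well_predicted_subset`, and `max_deviation`, and the toy coordinates, are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def apply_rigid(rotation: Matrix, translation: Point, p: Point) -> Point:
    """Return rotation * p + translation for a 3D point p."""
    return tuple(
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def well_predicted_subset(
    model: Sequence[Point],
    native: Sequence[Point],
    rotation: Matrix,
    translation: Point,
    d: float,
    eps: float = 0.0,
) -> List[int]:
    """Indices i whose transformed model point lies within (1 + eps) * d
    of the corresponding native point (assumed matching criterion)."""
    limit = (1.0 + eps) * d
    return [
        i
        for i, (a, b) in enumerate(zip(model, native))
        if math.dist(apply_rigid(rotation, translation, a), b) <= limit
    ]


def max_deviation(
    model: Sequence[Point],
    native: Sequence[Point],
    rotation: Matrix,
    translation: Point,
) -> float:
    """Smallest distance threshold at which every pair matches under
    this one transformation (the quantity minimized over all
    transformations in the second problem of the abstract)."""
    return max(
        math.dist(apply_rigid(rotation, translation, a), b)
        for a, b in zip(model, native)
    )


# Toy data: four points, identity transformation (illustrative values only).
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
NO_SHIFT = (0.0, 0.0, 0.0)
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.5), (11.4, 6.0, 0.0)]

strict = well_predicted_subset(model, native, IDENTITY, NO_SHIFT, d=2.0)
relaxed = well_predicted_subset(model, native, IDENTITY, NO_SHIFT, d=2.0, eps=0.5)
print(strict, relaxed, max_deviation(model, native, IDENTITY, NO_SHIFT))
```

With these toy values the strict threshold d = 2.0 keeps the first two residues, while relaxing the threshold to (1 + 0.5)d also admits the third. This illustrates why a distance approximation can report a subset at least as large as the exact optimum for d.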
References
Agarwal, P.K., Matoušek, J., Suri, S.: Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Comput. Geom. Theory Appl. 1(4), 189–201 (1992)
Alt, H., Mehlhorn, K., Wagener, H., Welzl, E.: Congruence, similarity, and symmetries of geometric objects. In: SCG 1987: Proceedings of the third annual symposium on Computational geometry, pp. 308–315. ACM Press, New York (1987)
Ambühl, C., Chakraborty, S., Gärtner, B.: Computing largest common point sets under approximate congruence. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 52–64. Springer, Heidelberg (2000)
Arun, K.S., Huang, T.S., Blostein, S.D.: Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 698–700 (1987)
Choi, V., Goyal, N.: A combinatorial shape matching algorithm for rigid protein docking. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 285–296. Springer, Heidelberg (2004)
Choi, V., Goyal, N.: An efficient approximation algorithm for point pattern matching under noise. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 298–310. Springer, Heidelberg (2006)
Hamelryck, T., Kent, J.T., Krogh, A.: Sampling Realistic Protein Conformations Using Local Structural Bias. PLoS Computational Biology 2(9), e131 (2006)
Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small libraries of protein fragments model native protein structures accurately. J. Mol. Biol. 323, 297–307 (2002)
Kolodny, R., Linial, N.: Approximate protein structural alignment in polynomial time. Proc. Natl. Acad. Sci. 101, 12201–12206 (2004)
Lancia, G., Istrail, S.: Protein structure comparison: Algorithms and applications. In: Mathematical Methods for Protein Structure Analysis and Design, pp. 1–33 (2003)
Moult, J., Fidelis, K., Rost, B., Hubbard, T., Tramontano, A.: Critical assessment of methods of protein structure prediction (CASP): round 6. Proteins: Struct. Funct. Genet. 61, 3–7 (2005)
Siew, N., Elofsson, A., Rychlewski, L., Fischer, D.: Maxsub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9), 776–785 (2000)
Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions. J. Mol. Biol. 268 (1997)
Zemla, A.: LGA: a method for finding 3D similarities in protein structures. Nucl. Acids Res. 31(13), 3370–3374 (2003)
Zhang, Y., Skolnick, J.: Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 57(4), 702–710 (2004)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, S.C., Bu, D., Xu, J., Li, M. (2008). Finding Largest Well-Predicted Subset of Protein Structure Models. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_7
DOI: https://doi.org/10.1007/978-3-540-69068-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69066-5
Online ISBN: 978-3-540-69068-9
eBook Packages: Computer Science, Computer Science (R0)