Finding Largest Well-Predicted Subset of Protein Structure Models

  • Shuai Cheng Li
  • Dongbo Bu
  • Jinbo Xu
  • Ming Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5029)


Evaluating the quality of models is a basic problem in protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive requires finding a largest well-predicted subset: a maximum subset of the model that matches the native structure [12]. The problem is solvable in $O(n^7)$ time, which is too slow for practical use. We present a $(1+\varepsilon)d$ distance approximation algorithm that runs in $O(n^3 \log n / \varepsilon^5)$ time for general protein structures. For globular proteins, this result can be improved to a randomized $O(n \log^2 n)$-time algorithm with probability at least $1 - O(1/n)$. In addition, we propose a $(1+\varepsilon)$-approximation algorithm that computes the minimum distance needed to fit all the points of a model to its native structure in $O(n(\log\log n + \log 1/\varepsilon)/\varepsilon^5)$ time. We have implemented our algorithms, and the results indicate that our program finds many more matched pairs in less running time than TMScore, one of the most popular tools for assessing the quality of predicted models.
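To make the objective concrete, the following minimal sketch counts the well-predicted points for one fixed rigid transformation. It is not the authors' algorithm: it assumes that "matches" means a model point lies within a distance threshold d of its corresponding native point after the transformation, and it makes no attempt to search over transformations, which is the hard part the abstract addresses. The function names, the restriction to a rotation about the z-axis, and the sample coordinates are all hypothetical.

```python
import math


def apply_rigid(point, angle_z, shift):
    """Rotate a 3D point about the z-axis by angle_z (radians), then translate.

    A z-axis rotation is a simplification for illustration; a general rigid
    transformation would use a full 3x3 rotation matrix.
    """
    x, y, z = point
    c, s = math.cos(angle_z), math.sin(angle_z)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def well_predicted_subset(model, native, d, angle_z=0.0, shift=(0.0, 0.0, 0.0)):
    """Return indices i where the transformed model[i] is within d of native[i].

    Assumption: model[i] corresponds to native[i] (same residue), and the
    transformation is given rather than optimized.
    """
    if len(model) != len(native):
        raise ValueError("model and native must have the same number of points")
    matched = []
    for i, (p, q) in enumerate(zip(model, native)):
        if math.dist(apply_rigid(p, angle_z, shift), q) <= d:
            matched.append(i)
    return matched


# Hypothetical coordinates: the last model point is badly predicted.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.1, 0.0), (1.0, -0.1, 0.0), (2.0, 0.0, 0.2), (3.0, 5.0, 0.0)]

print(well_predicted_subset(model, native, d=0.5))  # [0, 1, 2]
```

The largest well-predicted subset is the maximum of this count over all rigid transformations, so a brute-force search would call such a routine once per candidate transformation.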




  1. Agarwal, P.K., Matoušek, J., Suri, S.: Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Comput. Geom. Theory Appl. 1(4), 189–201 (1992)
  2. Alt, H., Mehlhorn, K., Wagener, H., Welzl, E.: Congruence, similarity, and symmetries of geometric objects. In: SCG 1987: Proceedings of the third annual symposium on Computational geometry, pp. 308–315. ACM Press, New York (1987)
  3. Ambühl, C., Chakraborty, S., Gärtner, B.: Computing largest common point sets under approximate congruence. In: Paterson, M. (ed.) ESA 2000. LNCS, vol. 1879, pp. 52–64. Springer, Heidelberg (2000)
  4. Arun, K.S., Huang, T.S., Blostein, S.D.: Least-squares fitting of two 3-d point sets. IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 698–700 (1987)
  5. Choi, V., Goyal, N.: A combinatorial shape matching algorithm for rigid protein docking. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 285–296. Springer, Heidelberg (2004)
  6. Choi, V., Goyal, N.: An efficient approximation algorithm for point pattern matching under noise. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 298–310. Springer, Heidelberg (2006)
  7. Hamelryck, T., Kent, J.T., Krogh, A.: Sampling Realistic Protein Conformations Using Local Structural Bias. PLoS Computational Biology 2(9), e131 (2006)
  8. Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small libraries of protein fragments model native protein structures accurately. J. Mol. Biol. 323, 297–307 (2002)
  9. Kolodny, R., Linial, N.: Approximate protein structural alignment in polynomial time. Proc. Natl. Acad. Sci. 101, 12201–12206 (2004)
  10. Lancia, G., Istrail, S.: Protein structure comparison: Algorithms and applications. In: Mathematical Methods for Protein Structure Analysis and Design, pp. 1–33 (2003)
  11. Moult, J., Fidelis, K., Rost, B., Hubbard, T., Tramontano, A.: Critical assessment of methods of protein structure prediction (CASP): round 6. Proteins: Struct. Funct. Genet. 61, 3–7 (2005)
  12. Siew, N., Elofsson, A., Rychlewski, L., Fischer, D.: Maxsub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 16(9), 776–785 (2000)
  13. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions. J. Mol. Biol. 268 (1997)
  14. Zemla, A.: LGA: a method for finding 3D similarities in protein structures. Nucl. Acids Res. 31(13), 3370–3374 (2003)
  15. Zhang, Y., Skolnick, J.: Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 57(4), 702–710 (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Shuai Cheng Li (1)
  • Dongbo Bu (1, 3)
  • Jinbo Xu (2)
  • Ming Li (1)

  1. David R. Cheriton School of Computer Science, University of Waterloo, Canada
  2. Toyota Technological Institute at Chicago, USA
  3. Institute of Computing Technology, Chinese Academy of Sciences, China
