Ballast: A Ball-Based Algorithm for Structural Motifs

  • Lu He
  • Fabio Vandin
  • Gopal Pandurangan
  • Chris Bailey-Kellogg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


Structural motifs encapsulate local sequence-structure-function relationships characteristic of related proteins, enabling the prediction of functional characteristics of new proteins, providing molecular-level insights into how those functions are performed, and supporting the development of variants specifically maintaining or perturbing function in concert with other properties. Numerous computational methods have been developed to search through databases of structures for instances of specified motifs. However, it remains an open problem as to how best to leverage the local geometric and chemical constraints underlying structural motifs in order to develop motif-finding algorithms that are both theoretically and practically efficient. We present a simple, general, efficient approach, called Ballast (Ball-based algorithm for structural motifs), to match given structural motifs to given structures. Ballast combines the best properties of previously developed methods, exploiting the composition and local geometry of a structural motif and its possible instances in order to effectively filter candidate matches. We show that on a wide range of motif matching problems, Ballast efficiently and effectively finds good matches, and we provide theoretical insights into why it works well. By supporting generic measures of compositional and geometric similarity, Ballast provides a powerful substrate for the development of motif matching algorithms.
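
The abstract gives no algorithmic detail, so the following is only a minimal sketch of what a ball-based composition filter could look like, not the authors' implementation. It assumes residues are represented as (amino-acid type, coordinates) pairs, that a ball is centered on a structure residue whose type matches a chosen motif residue, and that a ball is kept when it holds at least as many residues of each type as the motif requires. The names `motif_radius`, `candidate_balls`, `tolerance`, and `center_index` are hypothetical, as is the toy data.

```python
from collections import Counter
from math import dist


def motif_radius(motif, center_index=0):
    """Largest distance from the chosen center residue to any motif residue."""
    center = motif[center_index][1]
    return max(dist(center, pos) for _, pos in motif)


def candidate_balls(motif, structure, tolerance=1.0, center_index=0):
    """Yield (center index, indices in ball) for balls passing the composition filter.

    Assumption: a ball can hold a motif instance only if it contains at least
    as many residues of each type as the motif does.
    """
    center_type = motif[center_index][0]
    needed = Counter(kind for kind, _ in motif)
    radius = motif_radius(motif, center_index) + tolerance
    for i, (kind, pos) in enumerate(structure):
        if kind != center_type:
            continue  # the ball center must match the motif's center residue type
        ball = [j for j, (_, other) in enumerate(structure)
                if dist(pos, other) <= radius]
        available = Counter(structure[j][0] for j in ball)
        if all(available[k] >= n for k, n in needed.items()):
            yield i, ball


# Toy data (hypothetical): a three-residue motif and a six-residue structure.
motif = [("H", (0.0, 0.0, 0.0)), ("D", (3.0, 0.0, 0.0)), ("S", (0.0, 4.0, 0.0))]
structure = [
    ("H", (10.0, 10.0, 10.0)),
    ("D", (13.0, 10.0, 10.0)),
    ("S", (10.0, 14.0, 10.0)),
    ("H", (50.0, 50.0, 50.0)),
    ("D", (52.0, 50.0, 50.0)),
    ("G", (11.0, 11.0, 11.0)),
]
hits = list(candidate_balls(motif, structure))
print(hits)
```

In this sketch, only the ball around the first histidine survives, because the second histidine has no serine within reach. A full matcher would still need to check the geometry of the residues inside each surviving ball, for example by superposition, which the sketch leaves out.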


Keywords: protein structure; structural motif; sequence-structure-function relationship; geometric matching; motif matching algorithm; probabilistic analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Lu He (1)
  • Fabio Vandin (2)
  • Gopal Pandurangan (3, 4)
  • Chris Bailey-Kellogg (1)

  1. Department of Computer Science, 6211 Sudikoff Laboratory, Dartmouth College, Hanover, USA
  2. Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, USA
  3. Division of Mathematical Sciences, Nanyang Technological University, Singapore
  4. Department of Computer Science, Brown University, Providence, USA