Ligand binding site superposition and comparison based on Atomic Property Fields: identification of distant homologues, convergent evolution and PDB-wide clustering of binding sites
A new binding site comparison algorithm using optimal superposition of continuous pharmacophoric property distributions is reported. The method demonstrates high sensitivity in discovering both distantly homologous and convergent binding sites. Good superposition quality is also observed in multiple examples. Using the new approach, a measure of site similarity is derived and applied to clustering of ligand binding pockets in the PDB.
Keywords: Protein Data Bank, Atomic Property, Pharmacophoric Feature, Chymosin, Adenosyl
Experimental structural biology efforts are uncovering protein structures at an unprecedented rate. There is a need to understand relationships and discover similarities between the solved structures. While fold comparisons are routinely performed to identify homologies that are at or beyond the limit of sequence comparison methods, some functional relationships can only be detected at the level of binding sites. Ultimately, it is the configuration of these sites, rather than the overall sequence or fold, that determines the enzymatic or signal transduction activity of a protein.
Most existing methods for binding site comparison are based on some form of coarse-grained representation of the geometry and properties of the pocket as a set of points or centers. Correspondence between the two sets is then established using a variety of algorithms. The FLAP algorithm first generates GRID molecular interaction fields, which are used to detect locations where interactions of chemical groups with particular pharmacophoric features would be most favorable. Four-point pharmacophores are constructed from these points and used for target site matching. PocketMatch is an algorithm for comparing binding sites in a frame-invariant manner, based on representation of the sites by sorted lists of distances capturing the shape and chemical nature of the site. Lists are compared using a special alignment algorithm and the PMScore function. IsoCleft detects 3D atomic similarities between binding sites using a graph-matching method. The protein functional surfaces methodology attempts to optimize the global shape and local physicochemical 'texture' match between a pair of surfaces using object recognition techniques. Often, a search algorithm is combined with a specially compiled database of binding sites. For example, the CPASS database comprises ligand-defined binding sites found in the Protein Data Bank (PDB), and the CPASS algorithm compares these ligand-defined sites to determine similarity without maintaining sequence connectivity. Similarly, SURFACE is a database of protein surface regions, with functional surface patches defined by sets of residues and searches performed by matching the residue sets. CavBase is a dataset of cavities extracted from the PDB and searchable using an algorithm that matches pseudocenters analogous to pharmacophoric points. The Superimposé webserver implements several superposition and comparison methods in an on-line format and allows detection of similarities between binding sites or entire proteins.
A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships has been reported, including a comparison method based on geometric hashing, which identifies the maximum common sub-graph of atomic features. Med-SuMo rapidly compares protein surfaces represented by triplets of chemical groups. Standard 3-, 4- and 5-point pharmacophores extracted from binding pockets identified by icmPocketFinder across human PDB protein structures were used to create a virtual library of sites in the human pocketome; querying the library with a pharmacophore of a methyl-lysine binding site retrieved interesting non-trivial hits. Of note, several groups took another perspective on the pocket comparison problem, namely detecting the principal differences between related sites [15, 16, 17].
Discretized representation of the continuous pocket surface by amino-acid residues, chemical groups, pharmacophoric points or similar descriptors allows very rapid comparison but may not always be adequate to capture distant similarities. Pharmacophoric points are well suited to represent highly localized interaction centers, such as hydrogen bond donors and acceptors. Hydrophobic interactions and shape complementarity, on the other hand, are continuously distributed properties that lend themselves poorly to point representation. Moreover, to detect distant pocket similarities, 'fuzzy' matching may be needed because some of the discrete features may disappear, appear or change. These issues can be partially overcome by increasing the number of representative points and allowing partial matches.
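The contrast between hard point matching and a continuous, 'fuzzy' score can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the tolerance `tol`, the Gaussian width `sigma` and the toy coordinates are hypothetical and do not reproduce any of the methods cited above.

```python
import math

def exact_match_count(a, b, tol=0.5):
    """Count points in a that have a partner in b within tol (hard cutoff)."""
    return sum(1 for p in a if any(math.dist(p, q) <= tol for q in b))

def gaussian_overlap(a, b, sigma=1.0):
    """Sum of Gaussian-weighted pair contributions (soft, continuous score)."""
    return sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2))
               for p in a for q in b)

# Two hypothetical feature points of one site (coordinates in angstroms).
site_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
# The same two features, each displaced by 0.8 A: outside the hard tolerance.
site_b = [(0.8, 0.0, 0.0), (3.8, 0.0, 0.0)]

hard = exact_match_count(site_a, site_b)     # no point survives the cutoff
soft = gaussian_overlap(site_a, site_b)      # still clearly above zero
self_soft = gaussian_overlap(site_a, site_a) # upper reference for site_a
```

In this toy case the hard count drops to zero after a modest displacement, whereas the Gaussian score only decreases gradually relative to the self-overlap, which is the behavior a continuous representation is meant to provide.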
In the present work, the Atomic Property Fields (APF) approach is adapted to the problem of binding site/pocket superposition. The resulting pocket superposition method is tested on multiple examples of distantly similar pockets. The method also produces a score characterizing the degree of similarity of the pockets. The utility of APF site superposition as a site comparison method is evaluated by calculating a complete distance matrix for the set of over 5000 binding sites in the scPDB binding site database. Finally, clustering of this available slice of the pocketome is performed.
Adaptation of the APF ligand superposition method to binding site superposition
The original APF ligand superposition protocol consists of (I) generation of grids with 7 APF potential components from the template (static) ligand and (II) optimization of the target ligand in the grid APF potentials combined with the internal force-field energy of the ligand. A Monte Carlo procedure with gradient minimization after each random step is used as the global energy optimizer. Six variables controlling the overall position of the ligand, as well as torsions around rotatable bonds, are optimized.
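The optimizer described above, a random step followed by gradient minimization, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's implementation: `toy_energy` is a hypothetical rugged function of six variables standing in for the grid pseudo-energy, plain gradient descent stands in for the local minimizer, and the Metropolis acceptance rule, step size and temperature are assumptions that the text does not specify.

```python
import math
import random

def toy_energy(x):
    """Hypothetical rugged 6-D landscape standing in for the grid pseudo-energy."""
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)

def toy_gradient(x):
    """Analytic gradient of toy_energy."""
    return [2.0 * xi + 6.0 * math.sin(3.0 * xi) for xi in x]

def local_minimize(x, steps=200, rate=0.02):
    """Plain gradient descent used here as the local minimizer."""
    x = list(x)
    for _ in range(steps):
        g = toy_gradient(x)
        x = [xi - rate * gi for xi, gi in zip(x, g)]
    return x

def mc_minimize(x0, n_steps=200, step_size=1.0, temperature=1.0, seed=0):
    """Monte Carlo with minimization after each random step (assumed Metropolis rule)."""
    rng = random.Random(seed)
    current = local_minimize(x0)
    e_current = toy_energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        # Random perturbation of all six variables, then local minimization.
        trial = local_minimize([xi + rng.gauss(0.0, step_size) for xi in current])
        e_trial = toy_energy(trial)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

# Six variables, by analogy with the rigid-body position and orientation.
start = [2.0, -2.0, 1.5, -1.5, 2.5, -2.5]
pose, energy = mc_minimize(start)
```

Minimizing after every random move lets the search compare local minima rather than raw perturbed states, which is the usual motivation for combining the two steps. In the protocol described above, torsion angles would be added to the variable list in the same way.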
Distance matrix calculation and clustering
The APF pseudo-energy, or score, E_APF for the optimal superposition reflects the similarity of the atomic property distributions of the two binding pockets. It can be used directly for ranking the database binding sites by their similarity to a query. However, for some other applications, such as clustering, it is necessary to derive a similarity measure that behaves like a distance rather than like a ranking score. In particular, for a pair of non-identical sites it has to be a positive value that increases as the sites become more dissimilar, and it has to become zero for identical pairs. E_APF, on the other hand, is always negative, and its value for identical sites varies depending on the size and composition of the site. To convert E_APF to a normalized dot-product-like measure with correct asymptotic behavior, we used the following formula:
S_APF = tanh((E_APF - E_0)/Δ_0),
where E_0 and Δ_0 are empirical parameters. Next, a distance-like measure is obtained from the dot-product-like one.
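As a rough illustration of how such a conversion could feed a distance matrix, the following Python sketch applies the tanh formula above and then a generic dot-product-to-distance conversion. Everything beyond the tanh formula is an assumption: the conversion d = sqrt(2(1 - S)) is a common choice for normalized dot products and is not taken from the source, and the parameter values, their signs and the pairwise energies are hypothetical.

```python
import math

def apf_similarity(e_apf, e0, delta0):
    """Dot-product-like similarity: tanh((E_APF - E0) / delta0)."""
    return math.tanh((e_apf - e0) / delta0)

def similarity_to_distance(s):
    """ASSUMED conversion (not from the source): d = sqrt(2 * (1 - s))."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - s)))

# Hypothetical parameter values, chosen only so that a strongly negative
# E_APF (very similar sites) maps to a similarity close to 1.
E0, DELTA0 = -100.0, -150.0

# Hypothetical pairwise E_APF values for three sites.
pair_energy = {("A", "B"): -800.0, ("A", "C"): -150.0, ("B", "C"): -120.0}

# Symmetric distance matrix; the diagonal is set to zero by convention.
sites = ["A", "B", "C"]
dist = {s: {t: 0.0 for t in sites} for s in sites}
for (s, t), e in pair_energy.items():
    d = similarity_to_distance(apf_similarity(e, E0, DELTA0))
    dist[s][t] = dist[t][s] = d
```

With these toy numbers the pair with the most negative pseudo-energy ends up closest, and a matrix of this form is the kind of input that standard clustering procedures expect.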
Results and discussion
Interestingly, a super-cluster emerged around GTP- and ATPases, grouping together other phosphatases, phosphorylases and phosphodiesterases, very likely due to common features associated with phosphate binding. Rossmann fold-based NAD- and FAD-dependent oxidases/reductases and SAM methyltransferases formed another large, loose super-cluster, having in common the adenine binding sub-site.
Similarly, the FAD and NAD cofactors in UDP-galactose 4-epimerase and D-amino acid oxidase share the same binding mode for the common nucleotide, and this homology is successfully detected despite the very different portions that coordinate flavin and nicotinamide (Fig 7b).
Sensitive and accurate binding site comparison is a technology with multiple important applications. Binding site databases could be screened for putative off-target sites for known or candidate drugs, either to discover and avoid side-effects or to find new applications. Functional annotation of 'orphan' pockets on newly resolved protein structures could be aided by identification of similar sites of known function. Initial drug design leads for new target proteins may be suggested by ligands binding similar sites in well-studied proteins. In contrast to previously reported methods, APF BSS utilizes a continuous similarity measure and an optimization algorithm, which may identify and successfully superimpose distantly related sites missed by point-based approaches. Promising results in PDB-wide site comparisons illustrate the sensitivity and accuracy of APF BSS.
The author wishes to acknowledge stimulating discussions with Ruben Abagyan. This work was partially supported by the NIH grant 1R43GM74343.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
- 4. Najmanovich R, Kurbatova N, Thornton J: Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites. Bioinformatics 2008, 24:i105–111. 10.1093/bioinformatics/btn263
- 12. Jambon M, Andrieu O, Combet C, Deleage G, Delfaud F, Geourjon C: The SuMo server: 3D search for protein functional sites. Bioinformatics 2005, 21:3929–3930. 10.1093/bioinformatics/bti645
- 27. Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, et al.: CDD: a Conserved Domain Database for protein classification. Nucleic Acids Research 2005, 33(Database issue):D192–196. 10.1093/nar/gki069
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.