Identification of Pockets on Protein Surface to Predict Protein–Ligand Binding Sites

Part of the Focus on Structural Biology book series (FOSB, volume 8)


Proteins perform their biological functions in cellular processes mainly by interacting with other molecules, such as other proteins, ligands, DNA and RNA. Only a subset of the residues in a protein is involved in such interactions, so identifying these interacting residues is of great importance for understanding protein function. Among these interactions, those between proteins and ligands have been widely studied in protein-ligand docking, virtual screening and structure-based drug design. A protein surface contains a number of cavities, or pocket sites, where small molecules might bind, and identifying such pocket sites is therefore often the first step in protein-ligand binding site prediction.

Many computational algorithms and tools have been developed in recent decades to predict protein-ligand binding sites by identifying pockets on protein structures, including POCKET (Levitt and Banaszak 1992), LIGSITE (Hendlich et al. 1997), CAST (Dundas et al. 2006; Binkowski et al. 2003), LIGSITEcsc (Huang and Schroeder 2006), PASS (Brady and Stouten 2000), Q-SiteFinder (Laurie and Jackson 2005), SURFNET (Laskowski 1995), Fpocket (Le Guilloux et al. 2009), GHECOM (Kawabata 2010), ConCavity (Capra et al. 2009), POCASA (Yu et al. 2010), PocketPicker (Weisel et al. 2007) and SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009). Some of these methods are described in detail in other chapters. Most existing methods for protein-ligand binding site prediction can be classified into two types: geometry-based and energy-based. The geometry-based methods can be further classified into grid-based, sphere-based and α-shape-based (Kawabata 2010; Yu et al. 2010).
In the grid-based methods, the protein structure is projected onto a 3D grid and the grid points are categorized into types such as "outside protein", "inside protein" and "near protein surface" according to their position relative to the protein. The grid points that are not inside the protein are then clustered using some geometric attribute, and the grid points located at pocket sites are finally recognized. LIGSITEcs, GHECOM, PocketPicker and ConCavity are representatives of this type. In LIGSITEcs, the grid points are categorized into three types: inside protein, near surface and in solvent. Every solvent grid point is scanned along seven directions and evaluated by the number of SSS (surface-solvent-surface) events it has; a grid point with five or more such events is normally located at a pocket site. LIGSITEcs will be explained in detail in the next section. GHECOM also first projects the protein onto a 3D grid, and the geometric attribute it uses is mathematical morphology, whose theory defines the pocket regions on the protein surface. In mathematical morphology (Masuya and Doi 1995), there are four basic operations with which a probe defines a pocket site: dilation, erosion, opening and closing. In ConCavity, a 3D grid is likewise constructed to enclose the protein. Each grid point is scored using both structural information and evolutionary information, and the regions with many high-scoring grid points are considered to be pocket sites.

In the sphere-based approaches, the common strategy is to fill the protein surface with spheres layer by layer, with a cutting method applied during the filling. The final pocket sites are the regions that are rich in such spheres. Methods of this kind include SURFNET, PASS, PHECOM (Kawabata and Go 2007) and POCASA (Yu et al. 2010).
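The seven-direction scan described above for LIGSITEcs can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the original tool: the protein is represented as a set of occupied integer grid cells rather than a Connolly surface, the seven directions are taken to be the three grid axes plus the four cubic diagonals, and an event is counted whenever a solvent point meets an occupied cell in both senses of a direction. The function names `hits_protein`, `count_enclosed_directions` and `pocket_points` are hypothetical.

```python
from itertools import product

# Assumed scan directions: three grid axes plus the four cubic diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]


def hits_protein(occupied, shape, start, step):
    """Walk from `start` along `step`; True if an occupied cell is met
    before the walk leaves the grid."""
    x, y, z = start[0] + step[0], start[1] + step[1], start[2] + step[2]
    while 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
        if (x, y, z) in occupied:
            return True
        x, y, z = x + step[0], y + step[1], z + step[2]
    return False


def count_enclosed_directions(occupied, shape, point):
    """Count directions in which `point` has protein on both sides."""
    count = 0
    for d in DIRECTIONS:
        back = (-d[0], -d[1], -d[2])
        if (hits_protein(occupied, shape, point, d)
                and hits_protein(occupied, shape, point, back)):
            count += 1
    return count


def pocket_points(occupied, shape, min_events=5):
    """Return unoccupied grid points with at least `min_events` events."""
    return [p for p in product(*(range(n) for n in shape))
            if p not in occupied
            and count_enclosed_directions(occupied, shape, p) >= min_events]


# Toy example: a solid 5x5x5 block in a 9x9x9 grid with one empty central cell.
shape = (9, 9, 9)
occupied = set(product(range(2, 7), repeat=3)) - {(4, 4, 4)}
print(pocket_points(occupied, shape))
```

On this toy grid only the central empty cell is enclosed in all seven directions, so it is the only point reported. A real implementation would additionally distinguish surface cells from buried cells and cluster neighboring pocket points into sites.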
Approaches based on α-shape include CAST and Fpocket. CAST computes a triangulation of the protein's surface atoms, and the triangles are grouped by letting small ones flow towards neighboring larger ones. The pocket sites are the collections of empty triangles. In contrast to CAST, Fpocket uses the concept of the α-sphere, a sphere that contacts four atoms on its boundary and contains no atom inside. Clusters of α-spheres that lie close together are then identified as potential pocket sites.

In contrast to the geometry-based methods, energy-based methods such as Q-SiteFinder (Laurie and Jackson 2005) aim to find pocket sites by computing the interaction energy between protein atoms and a small-molecule probe. In Q-SiteFinder, layers of methyl (–CH3) probes are placed on the protein surface to calculate the van der Waals interaction energy between the protein atoms and the probes. The probes are then clustered into groups, and the groups are ranked by the total interaction energy of their probes. The clusters with the most favorable total interaction energy are taken as the potential ligand binding sites. SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009) is similar to Q-SiteFinder, but it includes Lennard-Jones and electrostatic energy terms and uses different types of probes to calculate the interaction energy. Table 2.1 briefly summarizes the categories of these existing computational methods.
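The energy-based idea described above (score probes, keep the favorable ones, cluster them and rank the clusters by total energy) can be sketched in Python as follows. This is only an illustration under stated assumptions: a single 12-6 Lennard-Jones term stands in for the van der Waals energy, the values of `epsilon`, `sigma`, `cutoff` and `link_dist` are arbitrary placeholders rather than the force-field parameters of Q-SiteFinder or SiteHound, and the clustering is a plain single-linkage grouping. All function names are hypothetical.

```python
import math


def lj_energy(probe, atoms, epsilon=0.1, sigma=3.5):
    """Sum a 12-6 Lennard-Jones term between one probe and all atoms.
    epsilon and sigma are illustrative values, not real force-field data."""
    total = 0.0
    for atom in atoms:
        r = math.dist(probe, atom)
        if r == 0.0:
            return math.inf  # probe sits exactly on an atom: reject it
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


def favorable_probes(probes, atoms, cutoff=-0.2):
    """Keep (probe, energy) pairs whose energy is at or below `cutoff`."""
    scored = [(p, lj_energy(p, atoms)) for p in probes]
    return [(p, e) for p, e in scored if e <= cutoff]


def rank_clusters(scored, link_dist=1.5):
    """Single-linkage clustering of scored probes, then rank the clusters
    by total energy, most negative (most favorable) first."""
    clusters = []
    for p, e in scored:
        linked, rest = [], []
        for c in clusters:
            near = any(math.dist(p, q) <= link_dist for q, _ in c)
            (linked if near else rest).append(c)
        # Merge the new probe with every cluster it touches.
        merged = [(p, e)] + [item for c in linked for item in c]
        clusters = rest + [merged]
    return sorted(clusters, key=lambda c: sum(e for _, e in c))
```

In use, `favorable_probes` would be called on probe positions laid out over the protein surface, and the first cluster returned by `rank_clusters` would be the top-ranked candidate site. Ranking by the summed energy favors clusters that are both large and energetically favorable.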


Keywords: LIGSITEcsc, MetaPocket, Pocket identification, Protein-ligand binding sites, 3D grid, Residues conservation, Cavity, Pocket, Mathematical morphology, Q-SiteFinder, Inside protein, Near surface, In solvent, Cubic diagonals


  1. Binkowski TA, Naghibzadeh S, Liang J (2003) CASTp: computed atlas of surface topography of proteins. Nucleic Acids Res 31(13):3352–3355
  2. Brady GP Jr, Stouten PF (2000) Fast prediction and visualization of protein binding pockets with PASS. J Comput Aided Mol Des 14(4):383–401
  3. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA (2009) Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 5(12):e1000585
  4. Connolly M (1983) Analytical molecular surface calculation. J Appl Cryst 16:548–558
  5. Delano W (2002) The PyMOL Molecular Graphics System.
  6. Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J (2006) CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res 34(Web Server issue):W116–W118
  7. Ghersi D, Sanchez R (2009) EasyMIFS and SiteHound: a toolkit for the identification of ligand-binding sites in protein structures. Bioinformatics 25(23):3185–3186
  8. Glaser F, Rosenberg Y, Kessel A, Pupko T, Ben-Tal N (2005) The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures. Proteins 58:610–617
  9. Hendlich M, Rippmann F, Barnickel G (1997) LIGSITE: automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Model 15(6):359–363
  10. Hernandez M, Ghersi D, Sanchez R (2009) SITEHOUND-web: a server for ligand binding site identification in protein structures. Nucleic Acids Res 37(Web Server issue):W413–W416
  11. Huang B (2009) MetaPocket: a meta approach to improve protein ligand binding site prediction. OMICS 13(4):325–330
  12. Huang B, Schroeder M (2006) LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol 6(1):19
  13. Kawabata T (2010) Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins 78(5):1195–1211
  14. Kawabata T, Go N (2007) Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins 68(2):516–529
  15. Laskowski RA (1995) SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions. J Mol Graph 13(5):323–330, 307–308
  16. Laurie AT, Jackson RM (2005) Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics 21(9):1908–1916
  17. Le Guilloux V, Schmidtke P, Tuffery P (2009) Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics 10:168
  18. Levitt D, Banaszak L (1992) POCKET: a computer graphics method for identifying and displaying protein cavities and their surrounding amino acids. J Mol Graph 10:229–234
  19. Masuya M, Doi J (1995) Detection and geometric modeling of molecular surfaces and cavities using digital mathematical morphological operations. J Mol Graph 13(6):331–336
  20. Weisel M, Proschak E, Schneider G (2007) PocketPicker: analysis of ligand binding-sites with shape descriptors. Chem Cent J 1:7
  21. Yu J, Zhou Y, Tanaka I, Yao M (2010) Roll: a new algorithm for the detection of protein pockets and cavities with a rolling probe sphere. Bioinformatics 26(1):46–52
  22. Zhang Z, Li Y, Lin B, Schroeder M, Huang B (2011) Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics 27(15):2083–2088

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Systems Biology Division, Zhejiang-California International NanoSystems Institute, Zhejiang University, Hangzhou, China
  2. Bioinformatics Group, Biotechnology Center, Technical University of Dresden, Dresden, Germany
