Protein Clustering on a Grassmann Manifold

  • Chendra Hadi Suryanto
  • Hiroto Saigo
  • Kazuhiro Fukui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


We propose a new method for clustering 3D protein structures. In our method, the 3D structure of a protein is represented by a linear subspace, obtained by applying PCA to a set of synthesized multi-view images of the protein. The similarity of two protein structures is then defined by the canonical angles between the corresponding subspaces. The advantage of this approach is that it avoids the difficulties of protein structure alignment, because the similarity measure does not rely on precise alignment or on the geometry of individual alpha-carbon atoms. We therefore treat protein structure clustering as the clustering of the set of subspaces corresponding to the proteins. Because subspaces of the same dimension are points on a Grassmann manifold, clustering them is equivalent to clustering the corresponding points on that manifold; we therefore call our approach the Grassmannian Protein Clustering Method (GPCM). We evaluate the effectiveness of our method through experiments on clustering randomly selected proteins from the Protein Data Bank into four classes: alpha, beta, alpha/beta, and alpha+beta (with multi-domain proteins). The results show that GPCM outperforms k-means clustering with Gauss Integrals Tuned, a state-of-the-art descriptor of protein structure.
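The pipeline in the abstract (views → PCA subspace → canonical-angle similarity → clustering on the Grassmann manifold) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' GPCM implementation. It assumes the view images are already rendered and flattened into column vectors and that PCA is applied without mean-centering. Similarity is taken as the mean squared cosine of the canonical angles, as in the Mutual Subspace Method. The abstract does not say how cluster centres are computed, so the clustering step is a swapped-in choice: Lloyd-style k-means with the projection distance, a chordal (extrinsic) mean and farthest-point initialisation. All function names are hypothetical, and the toy data are random synthetic subspaces, not PDB proteins.

```python
import numpy as np


def subspace_from_views(X, d):
    """Orthonormal basis (D x d) spanning the top-d principal directions of the views.

    X holds one flattened view image per column (shape D x n). As an assumption we
    skip mean-centering, i.e. we use eigenvectors of the autocorrelation matrix.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :d]


def canonical_cosines(U1, U2):
    """Cosines of the canonical angles between span(U1) and span(U2)."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def msm_similarity(U1, U2):
    """Mean squared canonical cosine, a value in [0, 1]."""
    return float(np.mean(canonical_cosines(U1, U2) ** 2))


def projection_distance(U1, U2):
    """Sum of sin^2 of the canonical angles, i.e. d - ||U1^T U2||_F^2."""
    return float(U1.shape[1] - np.linalg.norm(U1.T @ U2, "fro") ** 2)


def extrinsic_mean(Us):
    """Chordal (extrinsic) mean: top-d eigenvectors of the averaged projectors."""
    d = Us[0].shape[1]
    P = sum(U @ U.T for U in Us) / len(Us)
    _, V = np.linalg.eigh(P)  # eigenvalues in ascending order
    return V[:, ::-1][:, :d]


def grassmann_kmeans(Us, k, n_iter=50):
    """Lloyd-style k-means on the Grassmann manifold with the projection distance."""
    # Farthest-point initialisation: start from Us[0], then repeatedly add the
    # subspace farthest from all centres chosen so far.
    centers = [Us[0]]
    while len(centers) < k:
        far = max(range(len(Us)),
                  key=lambda i: min(projection_distance(C, Us[i]) for C in centers))
        centers.append(Us[far])
    labels = np.full(len(Us), -1)
    for _ in range(n_iter):
        new = np.array([min(range(k), key=lambda j: projection_distance(centers[j], U))
                        for U in Us])
        if np.array_equal(new, labels):
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [U for U, lab in zip(Us, labels) if lab == j]
            if members:
                centers[j] = extrinsic_mean(members)
    return labels


# Toy usage: two hypothetical "fold classes", each a random 3-dim subspace of R^50,
# with four noisy toy "proteins" (sets of 20 view vectors) per class.
rng = np.random.default_rng(0)
D, d, n_views = 50, 3, 20
bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(2)]
Us = []
for B in bases:
    for _ in range(4):
        X = B @ rng.standard_normal((d, n_views)) + 0.01 * rng.standard_normal((D, n_views))
        Us.append(subspace_from_views(X, d))
labels = grassmann_kmeans(Us, k=2)
print(labels)
```

With well-separated toy classes, the first four and last four subspaces should fall into different clusters. On real data, the choice of subspace dimension `d` and of the cluster-centre rule would matter far more than in this synthetic case.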


Keywords: protein structure clustering · k-means · Mutual Subspace Method · Grassmann manifold · Gauss Integrals



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Graduate School of Systems and Information Engineering, Department of Computer Science, University of Tsukuba, Japan (Chendra Hadi Suryanto, Kazuhiro Fukui)
  2. Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Japan (Hiroto Saigo)
