Fast SCOP Classification of Structural Class and Fold Using Secondary Structure Mining in Distance Matrix

  • Jian-Yu Shi
  • Yan-Ning Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


Understanding the structure-function relationship is an urgent need in the proteomic era. One important technique for meeting this need is to analyze and represent the spatial structure of the domain, which is the functional unit of the whole protein, and to perform fast domain classification. In this paper, we introduce a novel method for rapid domain classification. Instead of directly analyzing the protein sequence or the 3-D tertiary structure, the presented method first maps the tertiary structure of a protein domain into a 2-D Cα-Cα distance matrix. Then, two distance functions, one for the alpha helix and one for the beta strand, are modeled from their respective geometrical properties. These distance functions are then applied to mine secondary structure elements in the distance matrix, in a way similar to image processing. Furthermore, a composition feature and an arrangement feature of the secondary structure elements are presented to characterize domain structure for classification of structural class and fold in the Structural Classification of Proteins (SCOP) database. Finally, comparison with other methods shows that the presented method classifies domains automatically, effectively, and efficiently, with the benefit of low-dimensional, meaningful features and without the need for a complicated classifier system.
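The first step of the method, mapping a domain's tertiary structure into a 2-D Cα-Cα distance matrix, can be illustrated with a minimal sketch. The abstract does not specify the authors' implementation or their two distance functions, so everything below is an assumption for illustration: the function names `ca_distance_matrix` and `ideal_helix` are hypothetical, and the helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue) are commonly quoted textbook values for an idealized alpha helix, not values taken from the paper.

```python
import math


def ca_distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances
    between C-alpha coordinates given as (x, y, z) tuples."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distance is symmetric
    return matrix


def ideal_helix(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Hypothetical helper: C-alpha positions on an idealized alpha helix.
    Default parameters are assumed textbook values, not the paper's."""
    step = math.radians(turn_deg)
    return [
        (radius * math.cos(k * step), radius * math.sin(k * step), rise * k)
        for k in range(n_residues)
    ]


# Distance matrix of an 8-residue idealized helix.
helix_matrix = ca_distance_matrix(ideal_helix(8))

# Entries at a fixed sequence separation are constant along the helix,
# so they form bands parallel to the main diagonal of the matrix.
for offset in (1, 2, 3, 4):
    print(offset, round(helix_matrix[0][offset], 2))
```

Under these assumptions, each sequence separation within a helix yields a constant distance, so a helix appears as regular bands along the main diagonal of the matrix. A regular pattern of this kind is what makes an image-processing-style search for secondary structure elements plausible, although the actual distance functions used in the paper may differ.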


SCOP classification; protein structure; distance matrix; secondary structure mining; image processing; support vector machines


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jian-Yu Shi (1, 2)
  • Yan-Ning Zhang (2)
  1. Faculty of Life Sciences, Northwestern Polytechnical University, China
  2. College of Computer Science, Northwestern Polytechnical University, Xi'an, China
