FAST Community Detection for Proteins Graph-Based Functional Classification

  • Arbi Ben Rejab
  • Imen Boukhris
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 941)


In this paper we present and evaluate a fast, parallel method for assessing the similarity between node-labeled, edge-weighted graphs that represent the binding pockets of proteins. To predict the functional family of a protein, its binding pockets can be modeled as graphs that capture their geometry and physicochemical composition without information loss. Community detection can then be used to make similarity measurement on these graphs easier. Our approach is based on a parallel implementation of a community detection algorithm that adapts and extends the Louvain method. Compared with existing solutions, our method can achieve a nearly balanced workload across processors and higher graph-clustering accuracy on large real-world graphs.
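To make the role of community detection concrete, the following is a minimal sequential sketch of the modularity score and the greedy local-moving step on which the Louvain method is built. It is not the parallel algorithm evaluated in the paper: the function names `modularity` and `local_moving`, the naive recomputation of modularity for every candidate move, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a weighted undirected graph.

    edges: list of (u, v, weight); community: dict mapping node -> label.
    """
    m = sum(w for _, _, w in edges)   # total edge weight
    internal = defaultdict(float)     # edge weight inside each community
    degree = defaultdict(float)       # summed weighted degree of each community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges):
    """Louvain-style local moving (hypothetical helper, first phase only).

    Every node starts in its own community and is greedily moved to the
    neighbouring community that increases modularity the most, until no
    single move improves the score.
    """
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    community = {n: n for n in adj}
    best_q = modularity(edges, community)
    improved = True
    while improved:
        improved = False
        for n in sorted(adj):
            home = community[n]
            best_c = home
            # Try every community that contains a neighbour of n.
            for c in sorted({community[x] for x in adj[n]} - {home}):
                community[n] = c
                q = modularity(edges, community)  # naive full recomputation
                if q > best_q + 1e-12:
                    best_q, best_c = q, c
            community[n] = best_c  # keep the best move, or restore the original
            if best_c != home:
                improved = True
    return community, best_q


# Toy graph (made up): two triangles joined by one weak edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 0.1)]
community, q = local_moving(edges)
print(community, round(q, 3))
```

On the toy graph each triangle ends up in its own community. A full Louvain implementation would additionally collapse each community into a single node and repeat the procedure on the coarsened graph, and would update the modularity gain incrementally instead of recomputing it for every candidate move.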


Bioinformatics, Graph-based similarity, Community detection, Protein binding sites classification, Parallel processing



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. LARODEC, Institut Supérieur de Gestion de Tunis, Université de Tunis, Tunis, Tunisia
