Abstract
Given a large weighted data graph and a similar but smaller weighted pattern graph, either directed or undirected, the weighted subgraph matching problem is to find a mapping from the nodes of the pattern graph to a subset of the nodes of the data graph such that the sum of edge weight differences is minimized. Biological interaction networks such as protein-protein interaction networks and molecular pathways are often modeled as weighted graphs to account for the high false positive rate intrinsic to the interaction detection process. Complex biological problems such as disease gene prioritization and conserved phylogenetic tree construction, in turn, depend largely on similarity calculations among these networks. Although several existing methods measure graph and subgraph similarity efficiently, they produce nonintuitive results because they assume an unweighted graph model. Moreover, very few algorithms exist for weighted graph matching, and those that do apply only under the restriction that the data and pattern graphs are of equal size. In this paper, we introduce a novel algorithm for weighted subgraph matching that can be applied effectively to both directed and undirected graphs. Experimental results demonstrate the superiority and relative scalability of the algorithm over available state-of-the-art methods.
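To make the problem statement concrete, the following is a minimal brute-force sketch of the objective described above. It is not the WSM algorithm introduced in the paper; it only enumerates every injective mapping for very small graphs. The function names, the dictionary-based graph representation, the use of absolute differences, and the treatment of a missing edge as weight 0 are all assumptions made for illustration.

```python
from itertools import permutations


def mapping_cost(pattern, data, mapping):
    """Sum of absolute edge-weight differences under a node mapping.

    Graphs are dicts {(u, v): weight} of directed edges. A missing edge
    is treated as weight 0.0 (an assumption, not taken from the paper).
    For an undirected graph, store each edge in both directions.
    """
    cost = 0.0
    nodes = list(mapping)
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            w_pattern = pattern.get((u, v), 0.0)
            w_data = data.get((mapping[u], mapping[v]), 0.0)
            cost += abs(w_pattern - w_data)
    return cost


def brute_force_match(pattern_nodes, pattern, data_nodes, data):
    """Try every injective mapping of pattern nodes into data nodes.

    Exponential in the pattern size, so only usable on toy inputs.
    Returns the lowest-cost mapping and its cost.
    """
    best_mapping, best_cost = None, float("inf")
    for image in permutations(data_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        cost = mapping_cost(pattern, data, mapping)
        if cost < best_cost:
            best_mapping, best_cost = mapping, cost
    return best_mapping, best_cost


if __name__ == "__main__":
    # Hypothetical toy graphs: one weighted pattern edge, three data edges.
    pattern_edges = {("a", "b"): 0.9}
    data_edges = {(1, 2): 0.5, (2, 3): 0.8, (3, 1): 0.2}
    print(brute_force_match(["a", "b"], pattern_edges, [1, 2, 3], data_edges))
```

Exhaustive enumeration of this kind grows factorially with the graph sizes, which is why a scalable method is needed for large data graphs.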
Acknowledgement
This research was supported in part by National Science Foundation grant IIS 0612203.
Additional information
This paper is dedicated to the loving memory of Anupam Bhattacharjee, the co-author and the driving force behind this paper, who passed away unexpectedly on September 6, 2010.
About this article
Cite this article
Bhattacharjee, A., Jamil, H.M. WSM: a novel algorithm for subgraph matching in large weighted graphs. J Intell Inf Syst 38, 767–784 (2012). https://doi.org/10.1007/s10844-011-0178-z
DOI: https://doi.org/10.1007/s10844-011-0178-z