Online Structural Graph Clustering Using Frequent Subgraph Mining

Seeland, Madeleine; Girschick, Tobias; Buchwald, Fabian; Kramer, Stefan

doi:10.1007/978-3-642-15939-8_14

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6323)

Abstract

The goal of graph clustering is to partition objects in a graph database into clusters based on criteria such as vertex connectivity, neighborhood similarity, or the size of the maximum common subgraph. Clustering can help structure the graph space and improve the understanding of the data. In this paper, we present a novel method for structural graph clustering, i.e., graph clustering without generating features or decomposing graphs into parts. In contrast to many related approaches, the method relies not on computationally expensive maximum common subgraph (MCS) operations or variants thereof, but on frequent subgraph mining. More specifically, our problem formulation takes advantage of the frequent subgraph miner gSpan (which performs well on many practical problems) without actually generating thousands of subgraphs in the process. In the proposed clustering approach, a cluster encompasses all graphs that share a sufficiently large common subgraph: for each graph in a cluster, the common subgraph must make up at least a user-specified fraction of that graph's overall size. The new algorithm works in an online mode (processing one structure after the other) and produces overlapping (non-disjoint) and non-exhaustive clusters. In a series of experiments, we evaluated the effectiveness and efficiency of the structural clustering algorithm on various real-world data sets of molecular graphs.
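The abstract states the cluster criterion only in words: for a cluster C with common part S, every graph G in C must satisfy |S| >= theta * |G|, where theta is the user-specified fraction. The following is a toy illustration of an online, overlapping, non-exhaustive assignment loop under that criterion. It is not the authors' implementation and does not use gSpan. As a loudly labeled simplification, each graph is reduced to a multiset of labeled edges, and a cluster's "common subgraph" is approximated by the multiset intersection of those edges, which is only an upper bound on the edge count of any true common subgraph. The names edge_profile, try_join, online_cluster, theta and min_cluster_size are hypothetical; graph size is measured in edges; and not reporting clusters smaller than min_cluster_size is one assumed way to make the output non-exhaustive.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical encoding: an edge is (vertex label, edge label, vertex label).
Edge = Tuple[str, str, str]


def edge_profile(edges: List[Edge]) -> Counter:
    """Multiset of canonical labeled edges (endpoint labels sorted)."""
    return Counter((min(a, b), bond, max(a, b)) for a, bond, b in edges)


@dataclass
class Cluster:
    members: List[int] = field(default_factory=list)   # indices of graphs
    sizes: List[int] = field(default_factory=list)     # edge count per member
    common: Counter = field(default_factory=Counter)   # shared edge multiset


def try_join(cluster: Cluster, profile: Counter, theta: float) -> Optional[Counter]:
    """Return the new shared part if the graph may join, else None.

    Simplification (not gSpan, not MCS): the shared part is the multiset
    intersection of labeled edges, an upper bound on the edge count of
    any common subgraph of the members.
    """
    shared = cluster.common & profile          # min of counts per edge type
    shared_size = sum(shared.values())
    new_size = sum(profile.values())
    # Every member, newcomer included, must keep >= theta of its own size.
    if all(shared_size >= theta * s for s in cluster.sizes + [new_size]):
        return shared
    return None


def online_cluster(graphs: List[List[Edge]], theta: float = 0.5,
                   min_cluster_size: int = 2) -> List[List[int]]:
    clusters: List[Cluster] = []
    for idx, edges in enumerate(graphs):       # process one graph at a time
        profile = edge_profile(edges)
        size = sum(profile.values())
        placed = False
        for cluster in clusters:               # overlapping: test every cluster
            shared = try_join(cluster, profile, theta)
            if shared is not None:
                cluster.members.append(idx)
                cluster.sizes.append(size)
                cluster.common = shared
                placed = True
        if not placed:                         # start a new cluster
            clusters.append(Cluster([idx], [size], profile.copy()))
    # Non-exhaustive output: clusters below min_cluster_size are not reported,
    # so some graphs may end up in no reported cluster at all.
    return [c.members for c in clusters if len(c.members) >= min_cluster_size]
```

For instance, with the three toy graphs [("C", "s", "C")], [("O", "s", "O")] and [("C", "s", "C"), ("O", "s", "O")] and theta=0.5, the third graph satisfies the criterion for both singleton clusters and joins each of them, so online_cluster returns the overlapping result [[0, 2], [1, 2]]. Because a cluster's shared part can only shrink as members are added, each acceptance test only needs the current shared part and the stored member sizes.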

References

Inokuchi, A., Washio, T., Motoda, H.: An APriori-based algorithm for mining frequent substructures from graph data. In: PKDD ’00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 13–23 (2000)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 721–724 (2002)
Tsuda, K., Kudo, T.: Clustering Graphs by Weighted Substructure Mining. In: Cohen, W.W., Moore, A. (eds.) ICML 2006, pp. 953–960. ACM Press, New York (2006)
Stahl, M., Mauser, H.: Database clustering with a combination of fingerprint and maximum common substructure methods. J. Chem. Inf. Model. 45, 542–548 (2005)
Tsuda, K., Kurihara, K.: Graph mining with variational Dirichlet process mixture models. In: Proceedings of the 8th SIAM International Conference on Data Mining, pp. 432–442 (2008)
Martin, Y.C., Kofron, J.L., Traphagen, L.M.: Do structurally similar molecules have similar biological activity? J. Med. Chem. 45, 4350–4358 (2002)
Weinstein, J., Kohn, K., Grever, M., Viswanadhan, V.: Neural computing in cancer drug development: Predicting mechanism of action. Science 258, 447–451 (1992)
Koutsoukos, A.D., Rubinstein, L.V., Faraggi, D., Simon, R.M., Kalyandrug, S., Weinstein, J.N., Kohn, K.W., Paull, K.D.: Discrimination techniques applied to the NCI in vitro anti-tumour drug screen: predicting biochemical mechanism of action. Stat. Med. 13, 719–730 (1994)
Raymond, J.W., Willett, P.: Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases. J. Comput. Aided Mol. Des. 16(1), 59–71 (2002)
McGregor, M.J., Pallai, P.V.: Clustering of large databases of compounds: Using the MDL "keys" as structural descriptors. J. Chem. Inform. Comput. Sci. 37(3), 443–448 (1997)
Yoshida, T., Shoda, R., Motoda, H.: Graph clustering based on structural similarity of fragments. In: Jantke, K.P., et al. (eds.) Federation over the Web. LNCS (LNAI), vol. 3847, pp. 97–114. Springer, Heidelberg (2006)
Günter, S., Bunke, H.: Validation indices for graph clustering. Pattern Recogn. Lett. 24(8), 1107–1113 (2003)
Kohonen, T.: Self-organizing maps. Springer, Heidelberg (1997)
Chen, Y.L., Hu, H.L.: An overlapping cluster algorithm to provide non-exhaustive clustering. Eur. J. Oper. Res. 173(3), 762–780 (2006)
Raymond, J.W., Blankley, C.J., Willett, P.: Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. J. Mol. Graph. Model. 21(5), 421–433 (2003)
Bunke, H., Foggia, P., Guidobaldi, C., Sansone, C., Vento, M.: A comparison of algorithms for maximum common subgraph on randomly connected graphs. In: Caelli, T.M., Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002. LNCS, vol. 2396, pp. 123–132. Springer, Heidelberg (2002)
Aggarwal, C.C., Ta, N., Wang, J., Feng, J., Zaki, M.: XProj: a framework for projected structural clustering of XML documents. In: KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 46–55. ACM, New York (2007)
Wang, K., Han, J.: BIDE: Efficient mining of frequent closed sequences. In: International Conference on Data Engineering (2004)
Author information

Authors and Affiliations

Institut für Informatik/I12, Technische Universität München, Boltzmannstr. 3, 85748, Garching b. München, Germany
Madeleine Seeland, Tobias Girschick, Fabian Buchwald & Stefan Kramer

Editor information

Editors and Affiliations

Departamento de Matemáticas, Estadística y Computación, Universidad de Cantabria, Avenida de los Castros, s/n, 39071, Santander, Spain
José Luis Balcázar
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Francesco Bonchi
Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018, Barcelona, Spain
Aristides Gionis
TAO, CNRS-INRIA-LRI, Université Paris-Sud, 91405, Orsay, France
Michèle Sebag

About this paper

Cite this paper

Seeland, M., Girschick, T., Buchwald, F., Kramer, S. (2010). Online Structural Graph Clustering Using Frequent Subgraph Mining. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_14

DOI: https://doi.org/10.1007/978-3-642-15939-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8
eBook Packages: Computer Science, Computer Science (R0)
