Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors

Günnemann, Stephan; Boden, Brigitte; Färber, Ines; Seidl, Thomas

doi:10.1007/978-3-642-37453-1_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7818)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Large graphs are ubiquitous in today’s applications. Besides the mere graph structure, data sources usually describe the individual objects by feature vectors. To realize the full potential for knowledge extraction, recent approaches consider both information types simultaneously. For the task of clustering, such combined clustering models determine groups of objects within one network that are densely connected and show similar characteristics. However, due to the inherent complexity of this combination, existing methods do not run efficiently and are hardly applicable to large graphs.
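The abstract describes combined clusters only informally. The following is a minimal sketch of one possible formalization, assuming a γ-quasi-clique requirement for dense connectivity and a maximum-width requirement for similarity on a selected subset of dimensions (a subspace). The function name `is_combined_cluster`, the parameters `gamma` and `w`, and the toy data are hypothetical and are not taken from the paper.

```python
from math import ceil
from typing import Dict, List, Set

def is_combined_cluster(
    adj: Dict[int, Set[int]],
    features: Dict[int, List[float]],
    objects: Set[int],
    dims: Set[int],
    gamma: float = 0.5,
    w: float = 1.0,
) -> bool:
    """Check a hypothetical combined-cluster criterion for (objects, dims).

    Assumed structural criterion: `objects` induces a gamma-quasi-clique, i.e.
    every member has at least ceil(gamma * (|objects| - 1)) neighbours inside it.
    Assumed attribute criterion: in every dimension of `dims`, the members'
    feature values span at most `w`.
    """
    if len(objects) < 2 or not dims:
        return False
    min_deg = ceil(gamma * (len(objects) - 1))
    # Dense connectivity inside the group.
    if any(len(adj.get(o, set()) & objects) < min_deg for o in objects):
        return False
    # Similar characteristics within the selected subspace.
    for d in dims:
        values = [features[o][d] for o in objects]
        if max(values) - min(values) > w:
            return False
    return True

# Toy example: a triangle {0, 1, 2} plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
features = {0: [1.0, 5.0], 1: [1.2, 9.0], 2: [0.9, 1.0], 3: [7.0, 5.0]}

print(is_combined_cluster(adj, features, {0, 1, 2}, {0}))     # True
print(is_combined_cluster(adj, features, {0, 1, 2}, {0, 1}))  # False
```

Because the similarity test is restricted to `dims`, the same densely connected object set can qualify in the subspace {0} but not in {0, 1}; this is what distinguishes a subspace cluster from a full-space one.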

In this work, we develop a method that clusters combined data sources efficiently while still finding high-quality results. We prove the complexity of our model and identify the critical parts that inhibit an efficient execution. Based on this analysis, we develop the algorithm EDCAR, which approximates the optimal clustering solution using the established GRASP (Greedy Randomized Adaptive Search Procedure) principle. In thorough experiments, we show that EDCAR outperforms all competing approaches in terms of runtime while simultaneously achieving high clustering quality. For repeatability and further research, we publish all datasets, executables and parameter settings on our website.
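The abstract names GRASP but gives no details of how EDCAR instantiates it. As a generic illustration only, the sketch below applies the textbook GRASP loop (a greedy randomized construction that picks from a restricted candidate list, followed by local search, repeated while keeping the best solution) to a toy task: finding a large γ-quasi-clique. The objective, the greedy value, the RCL parameter `alpha`, the drop-and-re-add move and all function names are assumptions of this sketch, not EDCAR's actual design.

```python
# Generic GRASP sketch on a toy problem (assumption: this is NOT EDCAR itself).
import random
from math import ceil
from typing import Dict, Set

Graph = Dict[int, Set[int]]

def is_quasi_clique(adj: Graph, nodes: Set[int], gamma: float) -> bool:
    """Each node needs >= ceil(gamma * (|nodes| - 1)) neighbours in `nodes`."""
    need = ceil(gamma * (len(nodes) - 1))
    return all(len(adj[v] & nodes) >= need for v in nodes)

def construct(adj: Graph, gamma: float, alpha: float, rng: random.Random) -> Set[int]:
    """Greedy randomized construction with a value-based restricted candidate list."""
    current = {rng.choice(sorted(adj))}
    while True:
        # Feasible candidates: adding them keeps the quasi-clique property.
        cands = [v for v in adj
                 if v not in current and is_quasi_clique(adj, current | {v}, gamma)]
        if not cands:
            return current
        # Greedy value: neighbours already inside the current solution.
        gain = {v: len(adj[v] & current) for v in cands}
        hi, lo = max(gain.values()), min(gain.values())
        rcl = [v for v in cands if gain[v] >= hi - alpha * (hi - lo)]
        current.add(rng.choice(rcl))  # random pick among the near-greedy choices

def local_search(adj: Graph, nodes: Set[int], gamma: float) -> Set[int]:
    """Drop one node, greedily re-add others; accept the move if the set grows."""
    best = set(nodes)
    improved = True
    while improved:
        improved = False
        for v in sorted(best):
            trial = best - {v}
            for u in sorted(adj):
                if u != v and u not in trial and is_quasi_clique(adj, trial | {u}, gamma):
                    trial.add(u)
            if len(trial) > len(best):
                best, improved = trial, True
                break
    return best

def grasp(adj: Graph, gamma: float = 1.0, alpha: float = 0.3,
          iterations: int = 20, seed: int = 0) -> Set[int]:
    """Repeat construction + local search and keep the largest solution found."""
    rng = random.Random(seed)
    best: Set[int] = set()
    for _ in range(iterations):
        solution = local_search(adj, construct(adj, gamma, alpha, rng), gamma)
        if len(solution) > len(best):
            best = solution
    return best

# Toy graph: a 4-clique {0, 1, 2, 3} linked by edge 3-4 to a triangle {4, 5, 6}.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(grasp(adj))
```

With `gamma = 1.0` the quasi-clique reduces to an ordinary clique, so each iteration ends in either the triangle or the 4-clique. The local search cannot leave the triangle on its own; the randomized restarts are what give the search a chance to reach the larger solution.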

Author information

Authors and Affiliations

RWTH Aachen University, Germany
Stephan Günnemann, Brigitte Boden, Ines Färber & Thomas Seidl

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Dept. of Computer Science and Information Engineering, Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
Vincent S. Tseng
Faculty of Engineering and Information Technology, University of Technology Sydney, Broadway, P.O. Box 123, 2007, Sydney, NSW, Australia
Longbing Cao & Guandong Xu
Asian Office of Aerospace Research and Development (AOARD), Air Force Office of Scientific Research (AFOSR), Air Force Research Laboratory USA, Osaka University, 7-23-17 Roppongi, 106-0032, Minato-ku, Tokyo, Japan
Hiroshi Motoda

About this paper

Cite this paper

Günnemann, S., Boden, B., Färber, I., Seidl, T. (2013). Efficient Mining of Combined Subspace and Subgraph Clusters in Graphs with Feature Vectors. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7818. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37453-1_22

DOI: https://doi.org/10.1007/978-3-642-37453-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37452-4
Online ISBN: 978-3-642-37453-1
eBook Packages: Computer Science, Computer Science (R0)
