Graph Mining: Repository vs. Canonical Form

Borgelt, Christian; Fiedler, Mathias

doi:10.1007/978-3-540-78246-9_27

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

In frequent subgraph mining one tries to find all subgraphs that occur with a user-specified minimum frequency in a given graph database. The basic approach is to grow subgraphs, adding an edge and possibly a node in each step, to count the number of database graphs containing them, and to eliminate infrequent subgraphs. The predominant method to avoid redundant search (the same subgraph can be grown in several ways) is to define a canonical form that uniquely identifies a graph up to automorphisms. The obvious alternative, a repository of processed subgraphs, has so far received fairly little attention. However, if the repository is laid out as a hash table with a carefully designed hash function, this approach is competitive with canonical form pruning. In experiments we conducted, the repository-based approach could sometimes outperform canonical form pruning by 15%.
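
The repository idea can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the abstract does not specify the hash function, so the key used here (node labels with degrees, plus edge label triples) is only an assumed stand-in, and the names `Graph`, `invariant_key`, `is_isomorphic` and `Repository` are hypothetical. The brute-force isomorphism test is practical only for very small graphs.

```python
from itertools import permutations


class Graph:
    """Small undirected graph with labeled nodes and labeled edges."""

    def __init__(self, labels, edges):
        self.labels = tuple(labels)
        # Store each undirected edge once, with its endpoints in sorted order.
        self.edges = frozenset((min(u, v), max(u, v), lab) for u, v, lab in edges)


def invariant_key(g):
    """Hashable key that is equal for isomorphic graphs (assumed, simple choice)."""
    degree = [0] * len(g.labels)
    for u, v, _ in g.edges:
        degree[u] += 1
        degree[v] += 1
    nodes = tuple(sorted(zip(g.labels, degree)))
    edges = tuple(sorted(
        tuple(sorted((g.labels[u], g.labels[v]))) + (lab,) for u, v, lab in g.edges
    ))
    return (nodes, edges)


def is_isomorphic(a, b):
    """Exact check by trying every node mapping; only viable for tiny graphs."""
    n = len(a.labels)
    if n != len(b.labels) or len(a.edges) != len(b.edges):
        return False
    for p in permutations(range(n)):
        if any(a.labels[i] != b.labels[p[i]] for i in range(n)):
            continue
        mapped = frozenset(
            (min(p[u], p[v]), max(p[u], p[v]), lab) for u, v, lab in a.edges
        )
        if mapped == b.edges:
            return True
    return False


class Repository:
    """Hash table of processed subgraphs, bucketed by the invariant key."""

    def __init__(self):
        self.buckets = {}

    def add(self, g):
        """Return True if g is new; False if an isomorphic graph was stored before."""
        bucket = self.buckets.setdefault(invariant_key(g), [])
        if any(is_isomorphic(g, h) for h in bucket):
            return False  # already processed: prune this search branch
        bucket.append(g)
        return True


repo = Repository()
# The path C-C-O, reached by two different growth orders, and the path C-O-C.
first = repo.add(Graph(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]))
second = repo.add(Graph(["O", "C", "C"], [(0, 1, "-"), (1, 2, "-")]))
third = repo.add(Graph(["C", "O", "C"], [(0, 1, "-"), (1, 2, "-")]))
```

In this sketch the hash key only narrows the search to one bucket; the exact isomorphism test is still needed because non-isomorphic graphs can share a key. A more discriminating key means fewer such tests, which is presumably why the abstract stresses a carefully designed hash function.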

Author information

Authors and Affiliations

European Center for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, 33600, Mieres, Spain
Christian Borgelt & Mathias Fiedler

Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br, Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

About this paper

Cite this paper

Borgelt, C., Fiedler, M. (2008). Graph Mining: Repository vs. Canonical Form. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_27

DOI: https://doi.org/10.1007/978-3-540-78246-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)