Abstract
We study the problem of finding frequent structures in semistructured data (represented as a directed labeled graph). Frequent structures are graphs that are isomorphic to a large number of subgraphs in the data graph. Frequent structures form building blocks for visual exploration and data mining of semistructured data. We overcome the inherent computational complexity of the problem by using a summary data structure to prune the search space and to provide interactive feedback. We present an experimental study of our methods operating on real datasets. The implementation of our methods is capable of operating on datasets that are two to three orders of magnitude larger than those described in prior work.
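The abstract does not say how the summary is built, so the following is only a minimal sketch of the general idea, not the SEuS algorithm itself. It assumes a hypothetical summary that collapses nodes by label and counts data edges per (source label, edge label, target label) triple. Under that assumption the summary gives exact counts for single-edge structures, and a candidate structure can be discarded without searching the data graph if any of its edge types is absent from the summary. The toy graph and the names `build_summary`, `may_occur`, and `frequent_single_edges` are illustrative only.

```python
from collections import Counter

# Toy data graph (illustrative): node id -> label, plus directed
# labeled edges given as (source id, edge label, target id).
node_labels = {1: "book", 2: "author", 3: "book", 4: "author", 5: "title"}
edges = [(1, "by", 2), (3, "by", 4), (3, "has", 5)]


def build_summary(node_labels, edges):
    """Collapse nodes by label and count data edges per
    (source label, edge label, target label) triple.
    This label-based summary is an assumption of the sketch."""
    return Counter((node_labels[s], e, node_labels[d]) for s, e, d in edges)


def may_occur(candidate_edges, summary):
    """Necessary (not sufficient) condition for a candidate structure to
    occur in the data graph: each of its edge types appears in the summary."""
    return all(summary[t] > 0 for t in candidate_edges)


def frequent_single_edges(summary, threshold):
    """Single-edge structures whose exact count meets the threshold."""
    return sorted(t for t, count in summary.items() if count >= threshold)


summary = build_summary(node_labels, edges)
```

For example, `may_occur([("author", "by", "book")], summary)` is false for this toy graph, so that candidate could be pruned before any isomorphism test is attempted. Candidates that pass the check still need to be verified against the data graph.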
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ghazizadeh, S., Chawathe, S.S. (2002). SEuS: Structure Extraction Using Summaries. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_9
DOI: https://doi.org/10.1007/3-540-36182-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4
eBook Packages: Springer Book Archive