SEuS: Structure Extraction Using Summaries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2534)

Abstract

We study the problem of finding frequent structures in semistructured data represented as a directed labeled graph. Frequent structures are graphs that are isomorphic to a large number of subgraphs in the data graph. They form building blocks for visual exploration and data mining of semistructured data. We overcome the inherent computational complexity of the problem by using a summary data structure to prune the search space and to provide interactive feedback. We present an experimental study of our methods on real datasets. Our implementation can operate on datasets that are two to three orders of magnitude larger than those described in prior work.
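The abstract does not specify the summary data structure, so the following is only an illustrative sketch of how a summary can prune candidate structures before any expensive isomorphism test. It assumes (1) a label-level summary that merges all data nodes sharing a label and counts the data edges joining each (source label, edge label, target label) triple, and (2) that support counts edge-disjoint occurrences of a structure. Under these assumptions the summary gives a cheap upper bound on support, and any candidate whose bound falls below the threshold can be discarded. The function names `build_summary`, `support_upper_bound`, and `prune` are hypothetical and are not taken from the paper; the summary used by SEuS may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A data edge: (source node id, edge label, target node id).
Edge = Tuple[int, str, int]
# A summary edge key: (source node label, edge label, target node label).
Triple = Tuple[str, str, str]


def build_summary(node_labels: Dict[int, str], edges: List[Edge]):
    """Collapse all data nodes with the same label into one summary node.

    Returns (node_count, edge_count): how many data nodes carry each label,
    and how many data edges connect each (src label, edge label, dst label).
    """
    node_count = Counter(node_labels.values())
    edge_count = Counter((node_labels[s], lbl, node_labels[d]) for s, lbl, d in edges)
    return node_count, edge_count


def support_upper_bound(edge_count: Counter, pattern: List[Triple]) -> int:
    """Upper bound on the number of edge-disjoint occurrences of a non-empty pattern.

    Each occurrence uses a distinct data edge for every pattern edge, so a
    triple appearing k times in the pattern allows at most
    edge_count[triple] // k disjoint occurrences. The bound ignores how the
    pattern edges share nodes, so it can overestimate the true support.
    """
    multiplicity = Counter(pattern)
    return min(edge_count[t] // k for t, k in multiplicity.items())


def prune(edge_count: Counter, candidates: List[List[Triple]], min_support: int):
    """Keep only candidates whose summary-based bound reaches min_support."""
    return [p for p in candidates if support_upper_bound(edge_count, p) >= min_support]


if __name__ == "__main__":
    # Hypothetical toy data graph: papers, authors, and a venue.
    labels = {1: "paper", 2: "author", 3: "author", 4: "paper", 5: "author", 6: "venue"}
    edges = [(1, "by", 2), (1, "by", 3), (4, "by", 5), (1, "in", 6), (4, "in", 6)]
    node_count, edge_count = build_summary(labels, edges)

    candidates = [
        [("paper", "by", "author")],
        [("paper", "by", "author"), ("paper", "in", "venue")],
        [("paper", "by", "author"), ("paper", "by", "author")],
        [("paper", "cites", "paper")],
    ]
    kept = prune(edge_count, candidates, min_support=2)  # keeps the first two candidates
    print(kept)
```

Only candidates that survive this summary check would need to be counted against the full data graph. Because the bound never underestimates support under the stated definition, pruning on it cannot discard a structure that is actually frequent.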

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghazizadeh, S., Chawathe, S.S. (2002). SEuS: Structure Extraction Using Summaries. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_9

  • DOI: https://doi.org/10.1007/3-540-36182-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00188-1

  • Online ISBN: 978-3-540-36182-4

  • eBook Packages: Springer Book Archive
