Data Mining and Knowledge Discovery, Volume 13, Issue 2, pp 243–260

Support measures for graph data

  • N. Vanetik
  • S. E. Shimony
  • E. Gudes


The concept of support is central to data mining. While the definition of support in transaction databases is simple and intuitive, this is not the case for graph datasets and databases. Most mining algorithms require the support of a pattern to be no greater than that of its subpatterns, a property called anti-monotonicity, or admissibility. This paper examines the requirements for admissibility of a support measure. Support measures for mining graphs are usually based on the notion of an instance graph: a graph representing all the instances of the pattern in a database and their intersection properties. Necessary and sufficient conditions for support measure admissibility, based on operations on instance graphs, are developed and proved. The sufficient conditions are used to prove admissibility of one support measure, the size of the maximum independent set in the instance graph. Conversely, the necessary conditions are used to quickly show that some other support measures, such as weighted count of instances, are not admissible.
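
As a rough illustration of the independent-set measure described in the abstract, the sketch below builds an instance graph and computes the size of its maximum independent set by brute force. It is not the authors' implementation. The names `overlap_graph` and `mis_support`, the representation of an instance as a set of vertex ids, and the choice to connect two instances whenever they share a vertex are all assumptions made for this example; the paper's own definitions of instances and of their intersection may differ.

```python
from itertools import combinations


def overlap_graph(instances):
    """Build a hypothetical instance graph as adjacency sets.

    Assumption: instances i and j are adjacent when they share a vertex.
    """
    adj = {i: set() for i in range(len(instances))}
    for i, j in combinations(range(len(instances)), 2):
        if instances[i] & instances[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mis_support(instances):
    """Return the size of a maximum independent set of the instance graph.

    Brute force over subsets, largest first; exponential time, so this is
    only suitable for tiny examples.
    """
    adj = overlap_graph(instances)
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


# Toy data graph: the path 1 - 2 - 3 - 4.
# Instances of the single-edge pattern:
edge_instances = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
# Instances of its single-vertex subpattern:
vertex_instances = [frozenset({v}) for v in (1, 2, 3, 4)]

print(mis_support(edge_instances))    # support of the edge pattern
print(mis_support(vertex_instances))  # support of the vertex subpattern
```

On this toy path the edge pattern has three instances, but only two of them are pairwise non-overlapping, while the single-vertex subpattern has four non-overlapping instances. The pattern's support therefore does not exceed its subpattern's, which is the anti-monotone behaviour the abstract calls admissibility.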


Keywords: Data mining, Graph mining, Support measures



Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
