# Support measures for graph data

## Abstract

The concept of support is central to data mining. While the definition of support in transaction databases is intuitive and simple, this is not the case for graph datasets and databases. Most mining algorithms require the support of a pattern to be no greater than that of its subpatterns, a property called *anti-monotonicity*, or admissibility. This paper examines the requirements for admissibility of a support measure. Support measures for mining graphs are usually based on the notion of an instance graph: a graph representing all the instances of the pattern in a database and their intersection properties. Necessary and sufficient conditions for support measure admissibility, based on operations on instance graphs, are developed and proved. The sufficient conditions are used to prove admissibility of one support measure: the size of the maximum independent set in the instance graph. Conversely, the necessary conditions are used to quickly show that some other support measures, such as weighted count of instances, are not admissible.
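To make the instance-graph idea concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that each pattern instance is given as the set of data-graph vertices it occupies and that two instances "intersect" when they share at least one vertex; other overlap definitions (for example, shared edges) are possible. The function names `instance_graph`, `mis_size`, and `mis_support` and the toy star-shaped data are hypothetical, chosen only for illustration. The independent set is found by brute-force branching, which is exponential and suitable only for tiny inputs.

```python
from itertools import combinations


def instance_graph(instances):
    """Adjacency sets of the instance graph.

    Assumption: two instances are adjacent when they share a vertex.
    """
    adj = {i: set() for i in range(len(instances))}
    for i, j in combinations(range(len(instances)), 2):
        if instances[i] & instances[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mis_size(adj):
    """Exact maximum independent set size by exhaustive branching."""
    def solve(remaining):
        if not remaining:
            return 0
        v = next(iter(remaining))
        # Branch: leave v out, or take v and discard its neighbours.
        without_v = solve(remaining - {v})
        with_v = 1 + solve(remaining - {v} - adj[v])
        return max(without_v, with_v)

    return solve(frozenset(adj))


def mis_support(instances):
    """Support of a pattern as the independent-set size of its instance graph."""
    return mis_size(instance_graph(instances))


# Toy data graph (hypothetical): one vertex "a" joined to "b1", "b2", "b3".
# Subpattern: a single vertex labelled A. Superpattern: an edge A-B.
sub_instances = [frozenset({"a"})]
super_instances = [
    frozenset({"a", "b1"}),
    frozenset({"a", "b2"}),
    frozenset({"a", "b3"}),
]

# Plain instance counting grows from subpattern to superpattern,
# so it is not anti-monotone on this example.
count_sub, count_super = len(sub_instances), len(super_instances)

# The independent-set measure does not grow: all edge instances share "a".
mis_sub, mis_super = mis_support(sub_instances), mis_support(super_instances)
```

On this toy input the raw instance count rises from the subpattern to the superpattern, while the independent-set measure does not, which is the behaviour anti-monotonicity requires. The unweighted count is used here only as the simplest illustration; the weighted count mentioned in the abstract is not reproduced.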

## Keywords

Data mining, Graph mining, Support measures