Abstract
The search for frequent subgraphs is becoming increasingly important in many application areas, including Web mining and bioinformatics. Any use of graph structures in mining, however, must also integrate background knowledge into the analysis and allow patterns to be studied at different levels of abstraction. To address these needs, we propose to use taxonomies in mining and to extend frequency/support measures with the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns together with the individual patterns that constitute them; these individual patterns are interesting in the context of their abstract pattern even though they may themselves be infrequent. The paper presents the fAP-IP algorithm, which uses a taxonomy to search for the abstract and individual patterns and supports graph clustering to discover further structure in the individual patterns. Semantics are thus both used and learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004) and is complemented by the AP-IP visualization tool, which lets the user navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.
ACM categories and subject descriptors and keywords: H.2.8 [Database Management]: Database Applications—data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia —navigation, user issues; graph mining; Web mining; background knowledge and semantics in mining.
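To make the AP-IP problem statement concrete, the following is a minimal sketch in Python, not the fAP-IP algorithm itself and not based on Gaston. It makes strong simplifying assumptions: every pattern is a single directed edge rather than a general subgraph, support is the number of transactions containing the pattern, and the taxonomy, page names, transactions, and the function name `mine_ap_ip` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy (assumed for illustration): concrete page -> abstract concept.
TAXONOMY = {
    "hotel_a.html": "Hotel",
    "hotel_b.html": "Hotel",
    "search.html": "Search",
    "book.html": "Booking",
}

# Hypothetical transactions: each session is a set of directed edges (from_page, to_page).
TRANSACTIONS = [
    {("search.html", "hotel_a.html"), ("hotel_a.html", "book.html")},
    {("search.html", "hotel_b.html")},
    {("search.html", "hotel_a.html")},
    {("hotel_b.html", "book.html")},
]


def abstract_edge(edge, taxonomy):
    """Map an individual edge to its abstract edge by lifting both endpoints."""
    src, dst = edge
    return (taxonomy[src], taxonomy[dst])


def mine_ap_ip(transactions, taxonomy, min_support):
    """Simplified single-edge version of the AP-IP idea.

    Returns {abstract_edge: (support, {individual_edge: support})} for every
    abstract edge whose support reaches min_support. Individual edges are
    reported regardless of their own support.
    """
    abstract_support = defaultdict(int)
    individual_support = defaultdict(lambda: defaultdict(int))
    for edges in transactions:
        seen_abstract = set()
        for edge in edges:
            ap = abstract_edge(edge, taxonomy)
            seen_abstract.add(ap)
            # Edges form a set, so each individual edge counts once per transaction.
            individual_support[ap][edge] += 1
        # Count each abstract edge once per transaction, however many
        # individual edges map to it.
        for ap in seen_abstract:
            abstract_support[ap] += 1
    return {
        ap: (count, dict(individual_support[ap]))
        for ap, count in abstract_support.items()
        if count >= min_support
    }


result = mine_ap_ip(TRANSACTIONS, TAXONOMY, min_support=3)
for ap, (support, individuals) in result.items():
    print(ap, support, individuals)
```

With these made-up data and a minimum support of 3, only the abstract edge from Search to Hotel is frequent. Its individual edge via `hotel_b.html` occurs in a single transaction, so it is infrequent on its own, but it is still reported because it constitutes a frequent abstract pattern. This is the sense in which the context induces interestingness.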
References
Berendt, B.: Using site semantics to analyze, visualize and support navigation. Data Mining and Knowledge Discovery 6(1), 37–59 (2002)
Berendt, B., Brenstein, E.: Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. BRMIC 33, 243–257 (2001)
Berendt, B., Hotho, A., Stumme, G.: Usage mining for and on the semantic web. In: Kargupta, H., et al. (eds.) Data Mining: Next Generation Challenges and Future Directions, pp. 461–480. AAAI/MIT Press, Menlo Park (2004)
Berendt, B., Kralisch, A.: Analysing and visualising logfiles: the Individualised SiteMap tool ISM. In: Proc. GOR 2005 (2005)
Berendt, B., Spiliopoulou, M.: Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9(1), 56–75 (2000)
Borgelt, C.: On Canonical Forms for Frequent Graph Mining. In: Proc. of Workshop on Mining Graphs, Trees, and Sequences (MGTS 2005 at PKDD 2005), pp. 1–12 (2005)
Dai, H., Mobasher, B.: Using ontologies to discover domain-level web usage profiles. In: Proc. 2nd Semantic Web Mining Workshop at PKDD 2001 (2001)
Dupret, G., Piwowarski, B.: Deducing a term taxonomy from term similarities. In: Proc. Knowledge Discovery and Ontologies Workshop at PKDD 2005, pp. 11–22 (2005)
Eirinaki, M., Vazirgiannis, M., Varlamis, I.: Sewep: Using site semantics and a taxonomy to enhance the web personalization process. In: Proc. SIGKDD 2003, pp. 99–108 (2003)
Hofer, H., Borgelt, C., Berthold, M.R.: Large scale mining of molecular fragments with wildcards. In: Advances in Intelligent Data Analysis V, pp. 380–389 (2003)
Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphisms. In: Proc. ICDM, pp. 549–552 (2003)
Inokuchi, A.: Mining generalized substructures from a set of labeled graphs. In: Proc. ICDM 2004, pp. 414–418 (2004)
Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)
Inokuchi, A., Washio, T., Nishimura, K., Motoda, H.: A fast algorithm for mining frequent connected subgraphs. Research Report, IBM Research, Tokyo (2002)
Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., Agrawal, G.: Discovering frequent topological structures from graph datasets. In: Proc. SIGKDD 2005, pp. 606–611 (2005)
Jin, X., Zhou, Y., Mobasher, B.: A maximum entropy web recommendation system: Combining collaborative and content features. In: Proc. SIGKDD 2005, pp. 612–617 (2005)
Kralisch, A., Berendt, B.: Language sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia 11, 221–246 (2005)
Kralisch, A., Eisend, M., Berendt, B.: Impact of culture on website navigation behaviour. In: Proc. HCI-International (2005)
Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. ICDM, pp. 313–320 (2001)
McEneaney, J.E.: Graphic and numerical methods to assess navigation in hypertext. Int. J. of Human-Computer Studies 55, 761–786 (2001)
Meinl, T., Borgelt, C., Berthold, M.R.: Mining fragments with fuzzy chains in molecular databases. In: Proc. Worksh. Mining Graphs, Trees & Sequences at PKDD 2004, pp. 49–60 (2004)
Meo, R., Lanzi, P.L., Matera, M., Esposito, R.: Integrating web conceptual modeling and web usage mining. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds.) WebKDD 2004. LNCS (LNAI), vol. 3932, pp. 135–148. Springer, Heidelberg (2006)
Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proc. SIGKDD 2004, pp. 647–652 (2004), extended version: LIACS, Leiden Univ., Leiden, The Netherlands, Tech. Report (April 2004), http://hms.liacs.nl
Oberle, D., Berendt, B., Hotho, A., Gonzalez, J.: Conceptual user tracking. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 155–164. Springer, Heidelberg (2003)
Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. 21st VLDB Conference, pp. 407–419 (1995)
Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer, Heidelberg (1996)
Wörlein, M., Meinl, T., Fischer, I., Philippsen, M.: A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 392–403. Springer, Heidelberg (2005)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. ICDM, pp. 51–58 (2002)
Yan, X., Zhou, X.J., Han, J.: Mining closed relational graphs with connectivity constraints. In: Proc. SIGKDD 2005, pp. 324–333 (2005)
Zaïane, O.R., Han, J.: Discovering web access patterns and trends by applying OLAP and data mining technology on web logs. In: Proc. ADL 1998, pp. 19–29 (1998)
Zaki, M.J.: Efficiently mining trees in a forest. In: Proc. SIGKDD 2002, pp. 71–80 (2002)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berendt, B. (2006). Using and Learning Semantics in Frequent Subgraph Mining. In: Nasraoui, O., Zaïane, O., Spiliopoulou, M., Mobasher, B., Masand, B., Yu, P.S. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2005. Lecture Notes in Computer Science, vol 4198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11891321_2
DOI: https://doi.org/10.1007/11891321_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46346-7
Online ISBN: 978-3-540-46348-1
eBook Packages: Computer Science (R0)