Using and Learning Semantics in Frequent Subgraph Mining

Berendt, Bettina

doi:10.1007/11891321_2

Using and Learning Semantics in Frequent Subgraph Mining

Bettina Berendt²⁴

Conference paper

650 Accesses
6 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4198))

Abstract

The search for frequent subgraphs is becoming increasingly important in many application areas including Web mining and bioinformatics. Any use of graph structures in mining, however, should also take into account that it is essential to integrate background knowledge into the analysis, and that patterns must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend frequency / support measures by the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns and the individual patterns that constitute them and are therefore interesting in this context (even though they may be infrequent). The paper presents the fAP-IP algorithm that uses a taxonomy to search for the abstract and individual patterns, and that supports graph clustering to discover further structure in the individual patterns. Semantics are used as well as learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004), and it is complemented by the AP-IP visualization tool that allows the user to navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.

ACM categories and subject descriptors and keywords: H.2.8 [Database Management]: Database Applications—data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia —navigation, user issues; graph mining; Web mining; background knowledge and semantics in mining.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Berendt, B.: Using site semantics to analyze, visualize and support navigation. Data Mining and Knowledge Discovery 6(1), 37–59 (2002)
Article MathSciNet Google Scholar
Berendt, B., Brenstein, E.: Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. BRMIC 33, 243–257 (2001)
Google Scholar
Berendt, B., Hotho, A., Stumme, G.: Usage mining for and on the semantic web. In: Kargupta, H., et al. (eds.) Data Mining: Next Generation Challenges and Future Directions, pp. 461–480. AAAI/MIT Press, Menlo Park (2004)
Google Scholar
Berendt, B., Kralisch, A.: Analysing and visualising logfiles: the Individualised SiteMap tool ISM. In: Proc. GOR 2005 (2005)
Google Scholar
Berendt, B., Spiliopoulou, M.: Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9(1), 56–75 (2000)
Article Google Scholar
Borgelt, C.: On Canonical Forms for Frequent Graph Mining. In: Proc. of Workshop on Mining Graphs, Trees, and Sequences (MGTS 2005 at PKDD 2005), pp. 1–12 (2005)
Google Scholar
Dai, H., Mobasher, B.: Using ontologies to discover domain-level web usage profiles. In: Proc. 2nd Semantic Web Mining Workshop at PKDD 2001 (2001)
Google Scholar
Dupret, G., Piwowarski, B.: Deducing a term taxonomy from term similarities. In: Proc. Knowledge Discovery and Ontologies Workshop at PKDD 2005, pp. 11-22 (2005)
Google Scholar
Eirinaki, M., Vazirgiannis, M., Varlamis, I.: Sewep: Using site semantics and a taxonomy to enhance the web personalization process. In: Proc. SIGKDD 2003, pp. 99–108 (2003)
Google Scholar
Hofer, H., Borgelt, C., Berthold, M.R.: Large scale mining of molecular fragments with wildcards. In: Advances in Intelligent Data Analysis V, pp. 380–389 (2003)
Google Scholar
Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphisms. In: Proc. ICDM, pp. 549–552 (2003)
Google Scholar
Inokuchi, A.: Mining generalized substructures from a set of labeled graphs. In: Proc. ICDM 2004, pp. 414–418 (2004)
Google Scholar
Inokuchi, I., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)
Chapter Google Scholar
Inokuchi, I., Washio, T., Nishimura, K., Motoda, H.: A fast algorithm for mining frequent connected subgraphs. Research Report, IBM Research, Tokyo (2002)
Google Scholar
Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., Agrawal, G.: Discovering frequent topological structures from graph datasets. In: Proc. SIGKDD 2005, pp. 606–611 (2005)
Google Scholar
Jin, X., Zhou, Y., Mobasher, B.: A maximum entropy web recommendation system: Combining collaborative and content features. In: Proc. SIGKDD 2005, pp. 612–617 (2005)
Google Scholar
Kralisch, A., Berendt, B.: Language sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia 11, 221–246 (2005)
Article Google Scholar
Kralisch, A., Eisend, M., Berendt, B.: Impact of culture on website navigation behaviour. In: Proc. HCI-International (2005)
Google Scholar
Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. ICDM, pp. 313–320 (2001)
Google Scholar
McEneaney, J.E.: Graphic and numerical methods to assess navigation in hypertext. Int. J. of Human-Computer Studies 55, 761–786 (2001)
Article MATH Google Scholar
Meinl, T., Borgelt, C., Berthold, M.R.: Mining fragments with fuzzy chains in molecular databases. In: Proc. Worksh. Mining Graphs, Trees & Sequences at PKDD 2004, pp. 49–60 (2004)
Google Scholar
Meo, R., Lanzi, P.L., Matera, M., Esposito, R.: Integrating web conceptual modeling and web usage mining. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds.) WebKDD 2004. LNCS (LNAI), vol. 3932, pp. 135–148. Springer, Heidelberg (2006)
Chapter Google Scholar
Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proc. SIGKDD 2004, pp. 647–652 (2004), extended version: LIACS, Leiden Univ., Leiden, The Netherlands, Tech. Report (April 2004), http://hms.liacs.nl
Oberle, D., Berendt, B., Hotho, A., Gonzalez, J.: Conceptual user tracking. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 155–164. Springer, Heidelberg (2003)
Chapter Google Scholar
Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. 21^st VLDB Conference, pp. 407–419 (1995)
Google Scholar
Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer, Heidelberg (1996)
Chapter Google Scholar
Wörlein, M., Meinl, T., Fischer, I., Philippsen, M.: A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 392–403. Springer, Heidelberg (2005)
Chapter Google Scholar
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. ICDM, pp. 51–58 (2002)
Google Scholar
Yan, X., Zhou, X.J., Han, J.: Mining closed relational graphs with connectivity constraints. In: Proc. SIGKDD 2005, pp. 324–333 (2005)
Google Scholar
Zaïane, O.R., Han, J.: Discovering web access patterns and trends by applying OLAP and data mining technology on web logs. In: Proc. ADL 1998, pp. 19–29 (1998)
Google Scholar
Zaki, M.J.: Efficiently mining trees in a forest. In: Proc. SIGKDD 2002, pp. 71–80 (2002)
Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Information Systems, Humboldt University Berlin, D-10178, Berlin, Germany
Bettina Berendt

Authors

Bettina Berendt
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Knowledge Discovery & Web Mining Lab, University of Louisville, 40292, Louisville, KY, USA
Olfa Nasraoui
University of Alberta, Canada
Osmar Zaïane
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Center for Web Intelligence School of Computing, DePaul University, Chicago, Illinois, USA
Bamshad Mobasher
Data Miners Inc., 77 North Washington Street, 02114, Boston, MA, USA
Brij Masand
University of Illinois, Chicago, IL, USA
Philip S. Yu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Berendt, B. (2006). Using and Learning Semantics in Frequent Subgraph Mining. In: Nasraoui, O., Zaïane, O., Spiliopoulou, M., Mobasher, B., Masand, B., Yu, P.S. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2005. Lecture Notes in Computer Science(), vol 4198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11891321_2

Download citation

DOI: https://doi.org/10.1007/11891321_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46346-7
Online ISBN: 978-3-540-46348-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics