Using and Learning Semantics in Frequent Subgraph Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4198)

Abstract

The search for frequent subgraphs is becoming increasingly important in many application areas including Web mining and bioinformatics. Any use of graph structures in mining, however, should also take into account that it is essential to integrate background knowledge into the analysis, and that patterns must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend frequency / support measures by the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns and the individual patterns that constitute them and are therefore interesting in this context (even though they may be infrequent). The paper presents the fAP-IP algorithm that uses a taxonomy to search for the abstract and individual patterns, and that supports graph clustering to discover further structure in the individual patterns. Semantics are used as well as learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004), and it is complemented by the AP-IP visualization tool that allows the user to navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.

ACM categories and subject descriptors and keywords: H.2.8 [Database Management]: Database Applications—data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia —navigation, user issues; graph mining; Web mining; background knowledge and semantics in mining.
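
The AP-IP problem stated in the abstract can be illustrated with a deliberately reduced sketch. It is not the fAP-IP algorithm and does not use Gaston: it handles only single-edge patterns, whereas the paper mines general subgraphs. The taxonomy, page names, transactions, and the function name `mine_single_edge_ap_ip` are all hypothetical and chosen only for illustration. The sketch maps each concrete edge to its abstract counterpart through the taxonomy, keeps the abstract edges that reach the support threshold, and reports the individual edges behind each one, even when those are infrequent on their own.

```python
from collections import Counter, defaultdict

# Hypothetical taxonomy: concrete page -> abstract concept.
TAXONOMY = {
    "home.html": "Home",
    "search_a.html": "Search",
    "search_b.html": "Search",
    "result_1.html": "Result",
    "result_2.html": "Result",
}

# Hypothetical transactions: each one is a set of directed edges between pages.
TRANSACTIONS = [
    {("home.html", "search_a.html"), ("search_a.html", "result_1.html")},
    {("search_b.html", "result_2.html")},
    {("home.html", "search_b.html"), ("search_b.html", "result_1.html")},
]


def abstract_edge(edge, taxonomy):
    """Replace both endpoints of a concrete edge by their taxonomy concepts."""
    src, dst = edge
    return (taxonomy[src], taxonomy[dst])


def mine_single_edge_ap_ip(transactions, taxonomy, min_support):
    """Return frequent abstract edges together with their individual edges.

    Support is the fraction of transactions containing a pattern.
    Individual edges are reported whenever their abstract edge is frequent,
    regardless of their own support.
    """
    abstract_count = Counter()
    individual_count = defaultdict(Counter)

    for edges in transactions:
        seen_abstract = set()
        for edge in edges:
            abstract = abstract_edge(edge, taxonomy)
            seen_abstract.add(abstract)
            # Edges are set members, so each counts once per transaction.
            individual_count[abstract][edge] += 1
        # Count each abstract edge at most once per transaction.
        abstract_count.update(seen_abstract)

    n = len(transactions)
    result = {}
    for abstract, count in abstract_count.items():
        support = count / n
        if support >= min_support:
            result[abstract] = {
                "support": support,
                "individual": {
                    edge: c / n
                    for edge, c in individual_count[abstract].items()
                },
            }
    return result


if __name__ == "__main__":
    patterns = mine_single_edge_ap_ip(TRANSACTIONS, TAXONOMY, min_support=0.9)
    for abstract, info in patterns.items():
        print(abstract, info["support"])
        for edge, support in sorted(info["individual"].items()):
            print("   ", edge, round(support, 2))
```

With this toy data and a threshold of 0.9, only the abstract edge from Search to Result is frequent. Each of the three individual edges that instantiate it occurs in a single transaction, so none would pass the threshold on its own; they are reported because their abstract pattern is frequent, which is the sense of context-induced interestingness used in the abstract.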

References

  1. Berendt, B.: Using site semantics to analyze, visualize and support navigation. Data Mining and Knowledge Discovery 6(1), 37–59 (2002)

  2. Berendt, B., Brenstein, E.: Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. Behavior Research Methods, Instruments, & Computers 33, 243–257 (2001)

  3. Berendt, B., Hotho, A., Stumme, G.: Usage mining for and on the semantic web. In: Kargupta, H., et al. (eds.) Data Mining: Next Generation Challenges and Future Directions, pp. 461–480. AAAI/MIT Press, Menlo Park (2004)

  4. Berendt, B., Kralisch, A.: Analysing and visualising logfiles: the Individualised SiteMap tool ISM. In: Proc. GOR 2005 (2005)

  5. Berendt, B., Spiliopoulou, M.: Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9(1), 56–75 (2000)

  6. Borgelt, C.: On Canonical Forms for Frequent Graph Mining. In: Proc. of Workshop on Mining Graphs, Trees, and Sequences (MGTS 2005 at PKDD 2005), pp. 1–12 (2005)

  7. Dai, H., Mobasher, B.: Using ontologies to discover domain-level web usage profiles. In: Proc. 2nd Semantic Web Mining Workshop at PKDD 2001 (2001)

  8. Dupret, G., Piwowarski, B.: Deducing a term taxonomy from term similarities. In: Proc. Knowledge Discovery and Ontologies Workshop at PKDD 2005, pp. 11–22 (2005)

  9. Eirinaki, M., Vazirgiannis, M., Varlamis, I.: SEWeP: Using site semantics and a taxonomy to enhance the web personalization process. In: Proc. SIGKDD 2003, pp. 99–108 (2003)

  10. Hofer, H., Borgelt, C., Berthold, M.R.: Large scale mining of molecular fragments with wildcards. In: Advances in Intelligent Data Analysis V, pp. 380–389 (2003)

  11. Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraphs in the presence of isomorphisms. In: Proc. ICDM, pp. 549–552 (2003)

  12. Inokuchi, A.: Mining generalized substructures from a set of labeled graphs. In: Proc. ICDM 2004, pp. 414–418 (2004)

  13. Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 13–23. Springer, Heidelberg (2000)

  14. Inokuchi, A., Washio, T., Nishimura, K., Motoda, H.: A fast algorithm for mining frequent connected subgraphs. Research Report, IBM Research, Tokyo (2002)

  15. Jin, R., Wang, C., Polshakov, D., Parthasarathy, S., Agrawal, G.: Discovering frequent topological structures from graph datasets. In: Proc. SIGKDD 2005, pp. 606–611 (2005)

  16. Jin, X., Zhou, Y., Mobasher, B.: A maximum entropy web recommendation system: Combining collaborative and content features. In: Proc. SIGKDD 2005, pp. 612–617 (2005)

  17. Kralisch, A., Berendt, B.: Language sensitive search behaviour and the role of domain knowledge. New Review of Hypermedia and Multimedia 11, 221–246 (2005)

  18. Kralisch, A., Eisend, M., Berendt, B.: Impact of culture on website navigation behaviour. In: Proc. HCI-International (2005)

  19. Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. ICDM, pp. 313–320 (2001)

  20. McEneaney, J.E.: Graphic and numerical methods to assess navigation in hypertext. Int. J. of Human-Computer Studies 55, 761–786 (2001)

  21. Meinl, T., Borgelt, C., Berthold, M.R.: Mining fragments with fuzzy chains in molecular databases. In: Proc. Worksh. Mining Graphs, Trees & Sequences at PKDD 2004, pp. 49–60 (2004)

  22. Meo, R., Lanzi, P.L., Matera, M., Esposito, R.: Integrating web conceptual modeling and web usage mining. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds.) WebKDD 2004. LNCS (LNAI), vol. 3932, pp. 135–148. Springer, Heidelberg (2006)

  23. Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proc. SIGKDD 2004, pp. 647–652 (2004), extended version: LIACS, Leiden Univ., Leiden, The Netherlands, Tech. Report (April 2004), http://hms.liacs.nl

  24. Oberle, D., Berendt, B., Hotho, A., Gonzalez, J.: Conceptual user tracking. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds.) AWIC 2003. LNCS (LNAI), vol. 2663, pp. 155–164. Springer, Heidelberg (2003)

  25. Srikant, R., Agrawal, R.: Mining generalized association rules. In: Proc. 21st VLDB Conference, pp. 407–419 (1995)

  26. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer, Heidelberg (1996)

  27. Wörlein, M., Meinl, T., Fischer, I., Philippsen, M.: A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston. In: Jorge, A.M., Torgo, L., Brazdil, P.B., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS (LNAI), vol. 3721, pp. 392–403. Springer, Heidelberg (2005)

  28. Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. ICDM, pp. 51–58 (2002)

  29. Yan, X., Zhou, X.J., Han, J.: Mining closed relational graphs with connectivity constraints. In: Proc. SIGKDD 2005, pp. 324–333 (2005)

  30. Zaïane, O.R., Han, J.: Discovering web access patterns and trends by applying OLAP and data mining technology on web logs. In: Proc. ADL 1998, pp. 19–29 (1998)

  31. Zaki, M.J.: Efficiently mining trees in a forest. In: Proc. SIGKDD 2002, pp. 71–80 (2002)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berendt, B. (2006). Using and Learning Semantics in Frequent Subgraph Mining. In: Nasraoui, O., Zaïane, O., Spiliopoulou, M., Mobasher, B., Masand, B., Yu, P.S. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2005. Lecture Notes in Computer Science (LNAI), vol 4198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11891321_2

  • DOI: https://doi.org/10.1007/11891321_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46346-7

  • Online ISBN: 978-3-540-46348-1

  • eBook Packages: Computer Science, Computer Science (R0)