Finding Generalized Path Patterns for Web Log Data Mining
Data mining on web server logs involves determining frequently occurring access sequences. We examine the problem of finding traversal patterns in web logs, taking into account that accesses to irrelevant web documents may be interleaved within access patterns for navigational purposes. We define a general type of pattern that accounts for this interleaving, and we present a level-wise algorithm for determining these patterns that is based on the underlying structure of the web site. The performance of the algorithm and its sensitivity to several parameters are examined experimentally with synthetic data.
Keywords: Association Rule Mining, Association Rule, Adjacency List, Corruption Level, Candidate Path
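The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a path pattern is supported by a session when its pages appear in order with other accesses possibly interleaved, and that candidates are extended only along links of the site graph. The names `is_subsequence`, `mine_paths`, `links`, and `min_support` are hypothetical and chosen for illustration.

```python
def is_subsequence(pattern, session):
    """Return True if `pattern` occurs in `session` in order, gaps allowed."""
    it = iter(session)
    # `page in it` advances the iterator, so order is enforced.
    return all(page in it for page in pattern)


def mine_paths(sessions, links, min_support):
    """Level-wise search for frequent paths (illustrative sketch only).

    sessions:    list of sessions, each a list of page ids
    links:       dict mapping a page to the set of pages it links to
                 (an assumed adjacency-list view of the site)
    min_support: minimum number of sessions that must contain a path
    """
    def support(path):
        return sum(is_subsequence(path, s) for s in sessions)

    # Level 1: single pages that are frequent on their own.
    pages = sorted({p for s in sessions for p in s})
    level = {(p,): support((p,)) for p in pages}
    level = {p: c for p, c in level.items() if c >= min_support}

    frequent = {}
    while level:
        frequent.update(level)
        # Extend each frequent path by one page, following site links only.
        candidates = {
            path + (nxt,)
            for path in level
            for nxt in links.get(path[-1], ())
        }
        next_level = {}
        for cand in sorted(candidates):
            # Apriori-style pruning: the suffix must already be frequent.
            if cand[1:] not in frequent:
                continue
            count = support(cand)
            if count >= min_support:
                next_level[cand] = count
        level = next_level
    return frequent
```

For example, with sessions `[["A", "X", "B", "C"], ["A", "B", "Y", "C"], ["A", "C"]]`, links `{"A": {"B", "C"}, "B": {"C"}}`, and `min_support=2`, the path `("A", "B", "C")` is counted in the first two sessions even though `X` and `Y` are interleaved.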