Recent Developments in Web Usage Mining Research

  • Federico Michele Facca
  • Pier Luca Lanzi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2737)

Abstract

Web Usage Mining is the area of Web Mining that deals with extracting interesting knowledge from the logging information produced by web servers. In this paper, we survey recent developments in this area, which is receiving increasing attention from the Data Mining community.

Keywords

Association Rule, Sequential Pattern, Customer Relationship Management, Proxy Server, Fuzzy Association Rule
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Federico Michele Facca (1)
  • Pier Luca Lanzi (1)
  1. Artificial Intelligence and Robotics Laboratory, Dipartimento di Elettronica e Informazione, Politecnico di Milano
