
A Multi-Layered and Multi-Faceted Framework for Mining Evolving Web Clickstreams


Part of the book series: Studies in Computational Intelligence ((SCI,volume 14))

Abstract

Data on the Web is noisy, huge, and dynamic, which poses enormous challenges to most data mining techniques that try to extract patterns from it. While scalable data mining methods are expected to cope with the size challenge, coping with evolving trends in noisy data continuously, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. Furthermore, the heterogeneity of the Web has required Web-based applications to integrate more effectively a variety of data types across multiple channels and from different sources, such as content, structure, and, more recently, semantics. Most existing Web mining and personalization methods work only at the lowest and most primitive level, namely discovering models of user profiles from the input data stream. However, to better understand the real intentions and dynamics behind Web clickstreams, reasoning and discovery must extend beyond the usual data stream level. We propose a new multi-level framework for Web usage mining and personalization, consisting of knowledge discovery at different granularities: (i) session/user clicks and profiles, (ii) profile life events and profile communities, and (iii) sequential patterns and predicted shifts in the user profiles. The most promising features of the proposed framework address challenging dynamic scenarios, including (i) defining and detecting events in the life of a synopsis profile, such as Birth, Death, and Atavism, and (ii) identifying Node Communities that can later be used to track the temporal evolution of Web profile activity events and dynamic trends within communities, such as Expansion, Shrinking, and Drift.
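As a rough illustration of the profile life events named in the abstract, the following Python sketch labels Birth, Death, and Atavism over a sequence of time windows. The abstract does not give operational definitions, so everything here is an assumption made for illustration: the function name `detect_life_events`, the representation of each window as a set of active profile IDs, and the reading of Birth as a first appearance, Death as a disappearance from one window to the next, and Atavism as the reappearance of a previously dead profile. It is not the chapter's algorithm.

```python
from typing import Iterable, List, Set, Tuple


def detect_life_events(windows: Iterable[Set[str]]) -> List[Tuple[int, str, str]]:
    """Label profile life events across consecutive time windows.

    Hypothetical definitions (assumed for illustration only):
      Birth   - a profile ID is seen for the first time
      Death   - a profile active in the previous window is absent now
      Atavism - a profile that previously died becomes active again
    Returns (window_index, profile_id, event) tuples in window order.
    """
    events: List[Tuple[int, str, str]] = []
    alive: Set[str] = set()  # profiles active in the previous window
    dead: Set[str] = set()   # profiles that have died at some point

    for t, active in enumerate(windows):
        # Profiles that were not active in the previous window.
        for profile in sorted(active - alive):
            if profile in dead:
                events.append((t, profile, "Atavism"))
                dead.discard(profile)
            else:
                events.append((t, profile, "Birth"))
        # Profiles that were active before but are missing now.
        for profile in sorted(alive - active):
            events.append((t, profile, "Death"))
            dead.add(profile)
        alive = set(active)
    return events


# Toy stream: profile A appears, disappears, then returns.
example = [{"A"}, {"A", "B"}, {"B"}, {"A", "B"}]
for event in detect_life_events(example):
    print(event)
```

A practical stream miner would more plausibly decide activity from decaying weights or robust thresholds on each synopsis profile than from exact set membership; the set-based version only makes the three event types concrete.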




Copyright information

© 2006 Springer

About this chapter

Cite this chapter

Nasraoui, O. (2006). A Multi-Layered and Multi-Faceted Framework for Mining Evolving Web Clickstreams. In: Sirmakessis, S. (eds) Adaptive and Personalized Semantic Web. Studies in Computational Intelligence, vol 14. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33279-0_2


  • DOI: https://doi.org/10.1007/3-540-33279-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30605-4

  • Online ISBN: 978-3-540-33279-4

  • eBook Packages: Engineering, Engineering (R0)
