A Multi-Layered and Multi-Faceted Framework for Mining Evolving Web Clickstreams

Nasraoui, Olfa

doi:10.1007/3-540-33279-0_2

Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 14)

Abstract

Data on the Web is noisy, huge, and dynamic, which poses enormous challenges to most data mining techniques that try to extract patterns from it. While scalable data mining methods are expected to cope with the size challenge, coping with evolving trends in noisy data in a continuous fashion, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. Furthermore, the heterogeneity of the Web has required Web-based applications to integrate more effectively a variety of types of data across multiple channels and from different sources such as content, structure, and, more recently, semantics. Most existing Web mining and personalization methods are limited to working at the lowest and most primitive level, namely discovering models of the user profiles from the input data stream. However, to better understand the real intention and dynamics of Web clickstreams, we need to extend reasoning and discovery beyond the usual data stream level. We propose a new multi-level framework for Web usage mining and personalization, consisting of knowledge discovery at different granularities: (i) session/user clicks and profiles, (ii) profile life events and profile communities, and (iii) sequential patterns and predicted shifts in the user profiles. One of the most promising features of the proposed framework is that it addresses challenging dynamic scenarios, including (i) defining and detecting events in the life of a synopsis profile, such as Birth, Death, and Atavism, and (ii) identifying Node Communities that can later be used to track the temporal evolution of Web profile activity events and dynamic trends within communities, such as Expansion, Shrinking, and Drift.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Louisville, Louisville
Olfa Nasraoui

Editor information

Editors and Affiliations

Research Academic, Computer Technology Institute, Riga Fereou 61, Patras, 26221, Greece
Spiros Sirmakessis Dr.

About this chapter

Cite this chapter

Nasraoui, O. (2006). A Multi-Layered and Multi-Faceted Framework for Mining Evolving Web Clickstreams. In: Sirmakessis, S. (eds) Adaptive and Personalized Semantic Web. Studies in Computational Intelligence, vol 14. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33279-0_2

DOI: https://doi.org/10.1007/3-540-33279-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30605-4
Online ISBN: 978-3-540-33279-4
eBook Packages: Engineering, Engineering (R0)
