Mining Evolving Web Sessions and Clustering Dynamic Web Documents for Similarity-Aware Web Content Management

Xiao, Jitian

doi:10.1007/978-3-540-88192-6_11

Jitian Xiao

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5139))

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Similarity discovery has become one of the most important research streams in the web usage mining community in recent years. The knowledge it yields can be used in many applications, such as predicting users' preferences, optimizing web cache organization, and improving the quality of web document pre-fetching. This paper presents an approach that mines evolving web sessions to cluster web users and establish similarities among web documents. The results are then applied to a Similarity-aware Web Content Management system, facilitating offline building of similarity-aware web caches and online updating of sub-caches and cache content similarity profiles. An agent-based web document pre-fetching mechanism is also developed to support similarity-aware caching, further reducing bandwidth consumption and network latency and thereby improving web access performance.
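
As a rough illustration of clustering web users from their sessions, the sketch below builds a page-frequency profile per user and groups users whose profiles are similar. It is not the paper's algorithm: the cosine measure, the greedy single-pass grouping, the 0.5 threshold, the helper names (`user_profile`, `cosine_similarity`, `cluster_users`) and the sample logs are all assumptions chosen for brevity.

```python
from collections import Counter
from math import sqrt


def user_profile(sessions):
    """Count how often each page appears across one user's sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(session)
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two page-frequency profiles."""
    shared = set(a) & set(b)
    dot = sum(a[page] * b[page] for page in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_users(profiles, threshold):
    """Greedy single pass: a user joins the first cluster whose seed
    user is at least `threshold` similar, otherwise starts a new one."""
    clusters = []  # list of (seed_user, members)
    for user, profile in profiles.items():
        for seed, members in clusters:
            if cosine_similarity(profile, profiles[seed]) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((user, [user]))
    return [members for _, members in clusters]


# Hypothetical session logs: user -> list of sessions (each a list of pages).
logs = {
    "u1": [["/news", "/sport"], ["/news"]],
    "u2": [["/news", "/sport", "/sport"]],
    "u3": [["/shop", "/cart"]],
}
profiles = {user: user_profile(sessions) for user, sessions in logs.items()}
clusters = cluster_users(profiles, threshold=0.5)
print(clusters)
```

In this toy data, u1 and u2 visit the same pages with different frequencies and end up in one group, while u3 shares no pages with them. Handling evolving sessions would additionally require updating profiles and cluster membership as new sessions arrive, which this sketch does not attempt.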

Author information

Authors and Affiliations

School of Computer and Information Science, Edith Cowan University, 2 Bradford Street, Mount Lawley, WA 6050, Australia
Jitian Xiao

Authors

Jitian Xiao

Editor information

Editors and Affiliations

School of Computer Science, Sichuan University, 610065, Chengdu, China
Changjie Tang
Department of Computer Science, The University of Western Ontario, Canada
Charles X. Ling
School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
Faculty of Science & Engineering, York University, 355 Lumbers Building, M3J 1P3, Toronto, Ontario, Canada
Nick J. Cercone
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, 4072, Queensland, Australia
Xue Li

About this paper

Cite this paper

Xiao, J. (2008). Mining Evolving Web Sessions and Clustering Dynamic Web Documents for Similarity-Aware Web Content Management. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science(), vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_11

DOI: https://doi.org/10.1007/978-3-540-88192-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)
