A Cube Model and Cluster Analysis for Web Access Sessions

Huang, Joshua Zhexue; Ng, Michael; Ching, Wai-Ki; Ng, Joe; Cheung, David

doi:10.1007/3-540-45640-6_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2356)

Included in the following conference series:

International Workshop on Mining Web Log Data Across All Customers Touch Points

Abstract

Identification of the navigational patterns of casual visitors is an important step in online recommendation to convert casual visitors to customers in e-commerce. Clustering and sequential analysis are two primary techniques for mining navigational patterns from Web and application server logs. The characteristics of the log data and mining tasks require new data representation methods and analysis algorithms to be tested in the e-commerce environment. In this paper we present a cube model to represent Web access sessions for data mining. The cube model organizes session data into three dimensions. The COMPONENT dimension represents a session as a set of ordered components {c_1, c_2, ..., c_P}, in which each component c_i indexes the i-th visited page in the session. Each component is associated with a set of attributes describing the page it indexes, such as the page ID, the category, and the time spent viewing the page. The attributes associated with each component are defined in the ATTRIBUTE dimension. The SESSION dimension indexes individual sessions. In the model, irregular sessions are converted to a regular data structure to which existing data mining algorithms can be applied, while the order of the page sequences is maintained. A rich set of page attributes is embedded in the model for different analysis purposes. We also present some experimental results of using partitional clustering algorithms to cluster sessions. Because the sessions are essentially sequences of categories, two methods are used to cluster the categorical sequences: the k-modes algorithm, designed for clustering categorical data, and a clustering method based on the Markov transition frequency (or probability) matrix.
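
The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sample sessions, the use of None as padding for short sessions, and the truncation of long sessions are all assumptions made for illustration. The sketch builds a SESSION x COMPONENT x ATTRIBUTE cube from irregular sessions, extracts the category sequences, and then computes two quantities that clustering methods of the kind mentioned above rely on: the simple matching dissimilarity and column-wise modes used by k-modes, and a Markov transition frequency matrix per session.

```python
from collections import Counter

PAD = None  # assumed filler for components beyond a session's last page


def build_cube(sessions, attributes, num_components):
    """Return cube[s][c][a], indexed SESSION x COMPONENT x ATTRIBUTE.

    Sessions shorter than num_components are padded with PAD; longer
    ones are truncated (both conventions are assumptions of this sketch).
    """
    cube = []
    for pages in sessions:
        components = []
        for i in range(num_components):
            if i < len(pages):
                components.append([pages[i].get(a, PAD) for a in attributes])
            else:
                components.append([PAD] * len(attributes))
        cube.append(components)
    return cube


def attribute_slice(cube, attributes, name):
    """Take one attribute across all sessions and components."""
    j = attributes.index(name)
    return [[component[j] for component in session] for session in cube]


def simple_matching(x, y):
    """Count positions where two equal-length categorical sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def mode_of(rows):
    """Column-wise most frequent value, usable as a k-modes style center."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*rows)]


def transition_frequencies(sequence, categories):
    """Count category-to-category transitions, skipping padded positions."""
    index = {c: k for k, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for a, b in zip(sequence, sequence[1:]):
        if a is PAD or b is PAD:
            continue
        matrix[index[a]][index[b]] += 1
    return matrix


# Hypothetical sample data: two sessions of different lengths.
attributes = ["page_id", "category", "view_time"]
sessions = [
    [
        {"page_id": 1, "category": "home", "view_time": 5},
        {"page_id": 7, "category": "books", "view_time": 30},
        {"page_id": 9, "category": "books", "view_time": 12},
    ],
    [
        {"page_id": 1, "category": "home", "view_time": 3},
        {"page_id": 4, "category": "music", "view_time": 20},
    ],
]
categories = ["home", "books", "music"]

cube = build_cube(sessions, attributes, num_components=3)
cats = attribute_slice(cube, attributes, "category")
distance = simple_matching(cats[0], cats[1])
matrices = [transition_frequencies(seq, categories) for seq in cats]
```

Padding keeps page order intact while giving every session the same shape, which is what lets a fixed-length categorical method be applied. The transition matrix discards position but has the same size for every session regardless of length, so the two representations trade off differently.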

Author information

Authors and Affiliations

E-Business Technology Institute, The University of Hong Kong, China
Joshua Zhexue Huang, Joe Ng & David Cheung
Department of Mathematics, The University of Hong Kong, China
Michael Ng & Wai-Ki Ching

Editor information

Editors and Affiliations

Blue Martini Software, 2600 Campus Drive, San Mateo, CA, 94403, USA
Ron Kohavi
Data Miners Inc., 77 North Washington Street, Boston, MA, 02114, USA
Brij M. Masand
Leipzig Graduate School of Management, Jahnallee 59, 04109, Leipzig, Germany
Myra Spiliopoulou
University of Minnesota, 4-192 EECS Building 200 Union St SE, Minneapolis, MN, 55455
Jaideep Srivastava

About this paper

Cite this paper

Huang, J.Z., Ng, M., Ching, WK., Ng, J., Cheung, D. (2002). A Cube Model and Cluster Analysis for Web Access Sessions. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science, vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_3

DOI: https://doi.org/10.1007/3-540-45640-6_3
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43969-1
Online ISBN: 978-3-540-45640-7
eBook Packages: Springer Book Archive