Abstract
Web Usage Mining enables new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have been used in business applications. We have developed an automated method to directly infer the major groupings of user traffic on a Web site [Heer01]. We do this by using multiple data features of user sessions in a clustering analysis. We have performed an extensive, systematic evaluation of the proposed approach and found that certain clustering schemes can achieve categorization accuracies as high as 99% [Heer02b]. In this paper, we describe the further development of this work into a prototype service called LumberJack, a push-button analysis system that is both more automated and more accurate than past systems.
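As a rough illustration of clustering user sessions on several data features at once, the sketch below weights and concatenates per-session feature vectors and groups them with a small k-means-style loop based on cosine similarity. It is not the LumberJack implementation: the modality names (`content`, `url`), the weights, the toy sessions, and the farthest-first initialization are all assumptions made for this example.

```python
import math

def combine_modalities(session, weights):
    """Concatenate each L2-normalized modality vector, scaled by its weight."""
    combined = []
    for name, weight in weights.items():
        vec = session[name]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        combined.extend(weight * x / norm for x in vec)
    return combined

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pick_initial_centroids(vectors, k):
    """Deterministic farthest-first seeding (an assumption, chosen for repeatability)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def cluster_sessions(vectors, k, iterations=20):
    """Assign each vector to its most similar centroid, then recompute means."""
    centroids = pick_initial_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical sessions: a term-count vector and a URL-visit vector per session.
sessions = [
    {"content": [3, 0, 1], "url": [1, 0]},
    {"content": [2, 0, 0], "url": [1, 0]},
    {"content": [0, 4, 0], "url": [0, 1]},
    {"content": [0, 3, 1], "url": [0, 1]},
]
weights = {"content": 0.7, "url": 0.3}  # illustrative modality weights

vectors = [combine_modalities(s, weights) for s in sessions]
labels = cluster_sessions(vectors, k=2)
print(labels)
```

Changing the entries in `weights` shifts how much each feature type contributes to the similarity between sessions, which is the kind of choice a systematic evaluation of clustering schemes would compare.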
References
Banerjee, A., Ghosh, J.: Clickstream Clustering using Weighted Longest Common Subsequences. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 33–40 (2001)
Barrett, R., Maglio, P.P., Kellem, D.C.: How to personalize the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 1997, Atlanta GA, March 1997, pp. 75–82 (1997)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A Stability Based Method for Discovering Structure in Clustered Data. In: Proceedings of the Pacific Symposium on Biocomputing (PSB2002), Kaua’i, HI (January 2002)
Caruana, R., Freitag, D.: Greedy attribute selection. In: Proc. of International Conference on Machine Learning, ML 1994, pp. 28–36. Morgan Kaufmann, San Francisco (1994)
Chi, E.H., Pirolli, P., Pitkow, J.: The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site. In: Proc. of ACM CHI 2000 Conference on Human Factors in Computing Systems, Amsterdam, Netherlands, pp. 161–168, 581, 582. ACM Press, New York (2000)
Chi, E.H., Pirolli, P., Chen, K., Pitkow, J.: Using information scent to model user information needs and actions on the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 2001, Seattle, WA, pp. 490–497 (2001)
Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proc. of the International Conference on Tools with Artificial Intelligence, pp. 558–567. IEEE, Los Alamitos (1997)
CLUTO: A Software Package for Clustering High-Dimensional Datasets, Available at http://www-users.cs.umn.edu/~karypis/cluto/
Fu, Y., Sandhu, K., Shih, M.: A Generalization-Based Approach to Clustering of Web Usage Sessions. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 21–38. Springer, Heidelberg (2000)
Heer, J., Chi, E.H.: Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 51–58 (2001)
Heer, J., Chi, E.H.: Mining the Structure of User Activity using Cluster Stability. In: Proceedings of the Workshop on Web Analytics, SIAM Conference on Data Mining, Arlington VA (April 2002)
Heer, J., Chi, E.H.: Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In: Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, Minneapolis, MN, pp. 243–250. ACM Press, New York (2002)
Hong, J.I., Heer, J., Waterson, S., Landay, J.A.: WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. To appear in ACM Transactions on Information Systems
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. UC Berkeley Press (1967)
Mobasher, B., Dai, H., Luo, T., Su, Y., Zhu, J.: Integrating usage and content mining for more effective personalization. In: Bauknecht, K., Madria, S.K., Pernul, G. (eds.) EC-Web 2000. LNCS, vol. 1875, p. 165. Springer, Heidelberg (2000)
Modha, D., Spangler, W.: Feature Weighting in k-Means Clustering. Machine Learning 47 (2002)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Pirolli, P., Pitkow, J.E.: Distributions of Surfers’ Paths Through the World Wide Web: Empirical Characterization. World Wide Web 2(1–2), 29–45 (1999)
Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 106(4), 643–675 (1999)
Schuetze, H., Manning, C.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Schuetze, H., Pirolli, P., Pitkow, J., Chen, F., Chi, E., Li, J.: System and Method for clustering data objects in a collection. Xerox PARC UIR QCA Technical Report (1999)
Shahabi, C., Zarkesh, A.M., Adibi, J., Shah, V.: Knowledge Discovery from User’s Web-page Navigation. In: Proc. 7th IEEE Intl. Conf. On Research Issues in Data Engineering, pp. 20–29 (1997)
Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL (April 2001)
Srivastava, J., Cooley, R., Deshpande, M.: Web Usage Mining: Discovery and Application of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12–23 (2000)
Kohavi, R., Masand, B., Spiliopoulou, M., Srivastava, J. (eds.): WebKDD 2001. LNCS (LNAI), vol. 2356. Springer, Heidelberg (2002)
Yan, T.W., Jacobsen, M., Garcia-Molina, H., Dayal, U.: From User Access Patterns to Dynamic Hypertext Linking. Computer Networks 28(7–11), 1007–1014 (1996)
Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and Analysis. Technical Report #01–40. University of Minnesota, Computer Science Department. Minneapolis, MN (2001)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chi, E.H., Rosien, A., Heer, J. (2003). LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition. In: Zaïane, O.R., Srivastava, J., Spiliopoulou, M., Masand, B. (eds) WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. WebKDD 2002. Lecture Notes in Computer Science, vol 2703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39663-5_1
DOI: https://doi.org/10.1007/978-3-540-39663-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20304-9
Online ISBN: 978-3-540-39663-5
eBook Packages: Springer Book Archive