LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition

Chi, Ed H.; Rosien, Adam; Heer, Jeffrey

doi:10.1007/978-3-540-39663-5_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2703)

Included in the following conference series:

International Workshop on Mining Web Data for Discovering Usage Patterns and Profiles

Abstract

Web Usage Mining enables new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have been used in business settings. We have developed an automated method to directly infer the major groupings of user traffic on a Web site [Heer01]. We do this by using multiple data features of user sessions in a clustering analysis. We have performed an extensive, systematic evaluation of the proposed approach, and have discovered that certain clustering schemes can achieve categorization accuracies as high as 99% [Heer02b]. In this paper, we describe the further development of this work into a prototype service called LumberJack, a push-button analysis system that is both more automated and more accurate than past systems.
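
The idea of clustering user sessions on several data features at once can be illustrated with a minimal sketch. This is not LumberJack's implementation: the two modalities ("pages" and "words"), the equal weights, the toy sessions, and the choice of a cosine-based k-means with farthest-first seeding are all assumptions made for illustration only.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def combine(modalities, weights):
    """Concatenate per-modality unit vectors, each scaled by its weight."""
    combined = []
    for name, weight in weights.items():
        combined.extend(weight * x for x in normalize(modalities[name]))
    return normalize(combined)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Cosine-based k-means with deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the vector least similar to every centroid chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each session to its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        # Recompute each centroid as the normalized mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels

# Hypothetical sessions: page-view counts and word counts of the viewed content.
sessions = [
    {"pages": [3, 1, 0, 0], "words": [2, 0, 0]},
    {"pages": [2, 2, 0, 0], "words": [3, 1, 0]},
    {"pages": [0, 0, 4, 1], "words": [0, 0, 5]},
    {"pages": [0, 1, 3, 2], "words": [0, 1, 4]},
]
weights = {"pages": 0.5, "words": 0.5}  # assumed modality weights

vectors = [combine(s, weights) for s in sessions]
labels = kmeans(vectors, k=2)
print(labels)
```

Changing the entries in `weights` shifts how much each modality influences the grouping, which is the kind of knob a multi-feature clustering scheme would need to expose.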

References

Banerjee, A., Ghosh, J.: Clickstream Clustering using Weighted Longest Common Subsequences. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 33–40 (2001)
Barrett, R., Maglio, P.P., Kellem, D.C.: How to personalize the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 1997, Atlanta GA, March 1997, pp. 75–82 (1997)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A Stability Based Method for Discovering Structure in Clustered Data. In: Proceedings of the Pacific Symposium on Biocomputing (PSB2002), Kaua’i, HI (January 2002)
Caruana, R., Freitag, D.: Greedy attribute selection. In: Proc. of International Conference on Machine Learning, ML 1994, pp. 28–36. Morgan Kaufmann, San Francisco (1994)
Chi, E.H., Pirolli, P., Pitkow, J.: The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site. In: Proc. of ACM CHI 2000 Conference on Human Factors in Computing Systems, Amsterdam, Netherlands, pp. 161–168, 581, 582. ACM Press, New York (2000)
Chi, E.H., Pirolli, P., Chen, K., Pitkow, J.: Using information scent to model user information needs and actions on the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 2001, Seattle, WA, pp. 490–497 (2001)
Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proc. of the International Conference on Tools with Artificial Intelligence, pp. 558–567. IEEE, Los Alamitos (1997)
CLUTO: A Software Package for Clustering High-Dimensional Datasets, Available at http://www-users.cs.umn.edu/~karypis/cluto/
Fu, Y., Sandhu, K., Shih, M.: A Generalization-Based Approach to Clustering of Web Usage Sessions. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 21–38. Springer, Heidelberg (2000)
Heer, J., Chi, E.H.: Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 51–58 (2001)
Heer, J., Chi, E.H.: Mining the Structure of User Activity using Cluster Stability. In: Proceedings of the Workshop on Web Analytics, SIAM Conference on Data Mining, Arlington VA (April 2002)
Heer, J., Chi, E.H.: Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In: Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, Minneapolis, MN, pp. 243–250. ACM Press, New York (2002)
Hong, J.I., Heer, J., Waterson, S., Landay, J.A.: WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. To appear in ACM Transactions on Information Systems
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. UC Berkeley Press (1967)
Mobasher, B., Dai, H., Luo, T., Su, Y., Zhu, J.: Integrating usage and content mining for more effective personalization. In: Bauknecht, K., Madria, S.K., Pernul, G. (eds.) EC-Web 2000. LNCS, vol. 1875, p. 165. Springer, Heidelberg (2000)
Modha, D., Spangler, W.: Feature Weighting in k-Means Clustering. Machine Learning 47 (2002)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Pirolli, P., Pitkow, J.E.: Distributions of Surfers’ Paths Through the World Wide Web: Empirical Characterization. World Wide Web 2(1–2), 29–45 (1999)
Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 106(4), 643–675 (1999)
Schuetze, H., Manning, C.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Schuetze, H., Pirolli, P., Pitkow, J., Chen, F., Chi, E., Li, J.: System and Method for clustering data objects in a collection. Xerox PARC UIR QCA Technical Report (1999)
Shahabi, C., Zarkesh, A.M., Adibi, J., Shah, V.: Knowledge Discovery from User’s Web-page Navigation. In: Proc. 7th IEEE Intl. Conf. On Research Issues in Data Engineering, pp. 20–29 (1997)
Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL (April 2001)
Srivastava, J., Cooley, R., Deshpande, M.: Web Usage Mining: Discovery and Application of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12–23 (2000)
Kohavi, R., Masand, B., Spiliopoulou, M., Srivastava, J. (eds.): WebKDD 2001. LNCS (LNAI), vol. 2356. Springer, Heidelberg (2002)
Yan, T.W., Jacobsen, M., Garcia-Molina, H., Dayal, U.: From User Access Patterns to Dynamic Hypertext Linking. Computer Networks 28(7–11), 1007–1014 (1996)
Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and Analysis. Technical Report #01–40. University of Minnesota, Computer Science Department. Minneapolis, MN (2001)

Author information

Authors and Affiliations

PARC (Palo Alto Research Center), 3333 Coyote Hill Road, Palo Alto, CA, 94304, USA
Ed H. Chi, Adam Rosien & Jeffrey Heer

Editor information

Editors and Affiliations

University of Alberta, Canada
Osmar R. Zaïane
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Data Miners Inc., 77 North Washington Street, Boston, MA 02114, USA
Brij Masand

Cite this paper

Chi, E.H., Rosien, A., Heer, J. (2003). LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition. In: Zaïane, O.R., Srivastava, J., Spiliopoulou, M., Masand, B. (eds) WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. WebKDD 2002. Lecture Notes in Computer Science, vol 2703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39663-5_1

DOI: https://doi.org/10.1007/978-3-540-39663-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20304-9
Online ISBN: 978-3-540-39663-5
eBook Packages: Springer Book Archive