LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition

  • Conference paper
WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles (WebKDD 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2703)

Abstract

Web Usage Mining enables new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have been used in business applications. We have developed an automated method to directly infer the major groupings of user traffic on a Web site [Heer01]. We do this by using multiple data features of user sessions in a clustering analysis. We have performed an extensive, systematic evaluation of the proposed approach, and have discovered that certain clustering schemes can achieve categorization accuracies as high as 99% [Heer02b]. In this paper, we describe the further development of this work into a prototype service called LumberJack, a push-button analysis system that is both more automated and more accurate than past systems.
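
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of clustering user sessions on several data features at once. Everything in it is an assumption made for illustration: the modality names (`pages`, `words`), the weights, the toy session data, and the choice of k-means with cosine similarity are hypothetical and are not taken from the paper.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def combine(modalities, weights):
    """Concatenate unit-length modality vectors, each scaled by its weight.

    The modality names and weights are hypothetical, chosen for illustration.
    """
    combined = []
    for name, weight in weights.items():
        combined.extend(weight * x for x in normalize(modalities[name]))
    return normalize(combined)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Deterministic farthest-first initialization (an assumption of this sketch).
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each session to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels

# Toy sessions: per-page view counts and keyword counts (made-up data).
sessions = [
    {"pages": [3, 1, 0, 0], "words": [2, 0, 0]},
    {"pages": [2, 2, 0, 0], "words": [1, 0, 0]},
    {"pages": [0, 0, 4, 1], "words": [0, 3, 1]},
    {"pages": [0, 0, 1, 3], "words": [0, 1, 2]},
]
weights = {"pages": 1.0, "words": 0.5}

vectors = [combine(s, weights) for s in sessions]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy data the first two sessions share pages and keywords, as do the last two, so the sketch places them in two separate groups. Changing the hypothetical weights shifts how much each feature type influences the grouping.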

References

  1. Banerjee, A., Ghosh, J.: Clickstream Clustering using Weighted Longest Common Subsequences. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 33–40 (2001)

  2. Barrett, R., Maglio, P.P., Kellem, D.C.: How to personalize the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 1997, Atlanta GA, March 1997, pp. 75–82 (1997)

  3. Ben-Hur, A., Elisseeff, A., Guyon, I.: A Stability Based Method for Discovering Structure in Clustered Data. In: Proceedings of the Pacific Symposium on Biocomputing (PSB2002), Kaua’i, HI (January 2002)

  4. Caruana, R., Freitag, D.: Greedy attribute selection. In: Proc. of International Conference on Machine Learning, ML 1994, pp. 28–36. Morgan Kaufmann, San Francisco (1994)

  5. Chi, E.H., Pirolli, P., Pitkow, J.: The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site. In: Proc. of ACM CHI 2000 Conference on Human Factors in Computing Systems, Amsterdam, Netherlands, pp. 161–168, 581, 582. ACM Press, New York (2000)

  6. Chi, E.H., Pirolli, P., Chen, K., Pitkow, J.: Using information scent to model user information needs and actions on the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 2001, Seattle, WA, pp. 490–497 (2001)

  7. Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proc. of the International Conference on Tools with Artificial Intelligence, pp. 558–567. IEEE, Los Alamitos (1997)

  8. CLUTO: A Software Package for Clustering High-Dimensional Datasets, Available at http://www-users.cs.umn.edu/~karypis/cluto/

  9. Fu, Y., Sandhu, K., Shih, M.: A Generalization-Based Approach to Clustering of Web Usage Sessions. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 21–38. Springer, Heidelberg (2000)

  10. Heer, J., Chi, E.H.: Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 51–58 (2001)

  11. Heer, J., Chi, E.H.: Mining the Structure of User Activity using Cluster Stability. In: Proceedings of the Workshop on Web Analytics, SIAM Conference on Data Mining, Arlington VA (April 2002)

  12. Heer, J., Chi, E.H.: Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In: Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, Minneapolis, MN, pp. 243–250. ACM Press, New York (2002)

  13. Hong, J.I., Heer, J., Waterson, S., Landay, J.A.: WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. To appear in ACM Transactions on Information Systems

  14. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. UC Berkeley Press (1967)

  15. Mobasher, B., Dai, H., Luo, T., Su, Y., Zhu, J.: Integrating usage and content mining for more effective personalization. In: Bauknecht, K., Madria, S.K., Pernul, G. (eds.) EC-Web 2000. LNCS, vol. 1875, p. 165. Springer, Heidelberg (2000)

  16. Modha, D., Spangler, W.: Feature Weighting in k-Means Clustering. Machine Learning 47 (2002)

  17. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

  18. Pirolli, P., Pitkow, J.E.: Distributions of Surfers’ Paths Through the World Wide Web: Empirical Characterization. World Wide Web 2(1–2), 29–45 (1999)

  19. Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 106(4), 643–675 (1999)

  20. Schuetze, H., Manning, C.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  21. Schuetze, H., Pirolli, P., Pitkow, J., Chen, F., Chi, E., Li, J.: System and Method for clustering data objects in a collection. Xerox PARC UIR QCA Technical Report (1999)

  22. Shahabi, C., Zarkesh, A.M., Adibi, J., Shah, V.: Knowledge Discovery from User’s Web-page Navigation. In: Proc. 7th IEEE Intl. Conf. On Research Issues in Data Engineering, pp. 20–29 (1997)

  23. Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL (April 2001)

  24. Srivastava, J., Cooley, R., Deshpande, M.: Web Usage Mining: Discovery and Application of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12–23 (2000)

  25. Kohavi, R., Masand, B., Spiliopoulou, M., Srivastava, J. (eds.): WebKDD 2001. LNCS (LNAI), vol. 2356. Springer, Heidelberg (2002)

  26. Yan, T.W., Jacobsen, M., Garcia-Molina, H., Dayal, U.: From User Access Patterns to Dynamic Hypertext Linking. Computer Networks 28(7–11), 1007–1014 (1996)

  27. Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and Analysis. Technical Report #01–40. University of Minnesota, Computer Science Department. Minneapolis, MN (2001)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chi, E.H., Rosien, A., Heer, J. (2003). LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition. In: Zaïane, O.R., Srivastava, J., Spiliopoulou, M., Masand, B. (eds) WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles. WebKDD 2002. Lecture Notes in Computer Science, vol 2703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39663-5_1

  • DOI: https://doi.org/10.1007/978-3-540-39663-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20304-9

  • Online ISBN: 978-3-540-39663-5

  • eBook Packages: Springer Book Archive
