Discovery of Indirect Associations from Web Usage Data

  • Pang-Ning Tan
  • Vipin Kumar
Chapter

Abstract

Web associations are valuable patterns because they provide useful insights into the browsing behavior of Web users. However, there are several limitations in applying current techniques for mining association patterns in Web usage data. First, as current techniques rely on the support measure to eliminate infrequent patterns, they are unable to detect interesting negative associations in data. In addition, they do not account for the impact of Web site structure on the support of a pattern. To address these limitations, we describe the use of a new data mining technique called indirect association to discover interesting negative associations in Web click-stream data. The key idea behind indirect association is to find pairs of pages that are negatively associated with each other, but are often accessed together with a common set of pages called the mediator. Indirect associations are useful patterns because they can capture the different interests of Web users who share a common traversal path. This type of pattern is not easily found using conventional data mining techniques unless the groups of users are known a priori. A novel technique is also developed for merging indirect associations into more compact patterns. The effectiveness of mining indirect associations is demonstrated using Web data from an academic institution and an online Web store.
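To make the idea of an indirect association concrete, here is a minimal brute-force sketch. It does not reproduce the chapter's own algorithm or its pattern-merging technique. It assumes a three-condition formulation: (1) pages a and b rarely occur in the same session (joint support below a threshold `t_s`), used here as a stand-in for "negatively associated"; (2) each of a and b frequently co-occurs with a mediator set Y (support of {a} ∪ Y and {b} ∪ Y at least `t_f`); and (3) each of a and b is strongly dependent on Y, measured with the IS/cosine measure against a threshold `t_d`. The function names, thresholds, choice of dependence measure, and the toy sessions are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def support(itemset, sessions):
    """Fraction of sessions that contain every page in itemset."""
    s = frozenset(itemset)
    return sum(1 for sess in sessions if s <= sess) / len(sessions)


def cosine(x, y, sessions):
    """IS/cosine measure between itemsets x and y (an assumed dependence measure)."""
    sx, sy = support(x, sessions), support(y, sessions)
    if sx == 0 or sy == 0:
        return 0.0
    return support(set(x) | set(y), sessions) / sqrt(sx * sy)


def indirect_associations(sessions, t_s, t_f, t_d, max_mediator=1):
    """Return (a, b, mediator) triples meeting the three assumed conditions.

    Hypothetical brute-force sketch: every page pair and every mediator of up
    to max_mediator pages is checked, so it only suits small examples.
    """
    if not sessions:
        return []
    sessions = [frozenset(s) for s in sessions]
    pages = sorted(set().union(*sessions))
    results = []
    for a, b in combinations(pages, 2):
        # Condition 1: a and b are rarely visited in the same session.
        if support({a, b}, sessions) >= t_s:
            continue
        others = [p for p in pages if p not in (a, b)]
        for k in range(1, max_mediator + 1):
            for med in combinations(others, k):
                m = set(med)
                # Condition 2: both a and b frequently co-occur with the mediator.
                if support(m | {a}, sessions) < t_f or support(m | {b}, sessions) < t_f:
                    continue
                # Condition 3: both a and b depend strongly on the mediator.
                if cosine({a}, m, sessions) < t_d or cosine({b}, m, sessions) < t_d:
                    continue
                results.append((a, b, tuple(sorted(m))))
    return results


if __name__ == "__main__":
    # Hypothetical sessions: two groups of users share /home -> /courses
    # but then diverge to different course pages.
    sessions = [
        {"/home", "/courses", "/cs101"},
        {"/home", "/courses", "/cs101"},
        {"/home", "/courses", "/cs505"},
        {"/home", "/courses", "/cs505"},
        {"/home", "/about"},
    ]
    print(indirect_associations(sessions, t_s=0.1, t_f=0.3, t_d=0.65))
```

On these toy sessions the sketch reports `/cs101` and `/cs505` as indirectly associated through the mediator `/courses`. The two pages never appear together, yet both are reached from a shared path, which mirrors the abstract's point about users with different interests who share a common traversal path. The dependence threshold matters: `/home` also co-occurs with both pages, but its cosine score (about 0.63) falls below `t_d=0.65`, so this overly generic mediator is pruned. Lowering `t_d` to 0.5 would admit it.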

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Pang-Ning Tan (1)
  • Vipin Kumar (1)
  1. AHPCRC/University of Minnesota, Minneapolis, USA
