A Random Indexing Approach for Web User Clustering and Web Prefetching

  • Miao Wan
  • Arne Jönsson
  • Cong Wang
  • Lixiang Li
  • Yixian Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7104)


In this paper we present a novel technique to capture Web users’ behaviour based on their interest-oriented actions. In our approach we utilise the vector space model Random Indexing to identify the latent factors or hidden relationships among Web users’ navigational behaviour. Random Indexing is an incremental vector space technique that allows for continuous Web usage mining. User requests are modelled by Random Indexing for individual users’ navigational pattern clustering and common user profile creation. Clustering Web users’ access patterns may capture common user interests and, in turn, build user profiles for advanced Web applications, such as Web caching and prefetching. We present results from the Web user clustering approach through experiments on a real Web log file with promising results. We also apply our data to a prefetching task and compare that with previous approaches. The results show that Random Indexing provides more accurate prefetchings.


Web user clustering User behavior Random Indexing Web prefetching 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Etzioni, O.: The world-wide Web: quagmire or gold mine? Communications of the ACM 39(11), 65–68 (1996)CrossRefGoogle Scholar
  2. 2.
    Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing patterns. J. Knowl. Inf. Syst. 1(1), 5–32 (1999)CrossRefGoogle Scholar
  3. 3.
    Cao, L.: In-depth Behavior Understanding and Use: the Behavior Informatics Approach. Information Science 180(17), 3067–3085 (2010)CrossRefGoogle Scholar
  4. 4.
    Krishnapuram, R., Joshi, A., Nasraoui, O., Yi, L.: Low-complexity fuzzy relational clustering algorithms for web mining. IEEE Transaction of Fuzzy System 4(9), 596–607 (2003)Google Scholar
  5. 5.
    Cadez, I., Heckerman, D., Meek, C., Smyth, P., Whire, S.: Visualization of Navigation Patterns on a Website Using Model Based Clustering. Technical Report MSR-TR-00-18, Microsoft Research (March 2002)Google Scholar
  6. 6.
    Xie, Y., Phoha, V.V.: Web User Clustering from Access Log Using Belief Function. In: Proceedings of K-CAP 2001, pp. 202–208 (2001)Google Scholar
  7. 7.
    Hou, J., Zhang, Y.: Effectively Finding Relevant Web Pages from Linkage Information. IEEE Trans. Knowl. Data Eng. 15(4), 940–951 (2003)CrossRefGoogle Scholar
  8. 8.
    Paik, H.Y., Benatallah, B., Hamadi, R.: Dynamic restructuring of e-catalog communities based on user interaction patterns. World Wide Web 5(4), 325–366 (2002)CrossRefGoogle Scholar
  9. 9.
    Wan, M., Li, L., Xiao, J., Yang, Y., Wang, C., Guo, X.: CAS based clustering algorithm for Web users. Nonlinear Dynamics 61(3), 347–361 (2010)CrossRefzbMATHGoogle Scholar
  10. 10.
    Berendt, B.: Using site semantics to analyze, visualize, and support navigation. Data Mining and Knowledge Discovery 6(1), 37–59 (2002)MathSciNetCrossRefGoogle Scholar
  11. 11.
    Ansari, S., Kohavi, R., Mason, L., Zheng, Z.: Integrating e-commerce and data mining: Architecture and challenges. In: Proceedings of ICDM 2001, pp. 27–34 (2001)Google Scholar
  12. 12.
    Kanerva, P., Kristofersson, J., Holst, A.: Random Indexing of text samples for Latent Semantic Analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, p. 1036 (2000)Google Scholar
  13. 13.
    Sahlgren, M., Karlgren, J.: Automatic bilingual lexicon acquisition using Random Indexing of parallel corpora. Journal of Natural Language Engineering, Special Issue on Parallel Texts 6 (2005)Google Scholar
  14. 14.
    Landauer, T., Dumais, S.: A solution to Plato problem: the Latent Semantic Analysis theory for acquisition, induction and representation of knowledge. Psychological Review 104(2), 211–240 (1997)CrossRefGoogle Scholar
  15. 15.
    Kanerva, P.: Sparse distributed memory. The MIT Press, Cambridge (1988)zbMATHGoogle Scholar
  16. 16.
    MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)Google Scholar
  17. 17.
    Halkidi, M., Vazirgiannis, M., Batistakis, Y.: Quality Scheme Assessment in the Clustering Process. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 265–276. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  18. 18.
    Cunha, C.A., Bestavros, A., Crovella, M.E.: Characteristics of WWW Client Traces, Boston University Department of Computer Science, Technical Report TR-95-010 (April 1995)Google Scholar
  19. 19.
    The Internet Traffic Archive.
  20. 20.
    Gorman, J., Curran, J.R.: Random indexing using statistical weight functions. In: Proceedings of EMNLP 2006, pp. 457–464 (2006)Google Scholar
  21. 21.
    Teng, W., Chang, C., Chen, M.: Integrating Web Caching and Web Prefetching in Client-Side Proxies. IEEE Trans. Parallel Distr. Syst. 16(5), 444–455 (2005)CrossRefGoogle Scholar
  22. 22.
    Nanopoulos, A., Katsaros, D., Manolopoulos, Y.: Effective Prediction of Web-User Accesses: A Data Mining Approach. In: Proceeding of Workshop WEBKDD (2001)Google Scholar
  23. 23.
    Wu, Y., Chen, A.: Prediction of web page accesses by proxy server log. World Wide Web 5, 67–88 (2002)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Miao Wan
    • 1
  • Arne Jönsson
    • 2
  • Cong Wang
    • 1
  • Lixiang Li
    • 1
  • Yixian Yang
    • 1
  1. 1.Information Security Center, State Key Laboratory of Networking and Switching TechnologyBeijing University of Posts and TelecommunicationsBeijingChina
  2. 2.Department of Computer and Information ScienceLinköping UniversityLinköpingSweden

Personalised recommendations