Topic Analysis of Web User Behavior Using LDA Model on Proxy Logs

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6634)

Abstract

We propose a web user profiling and clustering framework based on LDA topic modeling, by analogy to document analysis: users play the role of documents and their actions play the role of words. The main technical challenge addressed here is how to represent web access actions, as monitored through a web proxy, by words. We develop a hierarchical URL dictionary generated from Yahoo! Directory and a cross-hierarchical matching method that provides automatic abstraction. We apply the proposed framework to 7500 students at Osaka University. The results include, for example, 24 topics such as "Technology Oriented", "Job Hunting", and "SNS-addict". These topics reflect the typical interest profiles of university students, and perplexity analysis is used to confirm the optimality of the framework.
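
The user-as-document analogy in the abstract can be illustrated with a minimal Python sketch. It is not the authors' cross-hierarchical matching method, which the abstract does not specify; it swaps in a plain longest-prefix lookup with backoff to parent paths and parent domains. The dictionary entries, category labels, and the names URL_DICTIONARY, symbolize, and build_user_documents are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical dictionary: URL prefix (host + path) -> category word.
# A real one would be derived from a web directory; these entries are made up.
URL_DICTIONARY = {
    "example.com": "Portal",
    "example.com/jobs": "Job_Hunting",
    "sns.example.org": "SNS",
}


def symbolize(url, dictionary=URL_DICTIONARY):
    """Map a URL to a category word by longest-prefix match, or None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    # Try the most specific path first, then back off one segment at a time.
    for depth in range(len(segments), -1, -1):
        key = "/".join([host] + segments[:depth])
        if key in dictionary:
            return dictionary[key]
    # Back off through parent domains: www.example.com -> example.com.
    labels = host.split(".")
    for i in range(1, len(labels) - 1):
        key = ".".join(labels[i:])
        if key in dictionary:
            return dictionary[key]
    return None


def build_user_documents(log):
    """Turn (user_id, url) proxy-log records into per-user bags of words."""
    documents = defaultdict(Counter)
    for user_id, url in log:
        word = symbolize(url)
        if word is not None:  # unmatched URLs are simply dropped here
            documents[user_id][word] += 1
    return dict(documents)


if __name__ == "__main__":
    sample_log = [
        ("u1", "http://example.com/jobs/list?id=3"),
        ("u1", "http://www.example.com/news"),
        ("u2", "http://sns.example.org/home"),
        ("u2", "http://unknown.test/"),
    ]
    for user, bag in build_user_documents(sample_log).items():
        print(user, dict(bag))
```

Each per-user Counter is a bag of words that could then be passed to any LDA implementation, with each user treated as one document.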

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fujimoto, H., Etoh, M., Kinno, A., Akinaga, Y. (2011). Topic Analysis of Web User Behavior Using LDA Model on Proxy Logs. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_43

  • DOI: https://doi.org/10.1007/978-3-642-20841-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20840-9

  • Online ISBN: 978-3-642-20841-6

  • eBook Packages: Computer Science, Computer Science (R0)