Abstract
We propose a framework for web user profiling and clustering based on LDA topic modeling. It draws an analogy to document analysis in which documents represent users and words represent their actions. The main technical challenge addressed here is how to symbolize, as words, the web access actions that are monitored through a web proxy. We develop a hierarchical URL dictionary generated from Yahoo! Directory and a cross-hierarchical matching method that provides automatic abstraction. We apply the proposed framework to 7500 students at Osaka University. The results include, for example, 24 topics such as "Technology Oriented", "Job Hunting", and "SNS-addict". These topics reflect typical interest profiles of university students, and perplexity analysis is used to confirm the optimality of the framework.
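The user-as-document analogy can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the category names, the helper names `url_to_word` and `build_user_documents`, and the longest-prefix back-off rule are all assumptions made for illustration, and the paper's cross-hierarchical matching method is not reproduced here. The sketch only shows how proxy log lines might be turned into per-user bags of category "words" that a topic model such as LDA could then consume.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical dictionary: URL prefix (host plus path) -> directory category.
# A real dictionary would be generated from a web directory; these entries are made up.
URL_DICTIONARY = {
    "example.com": "Computers",
    "example.com/jobs": "Business/Employment",
    "social.example.org": "Society/SNS",
}


def url_to_word(url):
    """Return the category of the longest matching URL prefix, or None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    # Try the most specific prefix first, then back off one path segment
    # at a time until only the host is left (a simple form of abstraction).
    for depth in range(len(segments), -1, -1):
        key = "/".join([host] + segments[:depth])
        if key in URL_DICTIONARY:
            return URL_DICTIONARY[key]
    return None


def build_user_documents(log):
    """Turn (user_id, url) pairs into {user_id: Counter of category words}."""
    docs = defaultdict(Counter)
    for user, url in log:
        word = url_to_word(url)
        if word is not None:  # accesses with no dictionary match are skipped
            docs[user][word] += 1
    return dict(docs)


# Made-up proxy log entries, used only to exercise the functions above.
proxy_log = [
    ("u1", "http://example.com/jobs/listing/42"),
    ("u1", "http://example.com/news"),
    ("u2", "http://social.example.org/profile"),
    ("u2", "http://unknown.example.net/"),
]
user_docs = build_user_documents(proxy_log)
```

Each resulting `Counter` plays the role of one document's word counts, so the collection of users forms a corpus in the usual bag-of-words sense.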
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fujimoto, H., Etoh, M., Kinno, A., Akinaga, Y. (2011). Topic Analysis of Web User Behavior Using LDA Model on Proxy Logs. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_43
DOI: https://doi.org/10.1007/978-3-642-20841-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6
eBook Packages: Computer Science, Computer Science (R0)