Abstract
A deep analysis of a website's content and usage data can reveal which subjects its visitors are truly interested in. Because web page content is highly unstructured, it must be preprocessed and clustered carefully to yield effective results. This paper describes a novel two-phase self-organizing feature map clustering framework that segments web users according to their subject interests in the diverse content of a university website. Noise and dimensionality reduction of the sample website content is addressed through a comprehensive ten-step preprocessing procedure, which gave very promising experimental results when applied to the input web pages in the first phase of the proposed framework.
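For readers unfamiliar with self-organizing feature maps, the following is a minimal generic sketch of the technique, not the authors' implementation. The abstract does not specify the map size, learning schedule, input features, or how the two phases are connected, so every name and parameter here (`train_som`, `best_matching_unit`, `quantization_error`, the 3x3 grid, the toy term-weight vectors) is an assumption made purely for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) grid index of the unit whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used by the Gaussian neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull the winning unit and its grid neighbours towards the input.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean distance from each input vector to its best-matching unit."""
    return float(np.mean(
        [np.linalg.norm(weights[best_matching_unit(weights, x)] - x) for x in data]
    ))


# Hypothetical toy input: each row stands for one preprocessed page,
# each column for the weight of one term (values invented for illustration).
pages = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])

som = train_som(pages)
page_units = [best_matching_unit(som, p) for p in pages]
```

In a two-phase arrangement of the kind the abstract outlines, a map like this could first group pages by content, and a second map could then be trained on per-user vectors derived from those page groups; that wiring is an assumption here, since the abstract does not describe it.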
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ammari, A., Zharkova, V. (2009). Understanding Users’ Subject Interests in the Web Site Based on Their Usage of Its Content: A Novel Two-Phase Clustering Framework. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2009. Lecture Notes in Computer Science, vol 5559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01665-3_39
DOI: https://doi.org/10.1007/978-3-642-01665-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01664-6
Online ISBN: 978-3-642-01665-3
eBook Packages: Computer Science (R0)