Understanding Users’ Subject Interests in the Web Site Based on Their Usage of Its Content: A Novel Two-Phase Clustering Framework

Ammari, Ahmad; Zharkova, Valentina

doi:10.1007/978-3-642-01665-3_39

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5559)

Abstract

Deep analysis of a website's content and usage data can reveal which subjects its visitors are truly interested in, and thereby help explain their behavior. Preprocessing and clustering the highly unstructured content of web pages must be handled carefully to produce effective results. This paper describes a novel two-phase self-organizing feature map clustering framework that segments web users based on their subject interests in the diverse content of a university website. Noise and dimensionality reduction of the sample website content is addressed through a comprehensive ten-step preprocessing procedure, which gave very promising experimental results when applied to the input web pages in the first phase of the proposed framework.
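The abstract does not spell out the ten preprocessing steps or how the two phases connect, so the following is only a minimal sketch of the general technique it names: clustering preprocessed page content with a self-organizing feature map. Everything in it is an assumption made for illustration, including the toy pages, the tiny stopword list, the term-frequency weighting, the 2x2 map, and the function names `tokenize`, `vectorize`, and `train_som`. It is not the authors' procedure.

```python
import math
import random
import re
from collections import Counter

# Illustrative stopword list only; a real pipeline would use a much fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "for", "is", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def vectorize(docs):
    """Build L2-normalized term-frequency vectors over a shared vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Index of the map node whose weight vector is closest to v."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], v))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighborhood radius
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, v)
            for i, w in enumerate(weights):
                d2 = ((coords[i][0] - coords[bmu][0]) ** 2
                      + (coords[i][1] - coords[bmu][1]) ** 2)
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])  # pull node toward the input
    return weights


# Hypothetical page texts standing in for university website content.
pages = [
    "Admissions and tuition fees for undergraduate courses",
    "Undergraduate admissions: how to apply and tuition fees",
    "Research seminar on neural networks and data mining",
    "Data mining and neural networks research group",
]
vocab, vectors = vectorize(pages)
weights = train_som(vectors)
assignments = [best_matching_unit(weights, v) for v in vectors]
print(assignments)  # one map-node index per page
```

Each page ends up assigned to the map node with the nearest weight vector, and pages sharing a node can be read as one content cluster. How such page clusters would feed a second, user-level clustering phase is not described in the abstract and is left out here.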

Author information

Authors and Affiliations

School of Computing, Informatics, and Media, University of Bradford, UK
Ahmad Ammari & Valentina Zharkova

Editor information

Editors and Affiliations

Department of Computer and Systems Science, Stockholm University, Forum 100, 164 40, Kista, Sweden
Anne Håkansson
Institute of Informatics, Wroclaw University of Technology, Str. Janiszewskiego 11/17, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen
Department of Computer Science, Franklin University, 201 South Grant Ave., 43215, Columbus, Ohio, USA
Ronald L. Hartung
School of Environment and Technology, Centre for SMART Systems, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, SA, 5095, Mawson Lakes, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Ammari, A., Zharkova, V. (2009). Understanding Users’ Subject Interests in the Web Site Based on Their Usage of Its Content: A Novel Two-Phase Clustering Framework. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2009. Lecture Notes in Computer Science, vol 5559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01665-3_39

DOI: https://doi.org/10.1007/978-3-642-01665-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01664-6
Online ISBN: 978-3-642-01665-3
eBook Packages: Computer Science, Computer Science (R0)
