
Evaluation of Web Session Cluster Quality Based on Access-Time Dissimilarity and Evolutionary Algorithms

  • Conference paper
Computational Science and Its Applications – ICCSA 2014 (ICCSA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8583)


Abstract

Refining web session clusters is an active research problem in the improvement of cluster quality. The motivation for refinement with evolutionary algorithms is that any clustering algorithm leaves some data items inappropriately clustered, so the resulting clusters are not well separated and cohesive. Refinement techniques are therefore used to improve cluster quality. Initial clusters are formed with the K-Means clustering algorithm, which suffers from the local minima problem. The clusters are then refined in five ways: with the Modified Knockout Refinement Algorithm (MKRA), a distance-based dissimilarity measure built on access and time features; with a Genetic Algorithm (GA); with Particle Swarm Optimization (PSO); with MKRA combined with GA; and with MKRA combined with PSO. Results are evaluated on five synthetic datasets and three real datasets. The experiments show that combining MKRA with evolutionary techniques produces better-quality clusters.
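
The local minima problem of K-Means mentioned in the abstract can be illustrated with a small sketch. The four data points, the starting centroids, and the function name `kmeans` are all invented for illustration; this is plain Lloyd-style K-Means on toy data, not the paper's datasets, and not MKRA, GA, or PSO.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style K-Means; returns (labels, centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # Sum of squared errors: lower means more cohesive clusters.
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: corners of a wide, flat rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Assumed starting centroids, chosen to show two different outcomes.
_, _, good_sse = kmeans(points, [(0, 0), (4, 1)])  # splits left / right
_, _, bad_sse = kmeans(points, [(2, 0), (2, 1)])   # stuck on bottom / top

print(good_sse, bad_sse)
```

Both runs converge, but the second one settles on a stable partition with a much larger sum of squared errors. A poor solution of this kind is what a later refinement step would be meant to improve.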




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Dixit, V.S., Bhatia, S.K., Singh, V.B. (2014). Evaluation of Web Session Cluster Quality Based on Access-Time Dissimilarity and Evolutionary Algorithms. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8583. Springer, Cham. https://doi.org/10.1007/978-3-319-09156-3_22


  • DOI: https://doi.org/10.1007/978-3-319-09156-3_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09155-6

  • Online ISBN: 978-3-319-09156-3

  • eBook Packages: Computer Science, Computer Science (R0)
