Abstract
Refining web session clusters to improve cluster quality is an active research issue. The motivation for refinement with evolutionary algorithms is that any clustering algorithm leaves some data items inappropriately clustered, so the resulting clusters are never fully well separated and cohesive. Refinement techniques are therefore used to improve cluster quality. Initial clusters are formed using the K-Means clustering algorithm, which suffers from the local minima problem. The clusters are then refined using the Modified Knockout Refinement Algorithm (MKRA), which applies a distance-based dissimilarity over access and time features; a Genetic Algorithm (GA); Particle Swarm Optimization (PSO); and the combinations of MKRA with GA and MKRA with PSO. Results are evaluated on five synthetic datasets and three real datasets. The experiments show that combining MKRA with evolutionary techniques produces better-quality clusters.
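The local minima problem of K-Means mentioned above can be illustrated with a minimal sketch. It is not the authors' method: the four 2-D points, the two starting configurations, and the helper names `squared_distance` and `kmeans` are assumptions chosen for illustration, and the points stand in for session feature vectors. The sketch only shows that plain K-Means can converge to different partitions with very different sums of squared errors (SSE) depending on its starting centroids, which is the kind of outcome a refinement step aims to improve.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iters=100):
    """Plain Lloyd-style K-Means from the given starting centroids.

    Returns (labels, centroids, sse) once the assignment stops changing
    or max_iters is reached.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    sse = sum(squared_distance(p, centroids[lab])
              for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: corners of a rectangle that is wider than tall.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A separates left from right; start B separates bottom from top.
good_labels, _, good_sse = kmeans(points, [(0, 0.5), (4, 0.5)])
bad_labels, _, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_labels, good_sse)
print(bad_labels, bad_sse)
```

Both runs converge, but the second one settles on the bottom/top split, whose SSE is far higher than that of the left/right split. Restarting from several initializations, or applying a refinement stage such as the evolutionary techniques studied in the paper, are ways to escape such solutions.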
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Dixit, V.S., Bhatia, S.K., Singh, V.B. (2014). Evaluation of Web Session Cluster Quality Based on Access-Time Dissimilarity and Evolutionary Algorithms. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8583. Springer, Cham. https://doi.org/10.1007/978-3-319-09156-3_22
DOI: https://doi.org/10.1007/978-3-319-09156-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09155-6
Online ISBN: 978-3-319-09156-3
eBook Packages: Computer Science, Computer Science (R0)