Abstract
Clustering is the task of grouping objects so that objects in the same cluster are similar and objects in different clusters are dissimilar. Many clustering algorithms have been proposed in the literature, and they are applied in areas such as security, marketing, documentation, and social networks. K-means is one of the most widely used clustering algorithms. It is very efficient, but its results are highly sensitive to the initialization of the clusters, and several solutions have been proposed to address this problem. In this paper we propose a hybrid algorithm for web document clustering that combines K-means, particle swarm optimization (PSO), and sampling. The algorithm is evaluated on four datasets, and its results are compared with those of K-means, PSO, Sampling+K-means, and PSO+K-means. The results show that the proposed algorithm generates the most compact clusters.
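The abstract names the three ingredients but not how they are wired together. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: draw a random sample, run a global-best PSO on the sample to find initial centroids, then refine them with K-means on the full data. All function names, the SSE fitness, the Euclidean distance, and the PSO parameters (`w=0.72`, `c1=c2=1.49`, commonly used defaults) are assumptions made for illustration. Real document clustering would typically operate on TF-IDF vectors rather than the toy points used here.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid
    (an assumed compactness measure; lower means more compact)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans(points, centroids, iters=50):
    """Lloyd's K-means starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def pso_centroids(points, k, rng, particles=10, iters=30,
                  w=0.72, c1=1.49, c2=1.49):
    """Global-best PSO; each particle is a flat list encoding k centroids."""
    dim = len(points[0])

    def decode(pos):
        return [pos[i * dim:(i + 1) * dim] for i in range(k)]

    def fitness(pos):
        return sse(points, decode(pos))

    # Start each particle at k randomly chosen data points, zero velocity.
    swarm = [[x for p in rng.sample(points, k) for x in p]
             for _ in range(particles)]
    vel = [[0.0] * (k * dim) for _ in range(particles)]
    pbest = [pos[:] for pos in swarm]
    pbest_fit = [fitness(pos) for pos in swarm]
    g = min(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - swarm[i][d])
                             + c2 * r2 * (gbest[d] - swarm[i][d]))
                swarm[i][d] += vel[i][d]
            fit = fitness(swarm[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = swarm[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = swarm[i][:], fit
    return decode(gbest)

def sampling_pso_kmeans(points, k, sample_size, seed=0):
    """Hypothetical pipeline: sample -> PSO on the sample -> K-means on all data.
    Returns (PSO centroids, refined centroids)."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    init = pso_centroids(sample, k, rng)
    return init, kmeans(points, init)
```

Calling `sampling_pso_kmeans(points, k=2, sample_size=20)` on a list of numeric tuples returns both the PSO-derived starting centroids and the refined ones, so `sse` can be compared before and after the K-means step. Running PSO only on the sample keeps the expensive swarm search cheap, while K-means on the full data performs the final local refinement.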
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kamel, N., Ouchen, I., Baali, K. (2014). A Sampling-PSO-K-means Algorithm for Document Clustering. In: Pan, JS., Krömer, P., Snášel, V. (eds) Genetic and Evolutionary Computing. Advances in Intelligent Systems and Computing, vol 238. Springer, Cham. https://doi.org/10.1007/978-3-319-01796-9_5
DOI: https://doi.org/10.1007/978-3-319-01796-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01795-2
Online ISBN: 978-3-319-01796-9
eBook Packages: Engineering, Engineering (R0)