A Sampling-PSO-K-means Algorithm for Document Clustering

Kamel, Nadjet; Ouchen, Imane; Baali, Karim

doi:10.1007/978-3-319-01796-9_5


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 238)


Abstract

Clustering is the task of grouping objects into clusters so that objects in the same cluster are similar and objects in different clusters are dissimilar. Many clustering algorithms have been proposed in the literature, and they are used in areas such as security, marketing, documentation, and social networks. The K-means algorithm is one of the best-known clustering algorithms. It is very efficient, but its results are highly sensitive to the initial choice of cluster centers. Several solutions have been proposed to address this problem. In this paper, we propose a hybrid algorithm for web document clustering that combines K-means, particle swarm optimization (PSO), and sampling. It is evaluated on four datasets, and its results are compared with those of K-means, PSO, Sampling+K-means, and PSO+K-means. The results show that the proposed algorithm generates the most compact clusters.
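The abstract describes the hybrid only at a high level. As a rough illustration, the sketch below shows one plausible way to chain the three ingredients: PSO searches for initial cluster centers on a random sample of the documents, and K-means then refines those centers on the full collection. This is not the authors' implementation. The squared Euclidean distance, the sum-of-squared-errors (SSE) fitness used as the compactness measure, the sample fraction, and the PSO parameters (`w`, `c1`, `c2`, particle count, iterations) are all assumptions, and documents are assumed to be already vectorized (for example, as TF-IDF vectors). All function names are hypothetical.

```python
import random

Point = list[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumption: the document distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: list[Point], centers: list[Point]) -> float:
    """Sum of squared distances to the nearest center; lower means more compact."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points: list[Point], centers: list[Point], max_iter: int = 100) -> list[Point]:
    """Lloyd's K-means refinement starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Update step: mean of the members; keep the old center if the cluster is empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers


def pso_centers(points, k, rng, n_particles=10, iters=30, w=0.72, c1=1.49, c2=1.49):
    """Global-best PSO; each particle encodes k candidate centers, flattened.

    Assumption: fitness is SSE and w, c1, c2 are common textbook values.
    """
    dim = len(points[0])

    def unflatten(flat):
        return [flat[i:i + dim] for i in range(0, k * dim, dim)]

    def fitness(flat):
        return sse(points, unflatten(flat))

    # Initialize each particle with k distinct data points as centers.
    pos = [[x for c in rng.sample(points, k) for x in c] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f < pbest_f[i]:  # personal best improved
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:  # global best improved
                    gbest, gbest_f = pos[i][:], f
    return unflatten(gbest)


def sampling_pso_kmeans(points, k, sample_frac=0.3, seed=0):
    """Hypothetical pipeline: PSO on a random sample seeds K-means on all points."""
    rng = random.Random(seed)
    size = min(len(points), max(k, int(sample_frac * len(points))))  # assumed sample size
    sample = rng.sample(points, size)
    return kmeans(points, pso_centers(sample, k, rng))


if __name__ == "__main__":
    gen = random.Random(1)
    # Toy 2-D "document vectors" drawn around three well-separated centers.
    docs = [[cx + gen.gauss(0, 0.3), cy + gen.gauss(0, 0.3)]
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    centers = sampling_pso_kmeans(docs, k=3)
    print(f"SSE after Sampling-PSO-K-means: {sse(docs, centers):.2f}")
```

Running PSO only on the sample keeps each fitness evaluation, a full pass over the sampled points, cheap, while the final K-means pass still uses every document. To reproduce the kind of comparison the abstract reports, one could compute `sse` for the centers returned by plain `kmeans` from random starts and for this pipeline on the same data.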



Author information

Authors and Affiliations

Computer Science Department, Faculty of Sciences, UFAS, Setif, Algeria
Nadjet Kamel
LRIA, Computer Science Department, USTHB, Algiers, Algeria
Nadjet Kamel, Imane Ouchen & Karim Baali


Corresponding author

Correspondence to Nadjet Kamel.

Editor information

Editors and Affiliations

Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
Jeng-Shyang Pan
Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-TUO, Ostrava-Poruba, Czech Republic
Pavel Krömer
Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-TUO, Ostrava-Poruba, Czech Republic
Václav Snášel


About this paper

Cite this paper

Kamel, N., Ouchen, I., Baali, K. (2014). A Sampling-PSO-K-means Algorithm for Document Clustering. In: Pan, JS., Krömer, P., Snášel, V. (eds) Genetic and Evolutionary Computing. Advances in Intelligent Systems and Computing, vol 238. Springer, Cham. https://doi.org/10.1007/978-3-319-01796-9_5


DOI: https://doi.org/10.1007/978-3-319-01796-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01795-2
Online ISBN: 978-3-319-01796-9
eBook Packages: Engineering, Engineering (R0)
