A Sampling-PSO-K-means Algorithm for Document Clustering

  • Conference paper
Genetic and Evolutionary Computing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 238)

Abstract

Clustering groups objects into clusters so that objects within the same cluster are similar and objects in different clusters are dissimilar. Many clustering algorithms have been proposed in the literature, and they are applied in areas such as security, marketing, documentation, and social networks. K-means is one of the most widely used clustering algorithms. It is very efficient, but its results are highly sensitive to the choice of initial cluster centers. Several solutions have been proposed to address this problem. In this paper we propose a hybrid algorithm for web document clustering. The proposed algorithm combines K-means, particle swarm optimization (PSO), and sampling. It is evaluated on four datasets, and the results are compared with those obtained by K-means, PSO, Sampling+K-means, and PSO+K-means. The results show that the proposed algorithm produces the most compact clusters.
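The abstract names the three ingredients (sampling, PSO, K-means) but not how they are wired together. The sketch below shows one plausible combination, not the authors' implementation: a random sample of the collection is used by PSO to search for good initial centroids, and K-means then refines those centroids on the full collection. Everything here is an assumption for illustration: documents are already dense numeric vectors (for example TF-IDF), distance is Euclidean, compactness is measured as the average distance of documents to their cluster centroid (called `addc` here), sampling is uniform, and the PSO coefficients (w = 0.72, c1 = c2 = 1.49) are common defaults. All function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumption; cosine distance is also common for documents)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(data: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Index of the nearest centroid for every document vector."""
    return [min(range(len(centroids)), key=lambda j: dist(x, centroids[j])) for x in data]


def addc(data: Sequence[Vector], centroids: Sequence[Vector]) -> float:
    """Average document-to-centroid distance, averaged over non-empty clusters.
    Lower means more compact clusters (hypothetical compactness measure)."""
    labels = assign(data, centroids)
    per_cluster = []
    for j, c in enumerate(centroids):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if members:
            per_cluster.append(sum(dist(x, c) for x in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def kmeans(data: Sequence[Vector], centroids: Sequence[Vector],
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(data, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Move the centroid to its members' mean; an empty cluster keeps its centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(data, centroids)


def pso_centroids(data: Sequence[Vector], k: int, rng: random.Random, n_particles: int = 10,
                  n_iter: int = 30, w: float = 0.72, c1: float = 1.49, c2: float = 1.49) -> List[Vector]:
    """Global-best PSO where each particle encodes k centroids and fitness is addc."""
    dim = len(data[0])
    # Each particle starts from k distinct documents picked at random.
    pos = [[list(p) for p in rng.sample(list(data), k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_fit = [addc(data, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = addc(data, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], fit
            if fit < gbest_fit:
                gbest, gbest_fit = [list(c) for c in pos[i]], fit
    return gbest


def sampling_pso_kmeans(data: Sequence[Vector], k: int, sample_size: int,
                        seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical pipeline: sample -> PSO seeding on the sample -> K-means on all data."""
    rng = random.Random(seed)
    sample = rng.sample(list(data), min(sample_size, len(data)))  # uniform random sample (assumed)
    seeds = pso_centroids(sample, k, rng)  # PSO only ever evaluates fitness on the sample
    return kmeans(data, seeds)             # K-means refines the seeds on the full collection


if __name__ == "__main__":
    # Toy "documents": two well-separated groups of 2-D vectors.
    docs = ([[0.1 * i, 0.2 * (i % 3)] for i in range(10)]
            + [[10 + 0.1 * i, 10 + 0.2 * (i % 3)] for i in range(10)])
    centroids, labels = sampling_pso_kmeans(docs, k=2, sample_size=12)
    print(labels, round(addc(docs, centroids), 3))
```

Running PSO on a sample rather than the whole collection keeps each fitness evaluation cheap, while the final K-means pass still uses every document; whether the paper uses the sample in exactly this way is not stated in the abstract.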

Author information

Correspondence to Nadjet Kamel.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kamel, N., Ouchen, I., Baali, K. (2014). A Sampling-PSO-K-means Algorithm for Document Clustering. In: Pan, JS., Krömer, P., Snášel, V. (eds) Genetic and Evolutionary Computing. Advances in Intelligent Systems and Computing, vol 238. Springer, Cham. https://doi.org/10.1007/978-3-319-01796-9_5

  • DOI: https://doi.org/10.1007/978-3-319-01796-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01795-2

  • Online ISBN: 978-3-319-01796-9

  • eBook Packages: Engineering (R0)