Skip to main content

Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering

  • Conference paper
  • First Online:
Applications of Evolutionary Computation (EvoApplications 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10199))

Included in the following conference series:

Abstract

One of the most difficult problems in clustering, the task of grouping similar instances in a dataset, is automatically determining the number of clusters that should be created. When a dataset has a large number of attributes (features), this task becomes even more difficult due to the relationship between the number of features and the number of clusters produced. One method of addressing this is feature selection, the process of selecting a subset of features to be used. Evolutionary computation techniques have been used very effectively for solving clustering problems, but have seen little use for simultaneously performing the three tasks of clustering, feature selection, and determining the number of clusters. Furthermore, only a small number of existing methods exist, but they have a number of limitations that affect their performance and scalability. In this work, we introduce a number of novel techniques for improving the performance of these three tasks using particle swarm optimisation and statistical techniques. We conduct a series of experiments across a range of datasets with clustering problems of varying difficulty. The results show our proposed methods achieve significantly better clustering performance than existing methods, while only using a small number of features and automatically determining the number of clusters more accurately.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)

    Article  Google Scholar 

  2. García, A.J., Gómez-Flores, W.: Automatic clustering using nature-inspired metaheuristics: a survey. Appl. Soft Comput. 41, 192–213 (2016)

    Article  Google Scholar 

  3. Sheng, W., Liu, X., Fairhurst, M.C.: A niching memetic algorithm for simultaneous clustering and feature selection. IEEE Trans. Knowl. Data Eng. 20(7), 868–879 (2008)

    Article  Google Scholar 

  4. Javani, M., Faez, K., Aghlmandi, D.: Clustering and feature selection via PSO algorithm. In: International Symposium on Artificial Intelligence and Signal Processing (AISP), pp. 71–76. IEEE (2011)

    Google Scholar 

  5. Lensen, A., Xue, B., Zhang, M.: Particle swarm optimisation representations for simultaneous clustering and feature selection. In: Proceedings of the Symposium Series on Computational Intelligence. IEEE (2016, to appear)

    Google Scholar 

  6. Pal, N.R., Bezdek, J.C.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3(3), 370–379 (1995)

    Article  Google Scholar 

  7. Alelyani, S., Tang, J., Liu, H.: Feature selection for clustering: a review. In: Data Clustering: Algorithms and Applications, pp. 29–60 (2013)

    Google Scholar 

  8. Aggarwal, C.C., Reddy, C.K. (eds.): Data Clustering: Algorithms and Applications. CRC Press (2014)

    Google Scholar 

  9. Chiang, M.M., Mirkin, B.G.: Intelligent choice of the number of clusters in K-means clustering: an experimental study with different cluster spreads. J. Classif. 27(1), 3–40 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  10. Muni, D.P., Pal, N.R., Das, J.: Genetic programming for simultaneous feature selection and classifier design. IEEE Trans. Syst. Man Cybern. Part B 36(1), 106–117 (2006)

    Article  Google Scholar 

  11. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

    MATH  Google Scholar 

  12. Liu, H., Motoda, H., Setiono, R., Zhao, Z.: Feature selection: an ever evolving frontier in data mining. In: Proceedings of the Fourth International Workshop on Feature Selection in Data Mining, pp. 4–13 (2010)

    Google Scholar 

  13. Van Den Bergh, F.: An analysis of particle swarm optimizers. PhD thesis, University of Pretoria (2006)

    Google Scholar 

  14. Lichman, M.: UCI machine learning repository (2013)

    Google Scholar 

  15. Handl, J., Knowles, J.D.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andrew Lensen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lensen, A., Xue, B., Zhang, M. (2017). Using Particle Swarm Optimisation and the Silhouette Metric to Estimate the Number of Clusters, Select Features, and Perform Clustering. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science(), vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_35

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-55849-3_35

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55848-6

  • Online ISBN: 978-3-319-55849-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics