Fast unsupervised feature selection based on the improved binary ant system and mutation strategy

  • Zhaleh Manbari
  • Fardin Akhlaghian Tab
  • Chiman Salavati
Original Article


The “curse of dimensionality” caused by high-dimensional datasets not only imposes high memory and computational costs but also degrades the performance of learning methods. The main purpose of feature selection is to reduce the dimensionality of these datasets by discarding redundant and irrelevant features, which improves the performance of the learning algorithm. In this paper, a new feature selection algorithm, referred to as FSBACOM, is presented based on the binary ant system (BAS). The proposed method seeks to improve feature selection by decreasing redundancy and to reach an optimum solution in a short time by enlarging the search space. For this purpose, the features are organized sequentially in a circular graph, where each feature is connected to the next one by two edges, one for selecting and one for deselecting it. This representation of the search space reduces computational time significantly, particularly on high-dimensional datasets. Inspired by the genetic algorithm and simulated annealing, a damped mutation strategy is introduced to avoid falling into local optima. In addition, a new idea is used to reduce the redundancy between the selected features as far as possible. The performance of the proposed algorithm is compared with that of state-of-the-art feature selection algorithms using different classifiers on real-world datasets. The experimental results confirm that FSBACOM significantly reduces computational time and achieves better performance than the other feature selection methods.
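The search-space representation and the damped mutation described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the authors' implementation: the pheromone layout (one `(deselect, select)` pair per feature), the proportional edge-choice rule, the geometric damping schedule, and the names `construct_subset`, `mutation_rate`, and `mutate` are all hypothetical.

```python
import random


def construct_subset(pheromone, rng, start=0):
    """One ant walks the circular feature graph once, starting at `start`.

    `pheromone[i]` is an assumed (deselect, select) pair for the two edges
    entering feature i. The select edge is taken with probability
    tau_select / (tau_select + tau_deselect).
    """
    n = len(pheromone)
    subset = [0] * n
    for k in range(n):
        i = (start + k) % n  # wrap around: the graph is circular
        tau_deselect, tau_select = pheromone[i]
        p_select = tau_select / (tau_select + tau_deselect)
        subset[i] = 1 if rng.random() < p_select else 0
    return subset


def mutation_rate(iteration, initial_rate=0.2, damping=0.9):
    """Assumed damping schedule: the flip probability decays geometrically."""
    return initial_rate * damping ** iteration


def mutate(subset, rate, rng):
    """Flip each select/deselect bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in subset]


rng = random.Random(0)
pheromone = [(1.0, 1.0)] * 8  # equal pheromone on both edges of 8 features
ant = construct_subset(pheromone, rng, start=3)
mutated = mutate(ant, mutation_rate(iteration=3), rng)
```

Because each ant makes exactly one binary choice per feature, a walk in this sketch costs time linear in the number of features, which is one plausible reading of why such a representation suits high-dimensional data. The decaying mutation rate mirrors the cooling idea of simulated annealing: large perturbations early, small ones later.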


Keywords: Data classification · High-dimensional data · Feature selection · Binary ant system · Filter approach · Mutation


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Zhaleh Manbari (1)
  • Fardin Akhlaghian Tab (1), corresponding author
  • Chiman Salavati (1)
  1. Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
