A review of unsupervised feature selection methods

Abstract

In recent years, unsupervised feature selection methods have attracted considerable interest in many research areas, mainly because of their ability to identify and select relevant features without needing class label information. In this paper, we provide a comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature. We present a taxonomy of these methods, describing their main characteristics and the fundamental ideas they are based on. Additionally, we summarize the advantages and disadvantages of the general approaches into which we have categorized the methods analyzed in this review. Moreover, we present an experimental comparison among the most representative methods of each approach. Finally, we discuss some important open challenges in this research area.

Notes

  1. Also called instances, observations, or samples; commonly represented as vectors.

  2. The set composed of the squares of the singular values of the data matrix (see the SVD-entropy sketch after these notes).

  3. Clustering can be performed using the Constrained Boolean Matrix Factorization (CBMF) algorithm proposed by Li et al. (2014a), or by employing eigendecomposition and exhaustive search.

  4. The number in parentheses denotes the number of datasets used for validation.

  5. Unlike supervised feature selection, which has class labels to guide the search for discriminative features, in UFS we must define feature relevance in terms of objective concepts.

  6. https://archive.ics.uci.edu/ml/index.php.

  7. To obtain more reliable results, we repeated the k-means algorithm ten times with different initial points and report the average clustering quality (see the evaluation sketch after these notes).
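As an illustration of footnote 2, the following is a minimal sketch of how the squared singular values of the data matrix can be turned into an SVD-entropy feature-ranking criterion, in the spirit of Varshavsky et al. (2006). The function names and the leave-one-out scoring loop are our own illustrative choices, not code from any of the reviewed methods:

```python
import numpy as np

def svd_entropy(X):
    """Normalized spectral entropy of a data matrix X (samples x features),
    computed from the normalized squared singular values of X (footnote 2)."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values of X
    v = s ** 2                                # squared singular values
    v = v / v.sum()                           # normalize to a distribution
    v = v[v > 1e-12]                          # drop (near-)zero terms before log
    if len(v) < 2:
        return 0.0
    return -np.sum(v * np.log(v)) / np.log(len(v))

def ce_ranking(X):
    """Rank features by their contribution to the entropy (CE):
    CE_i = E(X) - E(X with feature i removed). Features with high CE
    increase the spectral entropy and are considered more informative."""
    base = svd_entropy(X)
    ce = np.array([base - svd_entropy(np.delete(X, i, axis=1))
                   for i in range(X.shape[1])])
    return np.argsort(ce)[::-1]               # most informative first
```

Selecting, say, the top 50 features would then amount to `ce_ranking(X)[:50]`; this leave-one-out scoring is only one of the selection schemes proposed in that line of work.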
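Footnote 7 describes the evaluation protocol used in the experimental comparison. Below is a minimal sketch of that protocol, assuming scikit-learn and using normalized mutual information as one possible clustering-quality measure; the choice of NMI and all identifiers here are our assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def average_clustering_quality(X_selected, y_true, n_clusters, n_runs=10):
    """Run k-means n_runs times with different initial points (footnote 7)
    on the selected features and return the mean quality over the runs."""
    scores = []
    for seed in range(n_runs):
        # one k-means run with a single random initialization per seed
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=seed).fit_predict(X_selected)
        scores.append(normalized_mutual_info_score(y_true, labels))
    return float(np.mean(scores))
```

Averaging over several random initializations reduces the sensitivity of the comparison to k-means' local optima.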

References

  1. Agrawal S, Agrawal J (2015) Survey on anomaly detection using data mining techniques. Procedia Comput Sci 60(1):708–713. https://doi.org/10.1016/j.procs.2015.08.220

  2. Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001

  3. Alelyani S (2013) On feature selection stability: a data perspective. Ph.D. thesis, Arizona State University, Tempe

  4. Alelyani S, Tang J, Liu H (2011) The effect of the characteristics of the dataset on the selection stability. In: Proceedings—international conference on tools with artificial intelligence (ICTAI), pp 970–977. https://doi.org/10.1109/ICTAI.2011.167

  5. Alelyani S, Tang J, Liu H (2013) Feature selection for clustering: a review. Data Cluster Algorithms Appl 29:110–121

  6. Alter O, Brown PO, Botstein D (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97(18):10101–10106

  7. Ambusaidi MA, He X, Nanda P (2015) Unsupervised feature selection method for intrusion detection system. In: Trustcom/BigDataSE/ISPA, 2015 IEEE, vol 1, pp 295–301. https://doi.org/10.1109/Trustcom.2015.387

  8. Ang JC, Mirzal A, Haron H, Hamed HNA (2016) Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinform 13(5):971–989. https://doi.org/10.1109/TCBB.2015.2478454

  9. Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272

  10. Banerjee M, Pal NR (2014) Feature selection with SVD entropy: some modification and extension. Inf Sci 264:118–134. https://doi.org/10.1016/j.ins.2013.12.029

  11. Beni G, Wang J (1993) Swarm intelligence in cellular robotic systems. In: Dario P, Sandini G, Aebischer P (eds) Robots and biological systems: towards a new bionics? Springer, Berlin, pp 703–712. https://doi.org/10.1007/978-3-642-58069-7_38

  12. Bharti KK, Singh PK (2014) A survey on filter techniques for feature selection in text mining. In: Proceedings of the second international conference on soft computing for problem solving (SocProS 2012), December 28–30, 2012. Springer, pp 1545–1559

  13. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2015) Feature selection for high-dimensional data. Springer, Cham. https://doi.org/10.1007/978-3-319-21858-8

  14. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122

  15. Breaban M, Luchian H (2011) A unifying criterion for unsupervised clustering and feature selection. Pattern Recognit 44(4):854–865. https://doi.org/10.1016/j.patcog.2010.10.006

  16. Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 333–342

  17. Cai J, Luo J, Wang S, Yang S (2018) Feature selection in machine learning: a new perspective. Neurocomputing 300:70–79. https://doi.org/10.1016/j.neucom.2017.11.077

  18. Calinski T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat Theory Methods 3(1):1–27. https://doi.org/10.1080/03610927408827101

  19. Chakrabarti S, Frank E, Güting RH, Han J, Jiang X, Kamber M, Lightstone SS, Nadeau TP, Neapolitan RE et al (2008) Data mining: know it all. Elsevier Science. https://books.google.com.mx/books?id=WRqZ0QsdxKkC

  20. Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024

  21. Chung FRK (1997) Spectral graph theory, vol 92. American Mathematical Society, Providence

  22. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  23. Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York

  24. Dadaneh BZ, Markid HY, Zakerolhosseini A (2016) Unsupervised probabilistic feature selection using ant colony optimization. Expert Syst Appl 53:27–42. https://doi.org/10.1016/j.eswa.2016.01.021

  25. Daniels MJ, Normand SLT (2005) Longitudinal profiling of health care units based on continuous and discrete patient outcomes. Biostatistics 7(1):1–15

  26. Dash M, Liu H (2000) Feature selection for clustering. In: Terano T, Liu H, Chen ALP (eds) Knowledge discovery and data mining. Current issues and new applications, vol 1805. Springer, pp 110–121. https://doi.org/10.1007/3-540-45571-X_13

  27. Dash M, Ong YS (2011) RELIEF-C: efficient feature selection for clustering over noisy data. In: 2011 23rd IEEE international conference on tools with artificial intelligence (ICTAI). IEEE, pp 869–872

  28. Dash M, Liu H, Yao J (1997) Dimensionality reduction of unsupervised data. In: Proceedings ninth IEEE international conference on tools with artificial intelligence. IEEE Computer Society, pp 532–539. https://doi.org/10.1109/TAI.1997.632300

  29. Dash M, Choi K, Scheuermann P, Liu H (2002) Feature selection for clustering—a filter solution. In: Proceedings of the 2002 IEEE international conference on data mining, pp 115–122. https://doi.org/10.1109/ICDM.2002.1183893

  30. De Leon AR, Chough KC (2013) Analysis of mixed data: methods and applications. CRC Press, London

  31. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38. https://doi.org/10.2307/2984875

  32. Devakumari D, Thangavel K (2010) Unsupervised adaptive floating search feature selection based on contribution entropy. In: 2010 international conference on communication and computational intelligence (INCOCCI). IEEE, pp 623–627

  33. Devaney M, Ram A (1997) Efficient feature selection in conceptual clustering. In: Proceedings of the fourteenth international conference on machine learning (ICML ’97). Morgan Kaufmann, San Francisco, pp 92–97. http://dl.acm.org/citation.cfm?id=645526.657124

  34. Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice Hall, Englewood Cliffs

  35. Dong G, Liu H (2018) Feature engineering for machine learning and data analytics. CRC Press. https://books.google.com/books?id=QmNRDwAAQBAJ

  36. Donoho DL, Tsaig Y (2008) Fast solution of l1-norm minimization problems when the solution may be sparse. IEEE Trans Inf Theory 54(11):4789–4812

  37. Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evolut Comput 1(1):53–66

  38. Du S, Ma Y, Li S, Ma Y (2017) Robust unsupervised feature selection via matrix factorization. Neurocomputing 241:115–127. https://doi.org/10.1016/j.neucom.2017.02.034

  39. Dutta D, Dutta P, Sil J (2014) Simultaneous feature selection and clustering with mixed features by multi objective genetic algorithm. Int J Hybrid Intell Syst 11(1):41–54

  40. Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889

  41. El Ghaoui L, Li GC, Duong VA, Pham V, Srivastava AN, Bhaduri K (2011) Sparse machine learning methods for understanding large text corpora. In: CIDU, pp 159–173

  42. Feldman R, Sanger J (2006) The text mining handbook. Cambridge University Press. https://doi.org/10.1017/CBO9780511546914

  43. Ferreira AJ, Figueiredo MA (2012) An unsupervised approach to feature discretization and selection. Pattern Recognit 45(9):3048–3060. https://doi.org/10.1016/j.patcog.2011.12.008

  44. Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396. https://doi.org/10.1109/34.990138

  45. Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1023/A:1022852608280

  46. Fix E, Hodges JL Jr (1951) Discriminatory analysis—nonparametric discrimination: consistency properties. Technical report, University of California, Berkeley

  47. Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  48. Fowlkes EB, Gnanadesikan R, Kettenring JR (1988) Variable selection in clustering. J Classif 5(2):205–228. https://doi.org/10.1007/BF01897164

  49. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701. https://doi.org/10.1080/01621459.1937.10503522

  50. Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, 1st edn. Springer series in statistics. Springer, New York

  51. Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press, Boston

  52. García S, Luengo J, Herrera F (2015) Data preprocessing in data mining, vol 72. Springer, New York. https://doi.org/10.1007/978-3-319-10247-4

  53. Garcia-Garcia D, Santos-Rodriguez R (2009) Spectral clustering and feature selection for microarray data. In: 2009 international conference on machine learning and applications (ICMLA ’09), pp 425–428. https://doi.org/10.1109/ICMLA.2009.86

  54. Gu S, Zhang L, Zuo W, Feng X (2014) Projective dictionary pair learning for pattern classification. In: Advances in neural information processing systems, pp 793–801

  55. Guo J, Zhu W (2018) Dependence guided unsupervised feature selection. In: AAAI, pp 2232–2239

  56. Guo J, Guo Y, Kong X, He R (2017) Unsupervised feature selection with ordinal locality. In: 2017 IEEE international conference on multimedia and expo (ICME), pp 1213–1218

  57. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  58. Haindl M, Somol P, Ververidis D, Kotropoulos C (2006) Feature selection based on mutual correlation. In: Progress in pattern recognition, image analysis and applications, pp 569–577

  59. Hall MA (1999) Correlation-based feature selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton

  60. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18. https://doi.org/10.1145/1656274.1656278

  61. Han J, Sun Z, Hao H (2015) Selecting feature subset with sparsity and low redundancy for unsupervised learning. Knowl Based Syst 86:210–223. https://doi.org/10.1016/j.knosys.2015.06.008

  62. He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems, pp 153–160

  63. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Advances in neural information processing systems 18, pp 507–514

  64. Hou C, Nie F, Yi D, Wu Y (2011) Feature selection via joint embedding learning and sparse regression. In: IJCAI proceedings—international joint conference on artificial intelligence, vol 22, p 1324

  65. Hou C, Nie F, Li X, Yi D, Wu Y (2014) Joint embedding learning and sparse regression: a framework for unsupervised feature selection. IEEE Trans Cybern 44(6):793–804

  66. Hruschka ER, Covoes TF (2005) Feature selection for cluster analysis: an approach based on the simplified Silhouette criterion. In: International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce, vol 1. IEEE, pp 32–38

  67. Hruschka ER, Hruschka ER, Covoes TF, Ebecken NFF (2005) Feature selection for clustering problems: a hybrid algorithm that iterates between k-means and a Bayesian filter. In: Fifth international conference on hybrid intelligent systems (HIS ’05). IEEE. https://doi.org/10.1109/ICHIS.2005.42

  68. Hruschka ER, Covoes TF, Hruschka ER Jr, Ebecken NFF (2007) Adapting supervised feature selection methods for clustering tasks. In: Managing worldwide operations and communications with information technology (IRMA 2007 proceedings), Information Resources Management Association international conference, Vancouver. Idea Group Publishing, Hershey, pp 99–102. https://doi.org/10.4018/978-1-59904-929-8.ch024

  69. Hu J, Xiong C, Shu J, Zhou X, Zhu J (2009) An improved text clustering method based on hybrid model. Int J Modern Educ Comput Sci 1(1):35

  70. Huang Z (1997) Clustering large data sets with mixed numeric and categorical values. In: Proceedings of the 1st Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Singapore, pp 21–34

  71. Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304

  72. Jashki A, Makki M, Bagheri E, Ghorbani AA (2009) An iterative hybrid filter-wrapper approach to feature selection for document clustering. In: Proceedings of the 22nd Canadian conference on artificial intelligence (AI’09)

  73. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, pp 338–345

  74. Kim Y, Gao J (2006) Unsupervised gene selection for high dimensional data. In: Sixth IEEE symposium on bioinformatics and bioengineering (BIBE’06), pp 227–234. https://doi.org/10.1109/BIBE.2006.253339

  75. Kim Y, Street WN, Menczer F (2002) Evolutionary model selection in unsupervised learning. Intell Data Anal 6(6):531–556

  76. Kong D, Ding C, Huang H (2011) Robust nonnegative matrix factorization using l2,1-norm. In: Proceedings of the 20th ACM international conference on information and knowledge management (CIKM), pp 673–682. https://doi.org/10.1145/2063576.2063676

  77. Kotsiantis SB (2011) Feature selection for machine learning classification problems: a recent overview. Artif Intell Rev 42:157–176. https://doi.org/10.1007/s10462-011-9230-1

  78. Law MHC, Figueiredo MAT, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intell 26(9):1154–1166

  79. Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, De Schaetzen V, Duque R, Bersini H, Nowé A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinform 9(4):1106–1119. https://doi.org/10.1109/TCBB.2012.33

  80. Lee W, Stolfo SJ, Mok KW (2000) Adaptive intrusion detection: a data mining approach. Artif Intell Rev 14(6):533–567

  81. Lee PY, Loh WP, Chin JF (2017) Feature selection in multimedia: the state-of-the-art review. Image Vis Comput 67:29–42. https://doi.org/10.1016/j.imavis.2017.09.004

  82. Li Z, Tang J (2015) Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Trans Image Process 24(12):5343–5355. https://doi.org/10.1109/TIP.2015.2479560

  83. Li Y, Lu BL, Wu ZF (2006) A hybrid method of unsupervised feature selection based on ranking. In: 18th international conference on pattern recognition (ICPR’06), Hong Kong, China, pp 687–690. https://doi.org/10.1109/ICPR.2006.84

  84. Li Y, Lu BL, Wu ZF (2007) Hierarchical fuzzy filter method for unsupervised feature selection. J Intell Fuzzy Syst 18(2):157–169. http://dl.acm.org/citation.cfm?id=1368376.1368381

  85. Li Z, Yang Y, Liu J, Zhou X, Lu H (2012) Unsupervised feature selection using nonnegative spectral analysis. In: AAAI

  86. Li Z, Cheong LF, Zhou SZ (2014a) SCAMS: simultaneous clustering and model selection. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 264–271. https://doi.org/10.1109/CVPR.2014.41

  87. Li Z, Liu J, Yang Y, Zhou X, Lu H (2014b) Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans Knowl Data Eng 26(9):2138–2150

  88. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2016) Feature selection: a data perspective. arXiv:1601.07996

  89. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  90. Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer Academic Publishers, Boston. https://doi.org/10.1007/978-1-4615-5689-3

  91. Liu H, Motoda H (2007) Computational methods of feature selection. CRC Press, London

  92. Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45(1–3):503–528. https://doi.org/10.1007/BF01589116

  93. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502. https://doi.org/10.1109/TKDE.2005.66

  94. Liu J, Ji S, Ye J (2009a) Multi-task feature learning via efficient l2,1-norm minimization. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, pp 339–348

  95. Liu R, Yang N, Ding X, Ma L (2009b) An unsupervised feature selection algorithm: Laplacian score combined with distance-based entropy measure. In: 3rd international symposium on intelligent information technology application (IITA 2009), vol 3, pp 65–68. https://doi.org/10.1109/IITA.2009.390

  96. Liu H, Wei R, Jiang G (2013) A hybrid feature selection scheme for mixed attributes data. Comput Appl Math 32(1):145–161

  97. Lu Q, Li X, Dong Y (2018) Structure preserving unsupervised feature selection. Neurocomputing 301:36–45. https://doi.org/10.1016/j.neucom.2018.04.001

  98. Luo Y, Xiong S (2009) Clustering ensemble for unsupervised feature selection. In: Fourth international conference on fuzzy systems and knowledge discovery, vol 1. IEEE Computer Society, Los Alamitos, pp 445–448. https://doi.org/10.1109/FSKD.2009.449

  99. Luo M, Nie F, Chang X, Yang Y, Hauptmann AG, Zheng Q (2018) Adaptive unsupervised feature selection with structure regularization. IEEE Trans Neural Netw Learn Syst 29(4):944–956. https://doi.org/10.1109/TNNLS.2017.2650978

  100. von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi.org/10.1007/s11222-007-9033-z

  101. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297. http://projecteuclid.org/euclid.bsmsp/1200512992

  102. Mao K (2005) Identifying critical variables of principal components for unsupervised feature selection. IEEE Trans Syst Man Cybern Part B Cybern 35(2):339–344. https://doi.org/10.1109/TSMCB.2004.843269

  103. Maron ME (1961) Automatic indexing: an experimental inquiry. J ACM 8(3):404–417. https://doi.org/10.1145/321075.321084

  104. Miao J, Niu L (2016) A survey on feature selection. Procedia Comput Sci 91:919–926. https://doi.org/10.1016/j.procs.2016.07.111

  105. Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312. https://doi.org/10.1109/34.990133

  106. Mugunthadevi K, Punitha SC, Punithavalli M (2011) Survey on feature selection in document clustering. Int J Comput Sci Eng 3(3):1240–1244. http://www.enggjournals.com/ijcse/doc/IJCSE11-03-03-077.pdf

  107. Nie F, Huang H, Cai X, Ding CH (2010) Efficient and robust feature selection via joint l2,1-norms minimization. In: Advances in neural information processing systems, pp 1813–1821

  108. Nie F, Zhu W, Li X (2016) Unsupervised feature selection with structured graph optimization. In: Proceedings of the 30th AAAI conference on artificial intelligence (AAAI 2016), pp 1302–1308

  109. Niijima S, Okuno Y (2009) Laplacian linear discriminant analysis approach to unsupervised feature selection. IEEE/ACM Trans Comput Biol Bioinform 6(4):605–614. https://doi.org/10.1109/TCBB.2007.70257

  110. Osborne MR, Presnell B, Turlach BA (2000) On the lasso and its dual. J Comput Graph Stat 9(2):319–337

  111. Padungweang P, Lursinsap C, Sunat K (2009) Univariate filter technique for unsupervised feature selection using a new Laplacian score based local nearest neighbors. In: Asia-Pacific conference on information processing (APCIP 2009), vol 2. IEEE, pp 196–200

  112. Pal SK, Mitra P (2004) Pattern recognition algorithms for data mining, 1st edn. Chapman and Hall/CRC, London

  113. Pal SK, De RK, Basak J (2000) Unsupervised feature evaluation: a neuro-fuzzy approach. IEEE Trans Neural Netw 11(2):366–376

  114. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238. https://doi.org/10.1109/TPAMI.2005.159

  115. Qian M, Zhai C (2013) Robust unsupervised feature selection. In: Proceedings of the twenty-third international joint conference on artificial intelligence, pp 1621–1627. http://dl.acm.org/citation.cfm?id=2540361

  116. Rao VM, Sastry VN (2012) Unsupervised feature ranking based on representation entropy. In: 2012 1st international conference on recent advances in information technology (RAIT 2012), pp 421–425. https://doi.org/10.1109/RAIT.2012.6194631

  117. Ritter G (2015) Robust cluster analysis and variable selection, vol 137. CRC Press, London

  118. Roth V, Lange T (2004) Feature selection in clustering problems. Adv Neural Inf Process Syst 16:473–480

  119. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323

  120. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517. https://doi.org/10.1093/bioinformatics/btm344

  121. Sheikhpour R, Sarram MA, Gharaghani S, Chahooki MAZ (2017) A survey on semi-supervised feature selection methods. Pattern Recognit 64:141–158. https://doi.org/10.1016/j.patcog.2016.11.003

  122. Shi L, Du L, Shen YD (2015) Robust spectral learning for unsupervised feature selection. In: Proceedings—IEEE international conference on data mining (ICDM), pp 977–982. https://doi.org/10.1109/ICDM.2014.58

  123. Shi Y, Miao J, Wang Z, Zhang P, Niu L (2018) Feature selection with L2,1-2 regularization. IEEE Trans Neural Netw Learn Syst 29(10):4967–4982. https://doi.org/10.1109/TNNLS.2017.2785403

  124. Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2016) A new hybrid filter–wrapper feature selection method for clustering based on ranking. Neurocomputing 214:866–880. https://doi.org/10.1016/j.neucom.2016.07.026

  125. Solorio-Fernández S, Martínez-Trinidad JF, Carrasco-Ochoa JA (2017) A new unsupervised spectral feature selection method for mixed data: a filter approach. Pattern Recognit 72:314–326. https://doi.org/10.1016/j.patcog.2017.07.020

  126. Swets D, Weng J (1995) Efficient content-based image retrieval using automatic feature selection. In: Proceedings of the international symposium on computer vision, pp 85–90

  127. Tabakhi S, Moradi P (2015) Relevance-redundancy feature selection based on ant colony optimization. Pattern Recognit 48(9):2798–2811. https://doi.org/10.1016/j.patcog.2015.03.020

  128. Tabakhi S, Moradi P, Akhlaghian F (2014) An unsupervised feature selection algorithm based on ant colony optimization. Eng Appl Artif Intell 32:112–123. https://doi.org/10.1016/j.engappai.2014.03.007

  129. Tabakhi S, Najafi A, Ranjbar R, Moradi P (2015) Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing 168:1024–1036. https://doi.org/10.1016/j.neucom.2015.05.022

  130. Talavera L (2000) Dependency-based feature selection for clustering symbolic data. Intell Data Anal 4:19–28

  131. Tang J, Liu H (2014) An unsupervised feature selection framework for social media data. IEEE Trans Knowl Data Eng 26(12):2914–2927

  132. Tang J, Alelyani S, Liu H (2014) Feature selection for classification: a review. In: Data classification. CRC Press, pp 37–64. https://doi.org/10.1201/b17320

  133. Tang C, Liu X, Li M, Wang P, Chen J, Wang L, Li W (2018a) Robust unsupervised feature selection via dual self-representation and manifold regularization. Knowl Based Syst 145:109–120. https://doi.org/10.1016/j.knosys.2018.01.009

  134. Tang C, Zhu X, Chen J, Wang P, Liu X, Tian J (2018b) Robust graph regularized unsupervised feature selection. Expert Syst Appl 96:64–76. https://doi.org/10.1016/j.eswa.2017.11.053

  135. Theodoridis S, Koutroumbas K (2008a) Pattern recognition. Elsevier Science. https://books.google.com.mx/books?id=QgD-3Tcj8DkC

  136. Theodoridis S, Koutroumbas K (2008b) Pattern recognition, 4th edn. Academic Press, New York

  137. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58:267–288

  138. Tou JT, González RC (1974) Pattern recognition principles. Addison-Wesley. https://books.google.com/books?id=VWQoAQAAIAAJ

  139. Varshavsky R, Gottlieb A, Linial M, Horn D (2006) Novel unsupervised feature filtering of biological data. Bioinformatics 22(14):e507–e513. https://doi.org/10.1093/bioinformatics/btl214

  140. Vergara JR, Estévez PA (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24(1):175–186. https://doi.org/10.1007/s00521-013-1368-0

  141. Wang S, Wang H (2017) Unsupervised feature selection via low-rank approximation and structure learning. Knowl Based Syst 124:70–79. https://doi.org/10.1016/j.knosys.2017.03.002

  142. Wang S, Pedrycz W, Zhu Q, Zhu W (2015a) Unsupervised feature selection via maximum projection and minimum redundancy. Knowl Based Syst 75:19–29. https://doi.org/10.1016/j.knosys.2014.11.008

  143. Wang S, Tang J, Liu H (2015b) Embedded unsupervised feature selection. In: Twenty-ninth AAAI conference on artificial intelligence, p 7

  144. Wang X, Zhang X, Zeng Z, Wu Q, Zhang J (2016) Unsupervised spectral feature selection with l1-norm graph. Neurocomputing 200:47–54. https://doi.org/10.1016/j.neucom.2016.03.017

  145. Webb AR (2003) Statistical pattern recognition, 2nd edn. Wiley, New York

  146. Wu M, Schölkopf B (2007) A local learning approach for clustering. In: Advances in neural information processing systems, pp 1529–1536

  147. Yang Y, Liao Y, Meng G, Lee J (2011a) A hybrid feature selection scheme for unsupervised learning and its application in bearing fault diagnosis. Expert Syst Appl 38(9):11311–11320. http://dblp.uni-trier.de/db/journals/eswa/eswa38.html#YangLML11

  148. Yang Y, Shen HT, Ma Z, Huang Z, Zhou X (2011b) L2,1-norm regularized discriminative feature selection for unsupervised learning. In: IJCAI international joint conference on artificial intelligence, pp 1589–1594. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-267

  149. Yasmin M, Mohsin S, Sharif M (2014) Intelligent image retrieval techniques: a survey. J Appl Res Technol 12(1):87–103

  150. Yen CC, Chen LC, Lin SD (2010) Unsupervised feature selection: minimize information redundancy of features. In: Proceedings—international conference on technologies and applications of artificial intelligence (TAAI 2010), pp 247–254. https://doi.org/10.1109/TAAI.2010.49

  151. Yi Y, Zhou W, Cao Y, Liu Q, Wang J (2016) Unsupervised feature selection with graph regularized nonnegative self-representation. In: You Z, Zhou J, Wang Y, Sun Z, Shan S, Zheng W, Feng J, Zhao Q (eds) Biometric recognition: 11th Chinese conference, CCBR 2016, Chengdu, China, October 14–16, 2016, proceedings. Springer International Publishing, Cham, pp 591–599. https://doi.org/10.1007/978-3-319-46654-5_65

  152. Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502

  153. Yu J (2011) A hybrid feature selection scheme and self-organizing map model for machine health assessment. Appl Soft Comput 11(5):4041–4054

  154. Zafarani R, Abbasi MA, Liu H (2014) Social media mining: an introduction. Cambridge University Press, Cambridge

  155. Zeng H, Cheung YM (2011) Feature selection and kernel learning for local learning-based clustering. IEEE Trans Pattern Anal Mach Intell 33(8):1532–1547. https://doi.org/10.1109/TPAMI.2010.215

  156. Zhao Z (2010) Spectral feature selection for mining ultrahigh dimensional data. Ph.D. thesis, Arizona State University, Tempe

  157. Zhao Z, Liu H (2007) Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning. ACM, pp 1151–1157

  158. Zhao Z, Liu H (2011) Spectral feature selection for data mining. CRC Press, pp 1–216. https://www.taylorfrancis.com/books/9781439862100

  159. Zhao Z, Wang L, Liu H, Ye J (2013) On similarity preserving feature selection. IEEE Trans Knowl Data Eng 25(3):619–632. https://doi.org/10.1109/TKDE.2011.222

  160. Zhao Z, Wang L, Liu H (2010) Efficient spectral feature selection with minimum redundancy. In: Twenty-fourth AAAI conference on artificial intelligence, pp 1–6

  161. Zhou W, Wu C, Yi Y, Luo G (2017) Structure preserving non-negative feature self-representation for unsupervised feature selection. IEEE Access 5:8792–8803. https://doi.org/10.1109/ACCESS.2017.2699741

  162. Zhu P, Zuo W, Zhang L, Hu Q, Shiu SCK (2015) Unsupervised feature selection by regularized self-representation. Pattern Recognit 48(2):438–446

  163. Zhu P, Hu Q, Zhang C, Zuo W (2016) Coupled dictionary learning for unsupervised feature selection. In: AAAI, pp 2422–2428

  164. Zhu P, Zhu W, Wang W, Zuo W, Hu Q (2017) Non-convex regularized self-representation for unsupervised feature selection. Image Vis Comput 60:22–29. https://doi.org/10.1016/j.imavis.2016.11.014

Acknowledgements

The first author gratefully acknowledges the National Council of Science and Technology of Mexico (CONACyT) for his Ph.D. fellowship, through scholarship 224490.

Author information

Corresponding author

Correspondence to Saúl Solorio-Fernández.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Solorio-Fernández, S., Carrasco-Ochoa, J.A. & Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif Intell Rev 53, 907–948 (2020). https://doi.org/10.1007/s10462-019-09682-y

Keywords

  • Unsupervised learning
  • Dimensionality reduction
  • Unsupervised feature selection
  • Feature selection for clustering