Abstract
Unsupervised feature selection methods have raised considerable interest in the scientific community due to their capability of identifying and selecting relevant features in unlabeled data. In this paper, we evaluate and compare seven of the most widely used state-of-the-art ranking-based unsupervised feature selection methods, all of which belong to the filter approach. Our study was conducted on 25 high-dimensional real-world datasets taken from the ASU Feature Selection Repository. From our experiments, we conclude which methods perform significantly better in terms of quality of selection and runtime.
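Among the filter methods compared in studies of this kind, a representative ranking-based criterion is the Laplacian Score (He et al.), which ranks each feature by how well it preserves the local neighborhood structure of the data. The sketch below, a minimal NumPy implementation assuming a k-NN graph with heat-kernel weights (the parameter names `k` and `t` are illustrative, not taken from the paper), shows how such a ranking can be computed without labels; lower scores indicate more relevant features.

```python
import numpy as np

def laplacian_score(X, k=5, t=1.0):
    """Rank features by the Laplacian Score (lower = more relevant).

    X : (n_samples, n_features) array.
    Sketch: build a k-NN graph with heat-kernel (RBF) weights, then
    score each feature by how well it respects the graph structure.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances between samples
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # symmetric k-NN adjacency with heat-kernel weights
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]  # nearest neighbors, skipping self
        S[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    S = np.maximum(S, S.T)
    D = S.sum(axis=1)          # degree vector
    L = np.diag(D) - S         # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ D) / D.sum()  # remove the D-weighted mean
        denom = f @ (D * f)
        scores[r] = (f @ L @ f) / denom if denom > 1e-12 else np.inf
    return scores  # rank features ascending: np.argsort(scores)
```

A feature that varies smoothly over the neighborhood graph (e.g., one that separates natural clusters) receives a low score, while a noise feature scores close to 1; selecting the top-m features then reduces to sorting this score vector.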
Acknowledgements
The first author gratefully acknowledges the National Council of Science and Technology of Mexico (CONACyT) for his Ph.D. fellowship (scholarship 428478).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Solorio-Fernández, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2018). Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets. In: Batyrshin, I., Martínez-Villaseñor, M., Ponce Espinosa, H. (eds.) Advances in Soft Computing. MICAI 2018. Lecture Notes in Computer Science, vol. 11288. Springer, Cham. https://doi.org/10.1007/978-3-030-04491-6_16
DOI: https://doi.org/10.1007/978-3-030-04491-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04490-9
Online ISBN: 978-3-030-04491-6
eBook Packages: Computer Science (R0)