Abstract
Without class labels, unsupervised feature selection methods choose a subset of features that faithfully maintains the intrinsic structure of the original data. Conventional methods rely on the exact values of pairwise sample distances in structure regularization. However, this assumption imposes strict restrictions on feature selection and causes more features to be kept for data representation. Motivated by this, we propose Unsupervised Feature Selection via Local Total-order Preservation, called UFSLTP. In particular, we characterize the local structure by a novel total-order relation, which is based on comparing pairwise sample distances. To obtain a desirable feature subset, we map the total-order relation into probability space and attempt to preserve the relation by minimizing the difference between the probability distributions calculated before and after feature selection. Owing to the inherent nature of machine learning and of the total-order relation, fewer features are needed to represent the data without adversely affecting performance. Moreover, we propose two efficient methods, namely Adaptive Neighbors Selection (ANS) and Uniform Neighbors Serialization (UNS), to reduce the computational complexity and improve the performance of the method. Experimental results on benchmark datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods. Compared with its competitors in clustering performance, it achieves an average improvement of \(31.01\%\) in terms of NMI and \(14.44\%\) in terms of Silhouette Coefficient.
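The idea of preserving a local order relation in probability space can be illustrated with a minimal sketch. This is not the UFSLTP objective itself, which the abstract does not spell out. It assumes a logistic (Bradley-Terry style) mapping from distance gaps to the probability that one neighbor is closer than another, a fixed k-nearest-neighbor set taken in the full feature space, and a Bernoulli KL divergence as the measure of difference. The names `sq_dists`, `pairwise_order_probs`, `order_loss`, and the feature weight vector `w` are hypothetical.

```python
import numpy as np

def sq_dists(X, w):
    """Weighted squared Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2 * w).sum(axis=2)

def pairwise_order_probs(d_row, neighbors):
    """P(neighbor a is closer than neighbor b), as a logistic of the distance gap.

    Assumption: the logistic mapping is a stand-in chosen for illustration.
    """
    d = d_row[neighbors]
    gap = d[None, :] - d[:, None]        # gap[a, b] = d_b - d_a, positive when a is closer
    gap = np.clip(gap, -50.0, 50.0)      # keep exp() in a safe numeric range
    return 1.0 / (1.0 + np.exp(-gap))

def order_loss(X, w, k=5, eps=1e-12):
    """Mean Bernoulli KL between order probabilities before and after weighting features."""
    w = np.asarray(w, dtype=float)
    n = X.shape[0]
    d_full = sq_dists(X, np.ones(X.shape[1]))   # distances using all features
    d_sel = sq_dists(X, w)                      # distances using weighted/selected features
    total = 0.0
    for i in range(n):
        order = np.argsort(d_full[i])
        nbrs = order[order != i][:k]            # k nearest neighbors in the full space
        p = np.clip(pairwise_order_probs(d_full[i], nbrs), eps, 1 - eps)
        q = np.clip(pairwise_order_probs(d_sel[i], nbrs), eps, 1 - eps)
        kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
        total += kl.sum()
    return total / n
```

Under these assumptions, a binary `w` that keeps every feature gives a loss of zero, and a `w` that drops features gives a larger loss the more it disturbs the local distance ordering. A selection procedure could then search for a sparse `w` with a small loss.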
Notes
- 1. In the following, we refer to pairwise sample similarity as the global structure, to keep the terminology consistent with local manifold structure.
Acknowledgment
This work is supported by the National Key Research and Development Program of China (2016YFB1000101), the National Natural Science Foundation of China (Grant No. 61379052), the Science Foundation of Ministry of Education of China (Grant No. 2018A02002), and the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 14JJ1026).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, R., Wang, Y., Cheng, L. (2019). Unsupervised Feature Selection via Local Total-Order Preservation. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_2
DOI: https://doi.org/10.1007/978-3-030-30484-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3
eBook Packages: Computer Science, Computer Science (R0)