Abstract
Feature selection is an important topic in data mining and machine learning and has been studied extensively in the literature. In real-world applications, the dimensionality can be extremely high, reaching the millions, and keeps growing. Compared with traditional batch learning methods, online learning is more efficient in such applications. We define streaming features as features that flow in one by one over time while the number of training examples remains fixed. This contrasts with the other kind of online learning methods, which deal only with sequentially added instances. The key challenge for online streaming feature selection is the large feature space, possibly of unknown or infinite size. To select a small number of features in an online manner more effectively, we propose a novel algorithm that uses sampling techniques and correlations between features. We evaluate the proposed algorithm for online streaming feature selection on several public datasets and demonstrate its application to real-world problems such as image classification in computer vision. In our experiments, the proposed algorithm consistently outperformed the baseline algorithms in all settings.
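The streaming-feature setting described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the function `select_streaming`, the use of Pearson correlation for both relevance and redundancy, the two thresholds, and the fixed row sample are all assumptions chosen only to show features arriving one at a time over a fixed set of training examples.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_streaming(feature_stream, labels, budget, relevance_min=0.3,
                     redundancy_max=0.9, sample_size=None, seed=0):
    """Consume (name, column) pairs one at a time and keep at most `budget` names.

    Hypothetical helper: the thresholds and the row-sampling step are
    illustrative assumptions, not the criteria used in the paper.
    """
    n = len(labels)
    rows = list(range(n))
    if sample_size is not None and sample_size < n:
        # One fixed sample of training examples, reused for every arriving feature.
        rows = random.Random(seed).sample(rows, sample_size)
    y = [labels[i] for i in rows]
    selected = {}  # feature name -> sampled column, in arrival order
    for name, column in feature_stream:
        if len(selected) >= budget:
            break  # budget reached: stop consuming the stream
        x = [column[i] for i in rows]
        if abs(pearson(x, y)) < relevance_min:
            continue  # weakly correlated with the label: discard
        if any(abs(pearson(x, kept)) > redundancy_max for kept in selected.values()):
            continue  # nearly duplicates an already kept feature: discard
        selected[name] = x
    return list(selected)


labels = [0, 0, 0, 1, 1, 1]
stream = [
    ("f1", [0.1, 0.2, 0.1, 0.9, 0.8, 1.0]),  # relevant
    ("f2", [0.2, 0.4, 0.2, 1.8, 1.6, 2.0]),  # exactly 2 * f1, so redundant
    ("f3", [1, 0, 0, 1, 0, 0]),              # uncorrelated with the labels
    ("f4", [0, 1, 0, 1, 1, 1]),              # relevant, not redundant with f1
]
print(select_streaming(iter(stream), labels, budget=2))  # ['f1', 'f4']
```

Each feature is examined once and then either kept or dropped, so the full feature space never has to be held in memory; passing `sample_size` trades some accuracy in the correlation estimates for less work per arriving feature.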
Acknowledgement
This research is supported by the National Natural Science Foundation of China (Grant Nos. 61375054 and 61402045), the Natural Science Foundation of Guangdong Province (Grant No. 2014A030313745), the Tsinghua University Initiative Scientific Research Program (Grant No. 20131089256), and the Cross Fund of the Graduate School at Shenzhen, Tsinghua University (Grant No. JC20140001).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Zheng, HT., Zhang, H. (2016). Online Streaming Feature Selection Using Sampling Technique and Correlations Between Features. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_4
DOI: https://doi.org/10.1007/978-3-319-45817-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)