Online Streaming Feature Selection Using Sampling Technique and Correlations Between Features

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9932)

Abstract

Feature selection is an important topic in data mining and machine learning and has been studied extensively in the literature. In real-world applications, the dimensionality can be extremely high, in the millions, and keeps growing. Compared with traditional batch learning methods, online learning is more efficient for such applications. We define streaming features as features that arrive one by one over time while the number of training examples remains fixed. This contrasts with online learning methods that handle only sequentially added instances. The key challenge for online streaming feature selection is the large feature space, which may be of unknown or infinite size. To select a small number of features online more effectively, we propose a novel algorithm that uses sampling techniques and correlations between features. We evaluate the proposed algorithm on several public datasets and demonstrate its application to real-world problems such as image classification in computer vision. In our experiments, the algorithm consistently outperformed the baseline algorithms in all settings.
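The streaming-feature setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given on this page. It assumes a simple filter: a feature is kept if its Pearson correlation with the labels is high enough and it is not strongly correlated with an already selected feature. The function names, the thresholds `relevance_min` and `redundancy_max`, and the choice to subsample rows via `sample_size` are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_streaming(feature_stream, labels, relevance_min=0.3,
                     redundancy_max=0.9, sample_size=None, seed=0):
    """Consume (name, column) pairs one at a time and return the kept names.

    Assumption for illustration only: correlations are estimated on a fixed
    random subsample of rows when sample_size is given.
    """
    n = len(labels)
    if sample_size is None or sample_size >= n:
        rows = list(range(n))
    else:
        rows = sorted(random.Random(seed).sample(range(n), sample_size))
    y = [labels[i] for i in rows]

    selected = {}  # feature name -> sampled column of a kept feature
    for name, column in feature_stream:
        col = [column[i] for i in rows]
        # Relevance check: discard features weakly correlated with the labels.
        if abs(pearson(col, y)) < relevance_min:
            continue
        # Redundancy check: discard features nearly collinear with a kept one.
        if any(abs(pearson(col, kept)) > redundancy_max
               for kept in selected.values()):
            continue
        selected[name] = col
    return list(selected)


# Toy data: the number of examples is fixed (8) while features arrive one by one.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
stream = iter([
    ("f1", [0, 1, 0, 1, 0, 1, 0, 1]),  # relevant
    ("f2", [1, 3, 1, 3, 1, 3, 1, 3]),  # relevant but redundant with f1
    ("f3", [1, 1, 0, 0, 1, 1, 0, 0]),  # uncorrelated with the labels
    ("f4", [0, 1, 0, 1, 0, 1, 1, 0]),  # moderately relevant, not redundant
])
picked = select_streaming(stream, labels)
print(picked)  # ['f1', 'f4']
```

Because each feature is examined once and only the kept columns are stored, memory grows with the number of selected features rather than with the size of the full feature space.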

Notes

  1. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

  2. http://www.ics.uci.edu/~mlearn/MLRepository.html.

  3. http://www.cs.toronto.edu/kriz/cifar.html.

Acknowledgement

This research is supported by the National Natural Science Foundation of China (Grant Nos. 61375054 and 61402045), the Natural Science Foundation of Guangdong Province (Grant No. 2014A030313745), the Tsinghua University Initiative Scientific Research Program (Grant No. 20131089256), and the Cross fund of Graduate School at Shenzhen, Tsinghua University (Grant No. JC20140001).

Author information

Corresponding author

Correspondence to Hai-Tao Zheng.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zheng, HT., Zhang, H. (2016). Online Streaming Feature Selection Using Sampling Technique and Correlations Between Features. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_4

  • DOI: https://doi.org/10.1007/978-3-319-45817-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45816-8

  • Online ISBN: 978-3-319-45817-5

  • eBook Packages: Computer Science, Computer Science (R0)
