
UFSSF - An Efficient Unsupervised Feature Selection for Streaming Features

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Abstract

Streaming feature applications pose challenges for feature selection. In such dynamic-feature applications, (a) features are generated sequentially and processed one by one upon arrival, while the number of instances/points remains fixed; and (b) the complete feature space is not known in advance. Existing approaches require class labels as a guide to select representative features. However, in real-world applications most data are unlabeled, and manual labeling is costly. This paper proposes a new algorithm, Unsupervised Feature Selection for Streaming Features (UFSSF), which selects representative features in streaming feature applications without requiring the features or the class labels to be known in advance. UFSSF extends the k-means clustering algorithm with similarity measures based on linear dependency, so that it can decide incrementally whether to add a newly arrived feature to the existing set of representative features. Features that are not representative are discarded. Experimental results indicate that UFSSF achieves significantly better prediction accuracy and shorter running time than the baseline approaches.
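The incremental keep-or-discard decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' UFSSF: it replaces the k-means extension with a simpler single-pass (leader-style) clustering of features, and it assumes the absolute Pearson correlation coefficient as the linear-dependency measure with a hypothetical fixed threshold of 0.9. The names `StreamingFeatureSelector`, `offer`, `n_instances` and `threshold` are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length feature columns.

    Returns 0.0 when either column is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


class StreamingFeatureSelector:
    """Hypothetical one-pass selector (NOT the published UFSSF algorithm).

    Each kept feature acts as the centre of its own cluster: a newly arrived
    feature either falls into an existing cluster (and is discarded) or
    founds a new one (and is kept as a representative). No labels are used.
    """

    def __init__(self, n_instances: int, threshold: float = 0.9) -> None:
        self.n_instances = n_instances  # number of instances/points stays fixed
        self.threshold = threshold      # assumed |r| cut-off, not from the paper
        self.representatives: Dict[str, List[float]] = {}

    def offer(self, name: str, column: Sequence[float]) -> bool:
        """Process one newly arrived feature column; return True if it is kept."""
        if len(column) != self.n_instances:
            raise ValueError("every feature must cover the same instances")
        if len(set(column)) == 1:
            return False  # a constant feature carries no information
        for rep in self.representatives.values():
            if abs(pearson(column, rep)) >= self.threshold:
                return False  # (nearly) linearly dependent on a kept feature
        self.representatives[name] = list(column)
        return True


# Usage: features arrive one at a time over a fixed set of 4 instances.
sel = StreamingFeatureSelector(n_instances=4, threshold=0.9)
stream = [
    ("f1", [1.0, 2.0, 3.0, 4.0]),
    ("f2", [2.0, 4.0, 6.0, 8.0]),  # exactly 2 * f1, so r = 1 with f1
    ("f3", [4.0, 1.0, 3.0, 2.0]),  # r = -0.4 with f1
]
for name, col in stream:
    sel.offer(name, col)
print(list(sel.representatives))  # → ['f1', 'f3']
```

Because a discarded feature is never revisited, memory grows only with the number of representatives kept; the price of this one-pass design is that the selected set can depend on the order in which features arrive.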




Author information

Correspondence to Naif Almusallam, Zahir Tari, Jeffrey Chan or Adil AlHarthi.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Almusallam, N., Tari, Z., Chan, J., AlHarthi, A. (2018). UFSSF - An Efficient Unsupervised Feature Selection for Streaming Features. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_39


  • DOI: https://doi.org/10.1007/978-3-319-93037-4_39

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)
