Online Feature Selection by Adaptive Sub-gradient Methods

  • Tingting Zhai
  • Hao Wang
  • Frédéric Koriche
  • Yang Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)


The overall goal of online feature selection is to iteratively select, from high-dimensional streaming data, a small, “budgeted” number of features for constructing accurate predictors. In this paper, we address the online feature selection problem using novel truncation techniques for two online sub-gradient methods: Adaptive Regularized Dual Averaging (ARDA) and Adaptive Mirror Descent (AMD). The corresponding truncation-based algorithms are called B-ARDA and B-AMD, respectively. The key aspect of our truncation techniques is to take into account the magnitude of feature values in the current predictor, together with their frequency in the history of predictions. A detailed regret analysis for both algorithms is provided. Experiments on six high-dimensional datasets indicate that both B-ARDA and B-AMD outperform two advanced online feature selection algorithms, OFS and SOFS, especially when the number of selected features is small. Compared to sparse online learning algorithms that use \(\ell _1\) regularization, B-ARDA is superior to \(\ell _1\)-ARDA, and B-AMD is superior to Ada-Fobos. Code related to this paper is available at:
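
To make the idea of budgeted truncation concrete, here is a minimal illustrative sketch. It is not the authors' released code and not the exact B-ARDA or B-AMD procedure. It assumes a diagonal adaptive (AdaGrad-style) sub-gradient step on the hinge loss, followed by a truncation that keeps at most `budget` weights. The scoring rule `|w_i| * h_i`, which combines a weight's magnitude with the accumulated gradient history `h_i` of its feature, is a hypothetical stand-in for the paper's criterion. The names `truncate_to_budget`, `budgeted_adagrad_step`, `eta` and `delta` are likewise assumptions introduced for illustration.

```python
import math


def truncate_to_budget(w, h, budget):
    """Keep the `budget` coordinates with the largest score |w_i| * h_i.

    Assumed scoring rule (not taken from the paper): weight magnitude
    multiplied by the accumulated gradient scale of that feature.
    """
    scores = {i: abs(wi) * h.get(i, 0.0) for i, wi in w.items() if wi != 0.0}
    # Sort by descending score; break ties by feature index for determinism.
    keep = sorted(scores, key=lambda i: (-scores[i], i))[:budget]
    return {i: w[i] for i in keep}


def budgeted_adagrad_step(w, g_sq, x, y, budget, eta=0.1, delta=1e-8):
    """One online round on a sparse example x (dict: index -> value), label y in {-1, +1}.

    Returns a new sparse weight dict with at most `budget` entries.
    `g_sq` accumulates squared sub-gradients per feature and is updated in place.
    """
    w = dict(w)  # do not mutate the caller's predictor
    margin = y * sum(w.get(i, 0.0) * xi for i, xi in x.items())
    if margin < 1.0:  # hinge loss has a non-zero sub-gradient only here
        for i, xi in x.items():
            g = -y * xi
            g_sq[i] = g_sq.get(i, 0.0) + g * g
            # Per-coordinate adaptive step size, as in diagonal AdaGrad.
            w[i] = w.get(i, 0.0) - eta * g / (delta + math.sqrt(g_sq[i]))
    h = {i: math.sqrt(s) for i, s in g_sq.items()}
    return truncate_to_budget(w, h, budget)


# Toy stream of three sparse examples with a budget of two features.
stream = [({0: 1.0, 1: 0.5}, 1), ({1: 1.0, 2: 2.0}, -1), ({0: 1.0, 3: 0.1}, 1)]
w, g_sq = {}, {}
for x, y in stream:
    w = budgeted_adagrad_step(w, g_sq, x, y, budget=2)
```

Truncating after every round keeps the predictor within the feature budget at all times, which is the constraint the online feature selection setting imposes. The gradient statistics in `g_sq` are kept for all features seen so far, so a feature dropped in one round can still re-enter later.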


Keywords: Online feature selection; Adaptive sub-gradient methods; High-dimensional streaming data



The authors would like to acknowledge support for this project from the National Key R&D Program of China (2017YFB0702600, 2017YFB0702601), the National Natural Science Foundation of China (Nos. 61432008, 61503178) and the Natural Science Foundation of Jiangsu Province of China (BK20150587).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Center of Research in Information in Lens, Université d’Artois, Lens, France
