Online Feature Selection by Adaptive Sub-gradient Methods
The overall goal of online feature selection is to iteratively select, from high-dimensional streaming data, a small, “budgeted” number of features for constructing accurate predictors. In this paper, we address the online feature selection problem using novel truncation techniques for two online sub-gradient methods: Adaptive Regularized Dual Averaging (ARDA) and Adaptive Mirror Descent (AMD). The corresponding truncation-based algorithms are called B-ARDA and B-AMD, respectively. The key aspect of our truncation techniques is to take into account the magnitude of feature values in the current predictor, together with their frequency in the history of predictions. A detailed regret analysis for both algorithms is provided. Experiments on six high-dimensional datasets indicate that both B-ARDA and B-AMD outperform two advanced online feature selection algorithms, OFS and SOFS, especially when the number of selected features is small. Compared to sparse online learning algorithms that use \(\ell _1\) regularization, B-ARDA is superior to \(\ell _1\)-ARDA, and B-AMD is superior to Ada-Fobos. Code related to this paper is available at: https://github.com/LUCKY-ting/online-feature-selection.
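To make the idea of budgeted truncation concrete, the following is a minimal sketch of one online step that keeps at most a fixed number of non-zero weights. It is not the B-ARDA or B-AMD update from the paper. The scoring rule (weight magnitude multiplied by an AdaGrad-style accumulated gradient norm, used here as a stand-in for how often a feature has been active), the hinge loss, and the names `truncate_to_budget`, `budgeted_adagrad_step`, `eta` and `delta` are all assumptions made for illustration.

```python
import numpy as np

def truncate_to_budget(w, h, budget):
    """Keep the `budget` coordinates with the largest score h_i * |w_i|; zero the rest.

    The score is a hypothetical choice: it combines weight magnitude with a
    per-feature history statistic h (assumed, not taken from the paper).
    """
    if budget <= 0:
        return np.zeros_like(w)
    if budget >= w.size:
        return w.copy()
    score = h * np.abs(w)
    # Indices of the `budget` largest scores (order among them is irrelevant).
    keep = np.argpartition(score, -budget)[-budget:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out

def budgeted_adagrad_step(w, g_sq_sum, x, y, budget, eta=0.1, delta=1e-8):
    """One online step: hinge-loss sub-gradient, diagonal adaptive scaling, truncation."""
    margin = y * np.dot(w, x)
    # Sub-gradient of the hinge loss max(0, 1 - y * <w, x>).
    g = -y * x if margin < 1.0 else np.zeros_like(w)
    g_sq_sum = g_sq_sum + g * g          # running sum of squared sub-gradients
    h = delta + np.sqrt(g_sq_sum)        # per-feature adaptive scale
    w = w - eta * g / h                  # adaptive sub-gradient update
    return truncate_to_budget(w, h, budget), g_sq_sum
```

A typical use would stream examples `(x, y)` with `y` in `{-1, +1}` through `budgeted_adagrad_step`, carrying `w` and `g_sq_sum` from one round to the next, so that the predictor never uses more than `budget` features.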
Keywords: Online feature selection, Adaptive sub-gradient methods, High-dimensional streaming data
The authors would like to acknowledge support for this project from the National Key R&D Program of China (2017YFB0702600, 2017YFB0702601), the National Natural Science Foundation of China (Nos. 61432008, 61503178) and the Natural Science Foundation of Jiangsu Province of China (BK20150587).