Abstract
Online learning is a growing branch of machine learning with applications in many domains. One of the less studied topics in this area is the development of strategies for online feature importance ranking. In this paper we present two methods for incremental ranking of features in classification tasks. Our ranking strategies are based on measuring the sensitivity of the classification outcome with respect to individual features. Both methods work in classification environments with discrete, continuous, and mixed feature types, under minimal prior assumptions. The second method, a modification of the first, is designed to handle concept drift while avoiding cumbersome computations. Concept drift refers to sudden or gradual changes in the characteristics of the learning features, which occur in many online learning tasks such as online marketing analysis. If the rankings are not adaptable, these changes will make them obsolete over time. Moreover, we investigate different feature selection schemes for feature reduction in online environments, to effectively remove irrelevant features from the classification model. Finally, we present experimental results that verify the efficacy of our methods against currently available online feature ranking algorithms.
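To make the idea of sensitivity-based incremental ranking concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a one-feature-at-a-time perturbation: each feature of an incoming sample is replaced by the value from a recently seen sample, and the fraction of prediction changes is tracked per feature. The class `SensitivityRanker`, its parameters `decay` and `buffer_size`, and the classifier `toy_predict` are all hypothetical names introduced for illustration. Setting `decay` switches from a plain running mean to an exponentially weighted one, which is one simple way a ranking could adapt under concept drift.

```python
import random
from collections import deque


class SensitivityRanker:
    """Illustrative online feature ranker (an assumption, not the paper's method)."""

    def __init__(self, n_features, decay=None, buffer_size=50, seed=0):
        self.n_features = n_features
        self.decay = decay  # None: running mean; 0 < decay < 1: exponential forgetting
        self.scores = [0.0] * n_features
        self.n_updates = 0
        self.buffer = deque(maxlen=buffer_size)  # recent samples used as value donors
        self.rng = random.Random(seed)

    def update(self, predict, x):
        """Update per-feature sensitivity scores with one new sample x."""
        if self.buffer:
            donor = self.rng.choice(self.buffer)
            base = predict(x)
            self.n_updates += 1
            for j in range(self.n_features):
                perturbed = list(x)
                perturbed[j] = donor[j]  # swap in an observed value, any feature type
                flipped = 1.0 if predict(perturbed) != base else 0.0
                if self.decay is None:
                    # incremental mean of the prediction-change indicator
                    self.scores[j] += (flipped - self.scores[j]) / self.n_updates
                else:
                    # exponentially weighted mean: older evidence fades
                    self.scores[j] += self.decay * (flipped - self.scores[j])
        self.buffer.append(list(x))

    def ranking(self):
        """Feature indices ordered from most to least sensitive."""
        return sorted(range(self.n_features), key=lambda j: -self.scores[j])


def toy_predict(x):
    # Hypothetical fixed classifier: only the continuous feature 0 matters.
    return int(x[0] > 0.5)


data_rng = random.Random(1)
ranker = SensitivityRanker(n_features=2, decay=0.05)
for _ in range(200):
    # Mixed sample: one continuous feature, one categorical feature.
    ranker.update(toy_predict, [data_rng.random(), data_rng.choice(["a", "b"])])
print(ranker.ranking(), ranker.scores)
```

Because perturbed values are drawn from previously observed samples rather than from an assumed distribution, the same update applies to continuous and categorical features alike. In this toy stream the categorical feature never changes the prediction, so its score stays at zero and it ranks last.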
Acknowledgements
Dr. Zheng’s work is supported in part by the AFRL Mathematical Modeling and Optimization Institute. The authors would also like to thank the reviewers and editors for their constructive comments and recommendations.
Cite this article
Razmjoo, A., Xanthopoulos, P. & Zheng, Q.P. Feature importance ranking for classification in mixed online environments. Ann Oper Res 276, 315–330 (2019). https://doi.org/10.1007/s10479-018-2972-2
DOI: https://doi.org/10.1007/s10479-018-2972-2