Annals of Operations Research, Volume 276, Issue 1–2, pp 315–330

Feature importance ranking for classification in mixed online environments

  • Alaleh Razmjoo
  • Petros Xanthopoulos
  • Qipeng Phil Zheng
S.I.: Computational Biomedicine


Online learning is a growing branch of machine learning with applications in many domains. One of the less studied topics in this area is the development of strategies for online feature importance ranking. In this paper we present two methods for incremental ranking of features in classification tasks. Our ranking strategies are based on measuring the sensitivity of the classification outcome with respect to individual features. The two methods work in classification environments with discrete, continuous, and mixed feature types, under minimal prior assumptions. The second method, a modification of the first, is designed to handle concept drift while avoiding cumbersome computations. Concept drift refers to sudden or gradual changes in the characteristics of the learning features, which occur in many online learning tasks such as online marketing analysis. If the rankings cannot adapt, these changes will make them obsolete over time. Moreover, we investigate different feature selection schemes for feature reduction in online environments, in order to effectively remove irrelevant features from the classification model. Finally, we present experimental results that verify the efficacy of our methods against currently available online feature ranking algorithms.
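
To make the idea of sensitivity-based incremental ranking more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a logistic regression model trained by stochastic gradient descent, a one-at-a-time perturbation that replaces each feature with its running mean, and an exponential forgetting factor as a simple stand-in for drift handling. All names (`OnlineSensitivityRanker`, `forget`, `demo`) are hypothetical and chosen for illustration only.

```python
import math
import random


class OnlineSensitivityRanker:
    """Illustrative sketch: SGD logistic regression plus a running
    per-feature sensitivity score. Assumption-laden, not the paper's method."""

    def __init__(self, n_features, lr=0.1, forget=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.forget = forget              # weight given to the newest sample
        self.mean = [0.0] * n_features    # running mean of each feature
        self.score = [0.0] * n_features   # running sensitivity of each feature

    def predict_proba(self, x):
        z = self.b + sum(wj * xj for wj, xj in zip(self.w, x))
        z = max(-30.0, min(30.0, z))      # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        p = self.predict_proba(x)
        for j in range(len(x)):
            # Perturb one feature at a time: replace it by its running mean
            # and record how far the predicted probability moves.
            x_pert = list(x)
            x_pert[j] = self.mean[j]
            delta = abs(p - self.predict_proba(x_pert))
            # Exponentially weighted averages let old evidence fade away.
            self.score[j] += self.forget * (delta - self.score[j])
            self.mean[j] += self.forget * (x[j] - self.mean[j])
        # One SGD step on the logistic loss.
        err = p - y
        for j in range(len(x)):
            self.w[j] -= self.lr * err * x[j]
        self.b -= self.lr * err

    def ranking(self):
        # Feature indices, most sensitive first.
        return sorted(range(len(self.score)), key=lambda j: -self.score[j])


def demo(seed=0):
    rng = random.Random(seed)
    ranker = OnlineSensitivityRanker(n_features=3)

    def sample(relevant):
        # Features 0 and 1 are continuous; feature 2 is discrete (-1 or +1).
        x = [rng.gauss(0, 1), rng.gauss(0, 1), rng.choice([-1.0, 1.0])]
        return x, 1 if x[relevant] > 0 else 0

    for _ in range(2000):     # phase 1: the label depends on feature 0
        ranker.update(*sample(0))
    before = ranker.ranking()
    for _ in range(3000):     # phase 2 (simulated drift): label depends on feature 2
        ranker.update(*sample(2))
    after = ranker.ranking()
    return before, after


if __name__ == "__main__":
    print(demo())
```

In this toy stream the relevant feature switches from a continuous one to a discrete one halfway through. Because the scores are exponentially weighted, the ranking returned after the switch should place the newly relevant feature first, whereas a plain cumulative average would adapt far more slowly.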


Feature ranking; Online learning; Stochastic gradient descent; Mixed feature space



Dr. Zheng’s work is supported in part by the AFRL Mathematical Modeling and Optimization Institute. The authors would also like to thank the reviewers and editors for their constructive comments and recommendations.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Industrial Engineering and Management Systems, University of Central Florida, Orlando, USA
  2. Department of Decision and Information Sciences, School of Business Administration, Stetson University, DeLand, USA
