Does Feature Selection Improve Classification? A Large Scale Experiment in OpenML

  • Martijn J. Post (email author)
  • Peter van der Putten
  • Jan N. van Rijn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)


It is often claimed that data pre-processing is an important factor in the performance of classification algorithms. In this paper we investigate feature selection, a common data pre-processing technique. We conduct a large-scale experiment and report which algorithms and data sets benefit from this technique. Using meta-learning, we identify the combinations for which this is the case. To complement a large set of meta-features, we introduce Feature Selection Landmarkers, which prove useful for this task. All our experimental results are made publicly available on OpenML.
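The experiment described above compares each classifier's performance with and without a feature selection step. A minimal sketch of that comparison, using scikit-learn and a built-in data set rather than the paper's WEKA/OpenML setup (the data set, filter method, and k=10 here are illustrative assumptions, not the paper's configuration):

```python
# Compare cross-validated accuracy of a classifier with and without
# a simple filter-based feature selection step.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)

# Baseline: Naive Bayes on all 30 features.
baseline = cross_val_score(GaussianNB(), X, y, cv=10).mean()

# Same classifier, but keep only the 10 features with the highest
# ANOVA F-score (a univariate filter, refit inside each CV fold).
selected = cross_val_score(
    make_pipeline(SelectKBest(f_classif, k=10), GaussianNB()), X, y, cv=10
).mean()

print(f"baseline accuracy:              {baseline:.3f}")
print(f"with feature selection (k=10):  {selected:.3f}")
```

Whether the selected pipeline wins depends on the data set and classifier, which is exactly the algorithm/data-set interaction the paper studies at scale.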


Keywords: Feature selection, Meta-learning, Open science



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Martijn J. Post (1) (email author)
  • Peter van der Putten (1)
  • Jan N. van Rijn (1)
  1. Leiden University, Leiden, The Netherlands
