A Proposal to Estimate the Variable Importance Measures in Predictive Models Using Results from a Wrapper

  • Hugo Dorado (corresponding author)
  • Carlos Cobos
  • Jose Torres-Jimenez
  • Daniel Jimenez
  • Martha Mendoza
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11308)

Abstract

Variable importance measures and feature selection methods for classification/regression tasks in data mining and Big Data enable the removal of noise caused by irrelevant or redundant variables, reduce the computational cost of building models, and make those models easier to understand. This paper presents a proposal to measure the importance of the input variables in a classification/regression problem, taking as input the solutions evaluated by a wrapper and the performance information (classification quality expressed, for example, as accuracy, precision, recall, or F-measure) associated with each of these solutions. The proposed method quantifies the effect on classification/regression performance produced by the presence or absence of each input variable in the subsets evaluated by the wrapper. This measure has the advantage of being specific to each classifier, which makes it possible to differentiate the effects each input variable can generate depending on the model built. The proposed method was evaluated using the results of three wrappers, one based on genetic algorithms (GA), another on particle swarm optimization (PSO), and a new proposal based on covering arrays (CA), and compared with two filters and the variable importance of Random Forest. The experiments were performed with three classifiers (Naive Bayes, Random Forest, and Multi-Layer Perceptron) on seven data sets from the UCI repository. The comparisons were made using the Friedman Aligned Ranks test, and the results indicate that the proposed measure stands out: the variables it ranks first preserve higher classification quality and better approximate the variables found by the feature selection methods.
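One way to read the core idea operationally is as a mean-difference estimator: for each input variable, compare the performance of the wrapper-evaluated subsets that contain it against those that do not. The sketch below is an illustration under that assumption only; the function name and the exact estimator are not taken from the paper, whose formula may differ.

```python
import numpy as np

def wrapper_importance(subsets, scores):
    """Estimate variable importance from wrapper results.

    subsets: (n_solutions, n_vars) 0/1 array; subsets[i, j] == 1 means
             variable j was present in the i-th evaluated subset.
    scores:  (n_solutions,) performance of each subset (e.g. accuracy).

    Importance of variable j = mean score of subsets that include j
    minus mean score of subsets that exclude j.
    """
    subsets = np.asarray(subsets, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    importance = np.empty(subsets.shape[1])
    for j in range(subsets.shape[1]):
        present = subsets[:, j]
        # A variable that is always present (or always absent) in the
        # evaluated subsets gives no contrast to estimate its effect.
        if present.all() or not present.any():
            importance[j] = np.nan
            continue
        importance[j] = scores[present].mean() - scores[~present].mean()
    return importance

# Toy example with 5 evaluated subsets over 3 variables:
# subsets containing variable 0 tend to score higher.
subsets = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 1, 0]]
scores = [0.70, 0.85, 0.55, 0.90, 0.60]
print(wrapper_importance(subsets, scores))
```

Because the estimate is computed from subsets evaluated with a specific classifier, it inherits the classifier-specific character the abstract highlights: rerunning the wrapper with a different model yields a different importance profile.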

Keywords

Classification · Variable importance · Filters · Wrappers · Genetic algorithms · Particle swarm optimization · Covering arrays

References

  1. Cui, L., Lu, Z., Wang, P., Wang, W.: The ordering importance measure of random variable and its estimation. Math. Comput. Simul. 105, 132–143 (2014)
  2. Li, L., Lu, Z.: Importance analysis for models with correlated variables and its sparse grid solution. Reliab. Eng. Syst. Saf. 119, 207–217 (2013)
  3. Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40, 16–28 (2014)
  4. Aggarwal, C.C.: Feature selection for classification: a review. In: Tang, J., Alelyani, S., Liu, H. (eds.) Data Classification: Algorithms and Applications, 1st edn., pp. 37–64. Chapman & Hall/CRC (2014)
  5. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97, 273–324 (1997)
  6. Wei, P., Lu, Z., Song, J.: Variable importance analysis: a comprehensive review. Reliab. Eng. Syst. Saf. 142, 399–432 (2015)
  7. Altmann, A., Toloşi, L., Sander, O., Lengauer, T.: Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010)
  8. Kotsiantis, S.B.: Feature selection for machine learning classification problems: a recent overview (2011)
  9. Jović, A., Brkić, K., Bogunović, N.: A review of feature selection methods with applications, pp. 25–29 (2015)
  10. Abd-Alsabour, N.: A review on evolutionary feature selection. In: Proceedings - UKSim-AMSS 8th European Modelling Symposium on Computer Modelling and Simulation, EMS 2014, pp. 20–26 (2014)
  11. Khan, G.M.: Evolutionary computation. In: Evolution of Artificial Neural Development, pp. 29–37. Springer, Boston (2018). https://doi.org/10.1007/978-1-4899-7687-1
  12. Sastry, K., Goldberg, D.E., Kendall, G.: Genetic algorithms. In: Search Methodologies, pp. 93–117. Springer, Boston (2014). https://doi.org/10.1007/0-387-28356-0_4
  13. Wan, Y., Wang, M., Ye, Z., Lai, X.: A feature selection method based on modified binary coded ant colony optimization algorithm. Appl. Soft Comput. 49, 248–258 (2016)
  14. Xue, B., Zhang, M., Browne, W.N.: Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans. Cybern. 43, 1656–1671 (2013)
  15. Kennedy, J., Eberhart, R.C.: A discrete binary version of the particle swarm algorithm. In: IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, vol. 5, pp. 4104–4108 (1997)
  16. Tzanakis, G., Moura, L., Panario, D., Stevens, B.: Constructing new covering arrays from LFSR sequences over finite fields. Discrete Math. 339, 1158–1171 (2016)
  17. Dheeru, D., Karra Taniskidou, E.: UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
  18. Lin, S.-W., Lee, Z.-J., Chen, S.-C., Tseng, T.-Y.: Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl. Soft Comput. 8, 1505–1512 (2008)
  19. García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf. Sci. 180, 2044–2064 (2010)
  20. Zarshenas, A., Suzuki, K.: Binary coordinate ascent: an efficient optimization technique for feature subset selection for machine learning. Knowl.-Based Syst. 110, 191–201 (2016)
  21. Strobl, C., Boulesteix, A.-L., Zeileis, A., Hothorn, T.: Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform. 8, 25 (2007)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Hugo Dorado (1, 2), corresponding author
  • Carlos Cobos (1)
  • Jose Torres-Jimenez (3)
  • Daniel Jimenez (2)
  • Martha Mendoza (1)

  1. Information Technology Research Group (GTI), Universidad del Cauca, Popayán, Colombia
  2. International Center for Tropical Agriculture (CIAT), Cali, Colombia
  3. Center for Research and Advanced Studies of the National Polytechnic Institute, Ciudad Victoria, Mexico