
A Comparison of Feature Selection Methods to Optimize Predictive Models Based on Decision Forest Algorithms for Academic Data Analysis

  • Antonio Jesús Fernández-García
  • Luis Iribarne
  • Antonio Corral
  • Javier Criado
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 745)

Abstract

Nowadays, Feature Selection (FS) methods are essential (1) to create easy-to-explain predictive models in less time, (2) to reduce overfitting, and (3) to avoid data sparsity. This paper studies the suitability of these techniques. Furthermore, it compares several widely used FS techniques to determine which is most appropriate for building predictive models with decision forest algorithms. For this comparison, experiments are conducted in which a predictive model is built for each FS method to predict, once students have finished their first year in college, whether they will finish their degree. The models are generated from a real dataset of student data provided by the University of Almería. By comparing the accuracy of the resulting models, we measure the effectiveness of each FS method; in our experimental study, the Chi-Square statistic is the method that leads to the best results.
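To make the Chi-Square criterion concrete, here is a minimal sketch of chi-square filter scoring, assuming categorical features and a categorical target (e.g. "finished degree" yes/no). It is not the authors' implementation: the function names, feature names, and toy values are hypothetical and are not drawn from the University of Almería dataset. Each feature is scored by the Pearson chi-square statistic of its contingency table against the target, and the k highest-scoring features are kept for the decision forest.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square_score(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the feature x target contingency table."""
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts O_ij
    row_totals = Counter(feature)             # count of each feature value
    col_totals = Counter(target)              # count of each class
    score = 0.0
    for f_val, f_count in row_totals.items():
        for t_val, t_count in col_totals.items():
            expected = f_count * t_count / n  # E_ij if feature and target were independent
            diff = observed[(f_val, t_val)] - expected  # missing pairs count as 0
            score += diff * diff / expected
    return score


def select_top_k(features: Dict[str, Sequence[Hashable]],
                 target: Sequence[Hashable], k: int) -> List[str]:
    """Rank features by chi-square score (highest first) and keep the top k."""
    ranked = sorted(features,
                    key=lambda name: chi_square_score(features[name], target),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = student finished the degree, 0 = did not.
graduated = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "passed_all_courses": [1, 1, 1, 1, 0, 0, 0, 0],  # fully dependent on target
    "part_time":          [1, 0, 1, 0, 1, 0, 1, 0],  # independent of target
    "attended_tutoring":  [1, 1, 1, 0, 0, 0, 0, 1],  # partially dependent
}

for name, column in features.items():
    print(name, chi_square_score(column, graduated))
# passed_all_courses 8.0
# part_time 0.0
# attended_tutoring 2.0

print(select_top_k(features, graduated, 2))
# ['passed_all_courses', 'attended_tutoring']
```

Because this is a filter method, the scores are computed without training any model; the selected columns would then be passed to the decision forest, and the resulting accuracy compared against models built with other FS methods. For larger tables, `scipy.stats.chi2_contingency` (with `correction=False` to disable Yates' correction on 2x2 tables) computes the same statistic together with a p-value.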

Keywords

Feature selection · Machine learning · Decision forest

Notes

Acknowledgement

This work has been funded by the EU ERDF and the Spanish Ministry of Economy and Competitiveness (MINECO) under Projects TIN2013-41576-R and TIN2017-83964-R. A.J. Fernández-García has been funded by an FPI grant (BES-2014-067974).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Antonio Jesús Fernández-García (1)
  • Luis Iribarne (1)
  • Antonio Corral (1)
  • Javier Criado (1)
  1. Applied Computing Group, University of Almería, La Cañada, Spain