A Comparison of Feature Selection Methods to Optimize Predictive Models Based on Decision Forest Algorithms for Academic Data Analysis
Feature Selection (FS) methods are essential to (1) create easy-to-explain predictive models in less time, (2) reduce overfitting, and (3) avoid data sparsity. This paper studies the suitability of these techniques and compares several widely used ones to determine which is most appropriate for building predictive models with decision forest algorithms. For this comparison, we conduct experiments in which a predictive model is built for each FS method to predict, once students have finished their first year in college, whether they will complete their degree. A real dataset of student data provided by the University of Almería is used to generate the predictive models. Comparing the accuracy of the resulting models lets us measure the effectiveness of each FS method; in our experimental study, the Chi-Square statistic is the method that leads to the best results.
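As an illustration of how a Chi-Square filter ranks features, the following is a minimal sketch in plain Python and not the pipeline used in the study. The feature names, the eight toy records, and the helper functions `chi_square_score` and `rank_features` are all hypothetical. The sketch computes the Pearson Chi-Square statistic between each categorical feature and the class label (degree finished or not) and sorts the features by that score. Training and evaluating the decision forest on the selected features is omitted.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts per (value, class)
    feature_totals = Counter(feature_values)         # marginal counts per feature value
    label_totals = Counter(labels)                   # marginal counts per class
    score = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = value_count * cls_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def rank_features(rows, labels):
    """Return (feature name, score) pairs sorted by descending chi-square score."""
    scores = {
        name: chi_square_score([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one record per student (assumption, not the real dataset).
rows = [
    {"passed_all_first_year_courses": "yes", "uses_campus_parking": "yes"},
    {"passed_all_first_year_courses": "yes", "uses_campus_parking": "no"},
    {"passed_all_first_year_courses": "yes", "uses_campus_parking": "yes"},
    {"passed_all_first_year_courses": "yes", "uses_campus_parking": "no"},
    {"passed_all_first_year_courses": "no", "uses_campus_parking": "yes"},
    {"passed_all_first_year_courses": "no", "uses_campus_parking": "no"},
    {"passed_all_first_year_courses": "no", "uses_campus_parking": "yes"},
    {"passed_all_first_year_courses": "no", "uses_campus_parking": "no"},
]
labels = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = finished the degree, 0 = did not

ranking = rank_features(rows, labels)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the first feature is associated with the label and receives a positive score, while the second is statistically independent of it and scores zero. A filter would keep the top-ranked features and pass only those to the classifier.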
Keywords: Feature selection, Machine learning, Decision forest
This work has been funded by the EU ERDF and the Spanish Ministry of Economy and Competitiveness (MINECO) under Projects TIN2013-41576-R and TIN2017-83964-R. A.J. Fernández-García has been funded by an FPI Grant (BES-2014-067974).