A Comparison of Feature Selection Methods to Optimize Predictive Models Based on Decision Forest Algorithms for Academic Data Analysis

Fernández-García, Antonio Jesús; Iribarne, Luis; Corral, Antonio; Criado, Javier

doi:10.1007/978-3-319-77703-0_35

Antonio Jesús Fernández-García⁶,
Luis Iribarne⁶,
Antonio Corral⁶ &
…
Javier Criado⁶

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 745))

Included in the following conference series:

World Conference on Information Systems and Technologies

8562 Accesses
3 Citations

Abstract

Nowadays, Feature Selection (FS) methods are essential (1) to create easy-to-explain predictive models in shorter periods of time, (2) to reduce overfitting and (3) avoid sparsity of data. The suitability of using these techniques is studied in this paper. Furthermore, a comparison of some widely extended techniques is performed to know which one is more appropriated to create predictive models using decision forest algorithms. For this comparison, experiments are conducted in which predictive models for each FS method are built to foresee if students will finish their degree after finishing their first year in college. A real dataset with students’ data provided by the University of Almería is used to generate the predictive models. By comparing the accuracy of the built models, we can measure the effectiveness of each FS method, being the Chi-Square statistic the method that leads to better results in our experimental study.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 259.00; Price excludes VAT (USA)

Softcover Book: USD 329.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Campbell, M. (eds.): Statistics at square one, 9th edn. University of Southampton, Copyright BMJ Publishing Group (1997)
Google Scholar
Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electrical Eng. 40(1), 16–28 (2014). ISSN:0045-7906
Article Google Scholar
Criado, J., Rodriguez-Gracia, D., Iribarne, L., Padilla, N.: Toward the adaptation of component-based architectures by model transformation: behind smart user interfaces. Softw. Pract. Experience 45(12), 1677–1718 (2015). ISSN:0038-0644
Article Google Scholar
Fernández-García, A.J., Iribarne, L., Corral, A., Wang, J.Z.: A Microservice-based Architecture for Enhancing the User Experience in Cross-device Distributed Mashup UIs with Multiple Forms of Interaction, Universal Access in the Information Society (2017). Special Issue on Distributed UIs: Distributing Interactions
Google Scholar
Fernández-García, A.J., Iribarne, L., Corral, A., Wang, J.Z.: Evolving mashup interfaces using a distributed machine learning and model transformation methodology. In: Proceedings of On the Move to Meaningful Internet Systems: OTM 2015. International Workshop on Information Systems in Distributed Systems (ISDE). LNCS, vol. 9416, Rhodes, Greece, 26-30 October, pp 401–410. Springer, Cham (2015)
Chapter Google Scholar
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)
Book Google Scholar
Lazar, C., et al.: A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans. Comput. Biol. Bioinf. 9(4), 1106–1119 (2012)
Article Google Scholar
Mantel, N.: Chi-square tests with one degree of freedom; extensions of the Mantel-Haenszel procedure. J. Am. Stat. Assoc. 58(303), 690–700 (1963)
MathSciNet MATH Google Scholar
Microsoft Corporation: Microsoft Azure Machine Learning Studio. https://studio.azureml.net. Accessed 8 Aug 2017
Molina, L.C., Belanche. L., Nebot, A.: Feature selection algorithms: a survey and experimental evaluation. In: IEEE International Conference on Data Mining, 2002, Proceedings, pp. 306–313 (2002)
Google Scholar
Salem, A., Jiliang, T., Huan, L.: Feature Selection for Clustering: A Review. In: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC (2013). ISBN: 978-1-4665-8674-1. eBook, ISBN: 978-1-4665-8675-8
Google Scholar
Sharp, T., Lengerich, R., Bai, S.: Online. STAT 509. Eberly College of Science. Penn State. https://onlinecourses.science.psu.edu/stat509/node/158. Accessed 8 Aug 2017

Download references

Acknowledgement

This work has been funded by the EU ERDF and the Spanish Ministry of Economy and Competitiveness (MINECO) under Projects TIN2013-41576-R and TIN2017-83964-R. A.J. Fernández-García has been funded by a FPI Grant BES-2014-067974.

Author information

Authors and Affiliations

Applied Computing Group, University of Almeria, Ctra. Sacramento, s/n, 04120, La Cañada, Almería, Spain
Antonio Jesús Fernández-García, Luis Iribarne, Antonio Corral & Javier Criado

Authors

Antonio Jesús Fernández-García
View author publications
You can also search for this author in PubMed Google Scholar
Luis Iribarne
View author publications
You can also search for this author in PubMed Google Scholar
Antonio Corral
View author publications
You can also search for this author in PubMed Google Scholar
Javier Criado
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Antonio Jesús Fernández-García .

Editor information

Editors and Affiliations

Departamento de Engenharia Informática, Universidade de Coimbra, Coimbra, Portugal
Álvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
DSI/EEUM, Universidade do Minho, Guimarães, Portugal
Luís Paulo Reis
DIMES, Università della Calabria, Arcavacata di Rende, Italy
Sandra Costanzo

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Fernández-García, A.J., Iribarne, L., Corral, A., Criado, J. (2018). A Comparison of Feature Selection Methods to Optimize Predictive Models Based on Decision Forest Algorithms for Academic Data Analysis. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 745. Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_35

Download citation

DOI: https://doi.org/10.1007/978-3-319-77703-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77702-3
Online ISBN: 978-3-319-77703-0
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics