Abstract
Variable reduction is an essential step in data mining, which is able effectively to increase both the performance of machine learning and the process knowledge by removing the redundant and irrelevant input variables. The paper presents a variable selection approach merging the dominating set procedure for redundancy analysis and a wrapper approach in order to achieve an informative and not redundant subset of variables improving both the stability and the computational complexity. The proposed approach is tested on different datasets coming from the UCI repository and from industrial contexts and is compared to the exhaustive variable selection approach, which is often considered optimal in terms of system performance. Moreover the novel method is applied to both classification and regression procedures.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Asuncion, A., Newman, D.: UCI machine learning repository (2007). http://archive.ics.uci.edu/ml/datasets.html
Bellman, R.: Adaptive Control Processes: A Guided Tour. Princeton University Press (1961)
Biggs, N., Lloyd, E., Wilson, R.: Graph Theory. Oxford University Press (1986)
Bondy, J.A., Murty, U.: Graph Theory. Springer (2008). ISBN 978-1-84628-969-9
Breiman, L., Friedman, J.H., Olshen, R.A., Stone., C.J.: Classification and Regression Trees. Wadsworth and Brooks (1984)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Cateni, S., Colla, V., Vannucci, M., Vannocci, M.: A procedure for building reduced reliable training datasets from realworld data. In: 13th IASTED International Conference on Artificial Intelligence and Applications, AIA 2014, Innsbruck, Austria, pp. 393–399 (2014)
Cateni, S., Colla, V., Vannucci, M.: A fuzzy system for combining filter features selection methods. Int. J. Fuzzy Syst. (2016)
Cateni, S., Colla, V., Vannucci, M.: A hybrid feature selection method for classification purposes. In: 8th European Modeling Symposium on Mathematical Modeling and Computer simulation EMS 2014, Pisa, Italy, vol. 1, pp. 1–8 (2014)
Cateni, S., Colla, V., Vannucci, M.: General purpose input variable extraction: a genetic algorithm based procedure give a gap. In: 9th International Conference on Intelligence Systems Design and Applications, ISDA 2009, pp. 1307–1311 (2009)
Cateni, S., Colla, V., Vannucci, M.: Variable selection through genetic algorithms for classification purpose. In: IASTED International Conference on Artificial Intelligence and Applications, AIA 2010, pp. 6–11 (2010)
Cateni, S., Colla, V.: A hybrid variable selection approach for NN-based classification in industrial context. In: Smart Innovation, Systems and Technologies (in press)
Cateni, S., Colla, V.: Improving the stability of sequential forward and backward variables selection. In: 15th International Conference on Intelligent Systems Design and Applications, ISDA 2015, pp. 374–379 (2016)
Cateni, S., Colla, V.: The importance of variable selection for neural networks based classification in an industrial context. In: International Workshop on Neural Networks, WIRN 2015, Smart Innovation, Systems and Technologies, vol. 54, pp. 363–370 (2016)
Cateni, S., Colla, V.: Improving the stability of wrapper variable selection applied to binary classification. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 8, 214–225 (2016)
Cateni, S., Colla, V., Vannucci, M.: A genetic algorithm based approach for selecting input variables and setting relevant network parameters of som based classifier. Int. J. Simul. Syst. Sci. Technol. 12(2), 30–37 (2011)
Cateni, S., Colla, V., Vannucci, M.: A method for resampling imbalanced datadata in binary classification tasks for realworld problems. Neurocomputing 135, 32–41 (2014)
Duda, R., Hart, P., Stork, D.: Pattern Classification. Wiley, New York (2001)
Fiasché, M.: A quantum-inspired evolutionary algorithm for optimization numerical problems. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Part 3. LNCS, vol. 7665, pp. 686–693 (2012)
Fiasché, M.: SVM tree for personalized transductive learning in bioinformatics classification problems. Smart Innov. Syst. Technol. 26, 223–231 (2014)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Mach. Learn. 3, 1157–1182 (2003)
Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12, 95–116 (2007)
Kohavi, R., John, G.: Wrappers for feature selection. Artif. Intell. 97, 273–324 (1997)
Loscalzo, S., Yu, L., Ding, C.: Consensus group stable feature selection. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 1, pp. 567–575. ACM (2009)
May, R., Dandy, G., Maier, H.: Review of input variable selection methods for artificial neural networks. Artif. Neural Netw. Methodol. Adv. Biomed. Appl. (2011)
Mitchell, T., Toby, J., Beauchamp, J.: Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83, 1023–32 (1988)
Novovicova, J., Somol, P., Pudil, P.: A new measure of feature selection algorithms stability. In: IEEE International Conference Data Mining Workshops, vol. 1, pp. 382–387 (2009)
Sun, Y., Robinson, M., Adams, R., Boekhorst, R., Rust, A.G., Davey, N.: Using feature selection filtering methods for binding site predictions. In: Proceedings of 5th IEEE International Conference on Cognitive Informatics (ICCI 2006) (2006)
Turney, P.: Techncal note: bias and the quantification of stability. Mach. Learn. 20, 23–33 (1995)
Wang, S., Zhu, J.: Variable selection for model-based high dimensional clustering and its application on microarray data. Biometrics 64, 440–448 (2008)
Yu, L., Liu, H.: Feature selection for high-dimensional data: a fast correlation based filter solution. In: Proceedings of the 20th International Conference on Machine Learning, ICML, vol. 1, pp. 856–863 (2003)
Acknowledgements
The work presented in this paper was developed within the project entitled “Piattaforma Integrata Avanzata per la Progettazione di Macchine e Sistemi Complessi” (PROMAS), which was co-funded under Tuscany POR FESR 2014–2020.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Cateni, S., Colla, V., Iannino, V. (2019). Improving the Stability of Variable Selection for Industrial Datasets. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_19
Download citation
DOI: https://doi.org/10.1007/978-3-319-95098-3_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95097-6
Online ISBN: 978-3-319-95098-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)