Improving the Stability of Variable Selection for Industrial Datasets

Cateni, Silvia; Colla, Valentina; Iannino, Vincenzo

doi:10.1007/978-3-319-95098-3_19

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 102)

Included in the following conference series:

Italian Workshop on Neural Nets

Abstract

Variable reduction is an essential step in data mining: by removing redundant and irrelevant input variables, it can improve both the performance of machine learning models and the understanding of the underlying process. This paper presents a variable selection approach that combines a dominating set procedure for redundancy analysis with a wrapper approach, in order to obtain an informative, non-redundant subset of variables while improving stability and reducing computational complexity. The proposed approach is tested on datasets from the UCI repository and from industrial contexts, and is compared to exhaustive variable selection, which is often considered optimal in terms of system performance. Moreover, the method is applied to both classification and regression tasks.
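
The general idea of pairing a dominating-set redundancy filter with a wrapper search can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the correlation threshold, the greedy dominating-set heuristic, the 1-nearest-neighbour wrapper, and all function names and toy data below are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def greedy_dominating_set(columns, threshold=0.9):
    """Redundancy filter (assumed form): link variables whose |correlation|
    exceeds `threshold`, then greedily pick variables until every variable
    is either picked or linked to a picked one."""
    p = len(columns)
    closed = [{i} | {j for j in range(p) if j != i
                     and abs(pearson(columns[i], columns[j])) > threshold}
              for i in range(p)]
    undominated, chosen = set(range(p)), []
    while undominated:
        # Pick the variable covering most undominated ones; ties go to the lowest index.
        best = max(range(p), key=lambda i: (len(closed[i] & undominated), -i))
        chosen.append(best)
        undominated -= closed[best]
    return sorted(chosen)


def loo_1nn_accuracy(rows, labels, subset):
    """Wrapper score (assumed form): leave-one-out accuracy of a 1-NN classifier
    that only looks at the variables listed in `subset`."""
    hits = 0
    for i, row in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: sum((row[k] - rows[j][k]) ** 2 for k in subset))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def select_variables(rows, labels, threshold=0.9):
    """Filter redundant variables first, then search exhaustively over the survivors."""
    columns = list(zip(*rows))
    candidates = greedy_dominating_set(columns, threshold)
    best_subset, best_score = None, -1.0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = loo_1nn_accuracy(rows, labels, subset)
            if score > best_score:  # strict comparison keeps the smaller subset on ties
                best_subset, best_score = subset, score
    return candidates, best_subset, best_score


# Hypothetical toy data: x1 is an exact multiple of x0 (redundant), x2 is noise.
x0 = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
x1 = [2 * v for v in x0]
x2 = [5, 1, 4, 2, 3, 5, 1, 4]
rows = list(zip(x0, x1, x2))
labels = [0, 0, 0, 0, 1, 1, 1, 1]

candidates, best_subset, best_score = select_variables(rows, labels)
print(candidates, best_subset, best_score)
```

On this toy data the redundancy filter discards the duplicated variable before the wrapper runs, so the exhaustive search covers subsets of two candidates rather than three. That reduction of the search space is the kind of computational saving the abstract refers to, under the assumptions stated above.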

Acknowledgements

The work presented in this paper was developed within the project entitled “Piattaforma Integrata Avanzata per la Progettazione di Macchine e Sistemi Complessi” (PROMAS), which was co-funded under Tuscany POR FESR 2014–2020.

Author information

Authors and Affiliations

Scuola Superiore Sant’Anna - TeCIP Institute, Via Alamanni 13B, 56010, Pisa, Italy
Silvia Cateni, Valentina Colla & Vincenzo Iannino

Corresponding author

Correspondence to Silvia Cateni.

Editor information

Editors and Affiliations

Dipartimento di Psicologia, Università della Campania Luigi Vanvitelli, Caserta, Italy
Anna Esposito
Fundació Tecnocampus, Pompeu Fabra University, Mataro, Barcelona, Spain
Marcos Faundez-Zanuy
Department of Civil, Environmental, Energy, and Material Engineering, University Mediterranea of Reggio Calabria, Reggio Calabria, Italy
Francesco Carlo Morabito
Laboratorio di Neuronica, Dipartimento Elettronica e Telecomunicazioni, Politecnico di Torino, Torino, Italy
Eros Pasero

About this chapter

Cite this chapter

Cateni, S., Colla, V., Iannino, V. (2019). Improving the Stability of Variable Selection for Industrial Datasets. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_19

DOI: https://doi.org/10.1007/978-3-319-95098-3_19
Published: 22 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95097-6
Online ISBN: 978-3-319-95098-3
eBook Packages: Intelligent Technologies and Robotics (R0)
