Abstract
Feature selection is a useful step in data analysis. In this chapter, we study the classical support vector machine recursive feature elimination (SVM-RFE) algorithm and improve it by incorporating a correlation bias reduction (CBR) strategy into the feature elimination procedure. Experiments are conducted on a synthetic dataset and two breath analysis datasets. Large, comprehensive sets of transient features are extracted from the sensor responses. The classification accuracy achieved with feature selection demonstrates the efficacy of the proposed SVM-RFE + CBR, which outperforms the original SVM-RFE and other typical algorithms. An ensemble method is further studied to improve the stability of the proposed method. Statistical analysis of the feature rankings yields insights that can guide the future design of e-noses and feature extraction algorithms.
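The abstract names the algorithms without spelling them out, so what follows is a minimal Python sketch, under stated assumptions, of the classical SVM-RFE loop the chapter builds on: train a linear SVM, remove the feature with the smallest squared weight, and repeat. It is not the chapter's SVM-RFE + CBR, whose correlation bias reduction rule is not described here. The names `train_linear_svm`, `svm_rfe`, and `ensemble_rfe` are hypothetical; a small Pegasos-style subgradient solver stands in for a full SVM library to keep the sketch dependency-free, and the bootstrap mean-rank ensemble is a generic stability technique, not necessarily the ensemble method studied in the chapter.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Hypothetical helper: linear SVM without bias, trained with
    Pegasos-style stochastic subgradient descent on the L2-regularised
    hinge loss. X is a list of feature vectors, y holds labels in {-1, +1}.
    Assumes roughly centred features, since no intercept is learned."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the regulariser shrinks every weight...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...and samples inside the margin add a hinge-loss subgradient.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, train=train_linear_svm):
    """Classical SVM-RFE: retrain on the surviving features, drop the one
    with the smallest ranking criterion w_j**2, and repeat. Returns feature
    indices ordered from most to least important (last eliminated first)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        subset = [[row[j] for j in remaining] for row in X]
        w = train(subset, y)
        # Assumption: the chapter's CBR strategy modifies the elimination
        # procedure around this step; it is not reproduced here.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


def ensemble_rfe(X, y, n_runs=10, seed=0):
    """Hypothetical ensemble: run SVM-RFE on bootstrap resamples and rank
    features by their mean position across runs (lower = more important)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    total_pos = [0] * d
    for _ in range(n_runs):
        pick = [rng.randrange(n) for _ in range(n)]
        ranking = svm_rfe([X[i] for i in pick], [y[i] for i in pick])
        for pos, j in enumerate(ranking):
            total_pos[j] += pos
    return sorted(range(d), key=lambda j: total_pos[j])
```

On data where one feature separates the classes and the others are pure noise, `svm_rfe(X, y)` should place the informative feature first. In practice a library solver would replace `train_linear_svm`; scikit-learn's `RFE` wrapped around `LinearSVC`, for example, implements a similar classical elimination loop.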
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Zhang, D., Guo, D., Yan, K. (2017). Feature Selection and Analysis on Correlated Breath Data. In: Breath Analysis for Medical Applications. Springer, Singapore. https://doi.org/10.1007/978-981-10-4322-2_10
DOI: https://doi.org/10.1007/978-981-10-4322-2_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4321-5
Online ISBN: 978-981-10-4322-2
eBook Packages: Computer Science, Computer Science (R0)