Abstract
Irrelevant-feature elimination, when applied correctly, improves feature-selection accuracy, which is critical in dimensionality-reduction tasks. This added intelligence sharpens the search for an optimal feature subset by shrinking the dataset based on prior performance. Existing search procedures are largely probabilistic and heuristic: although they use various measures to evaluate candidate feature subsets, they fail to eliminate irrelevant features. The procedure presented in this paper enhances the feature-selection process based on random subset feature selection (RSFS), which uses the random forest (RF) algorithm for improved feature reduction. Through extensive testing of this procedure on several scientific datasets with different geometries, we show that an optimal subset of features can be derived by eliminating features whose relevance lies more than two standard deviations from the mean. In many real-world applications involving scientific data (e.g., cancer detection, diabetes, and medical diagnosis), removing irrelevant features increases detection accuracy at lower cost and in less time. This helps domain experts by reducing the feature set and saving valuable diagnosis time.
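The two-standard-deviation rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes feature relevance scores come from random-forest importances (RSFS derives its relevance scores through a randomized subset-evaluation loop, which is omitted here), and it assumes the rule eliminates features scoring more than two standard deviations below the mean relevance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a scientific dataset (hypothetical data).
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

# Relevance scores via random-forest importances, since RSFS
# uses the RF algorithm internally for feature reduction.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
scores = rf.feature_importances_

# Eliminate features whose relevance lies more than two standard
# deviations below the mean (one plausible reading of the rule).
mu, sigma = scores.mean(), scores.std()
keep = scores > mu - 2 * sigma
X_reduced = X[:, keep]
print(f"retained {X_reduced.shape[1]} of {X.shape[1]} features")
```

The retained matrix `X_reduced` would then feed the downstream classifier; with skewed importance distributions the threshold `mu - 2*sigma` can fall near zero, so in practice the cutoff may need tuning per dataset.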
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Lakshmi Padmaja, D., Vishnuvardhan, B. (2019). Variance-Based Feature Selection for Enhanced Classification Performance. In: Satapathy, S., Bhateja, V., Somanah, R., Yang, XS., Senkerik, R. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 862. Springer, Singapore. https://doi.org/10.1007/978-981-13-3329-3_51
Print ISBN: 978-981-13-3328-6
Online ISBN: 978-981-13-3329-3