Variance-Based Feature Selection for Enhanced Classification Performance

Lakshmi Padmaja, D.; Vishnuvardhan, B.

doi:10.1007/978-981-13-3329-3_51

Conference paper
First Online: 31 December 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 862)

Abstract

Eliminating irrelevant features, when done correctly, improves feature selection accuracy, which is critical in dimensionality reduction tasks. Using previous performance as additional intelligence to reduce the dataset enhances the search for an optimal subset of features. The search procedures in use are entirely probabilistic and heuristic. Although existing algorithms use various measures to evaluate the best feature subsets, they fail to eliminate irrelevant features. The procedure described in this paper focuses on an enhanced feature selection process based on random subset feature selection (RSFS), which uses the random forest (RF) algorithm for better feature reduction. Through extensive testing of this procedure, previously carried out on several scientific datasets with different geometries, we aim to show that the optimal subset of features can be derived by eliminating the features that are two standard deviations away from the mean. In many real-world applications involving scientific data (e.g., cancer detection, diabetes, and medical diagnosis), removing irrelevant features increases detection accuracy at lower cost and in less time. This helps domain experts by identifying a reduced set of features and saving valuable diagnosis time.
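The two-standard-deviation rule mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which per-feature quantity the rule is applied to, so the sketch assumes a plain list of per-feature scores (for example, relevance or importance values produced by some earlier step). The function name `split_by_two_sigma`, the parameter `k`, and the sample scores are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from statistics import mean, pstdev


def split_by_two_sigma(scores, k=2.0):
    """Split feature indices into (kept, dropped).

    A feature is dropped when its score lies more than k population
    standard deviations away from the mean of all scores.
    Assumption: `scores` holds one numeric score per feature.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    kept, dropped = [], []
    for idx, score in enumerate(scores):
        # With zero spread no feature can be an outlier, so keep everything.
        if sigma > 0 and abs(score - mu) > k * sigma:
            dropped.append(idx)
        else:
            kept.append(idx)
    return kept, dropped


# Hypothetical scores: nine similar features and one far from the mean.
scores = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 5.00]
kept, dropped = split_by_two_sigma(scores)
print("kept:", kept)
print("dropped:", dropped)
```

The threshold multiplier is left as a parameter so that the same helper could be tried with a stricter or looser cut-off; whether the population or sample standard deviation is the better choice is likewise an open assumption here.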

Author information

Authors and Affiliations

Department of Information Technology, Anurag Group of Institutions (CVSR), Hyderabad, India
D. Lakshmi Padmaja
Department of Computer Science and Engineering, JNTUH, Hyderabad, India
B. Vishnuvardhan

Corresponding author

Correspondence to B. Vishnuvardhan.

Editor information

Editors and Affiliations

School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Universite des Mascareignes, Beau Bassin-Rose Hill, Mauritius
Radhakhrishna Somanah
School of Science and Technology, Middlesex University, London, UK
Xin-She Yang
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Roman Senkerik

About this paper

Cite this paper

Lakshmi Padmaja, D., Vishnuvardhan, B. (2019). Variance-Based Feature Selection for Enhanced Classification Performance. In: Satapathy, S., Bhateja, V., Somanah, R., Yang, XS., Senkerik, R. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 862. Springer, Singapore. https://doi.org/10.1007/978-981-13-3329-3_51

DOI: https://doi.org/10.1007/978-981-13-3329-3_51
Published: 31 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3328-6
Online ISBN: 978-981-13-3329-3
eBook Packages: Intelligent Technologies and Robotics (R0)
