Variance-Based Feature Selection for Enhanced Classification Performance

  • Conference paper
Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 862))

Abstract

Irrelevant-feature elimination, when applied correctly, improves feature selection accuracy, which is critical to the dimensionality reduction task. The added intelligence sharpens the search for an optimal subset of features by pruning the dataset based on previous performance. The search procedures used are fully probabilistic and heuristic. Although existing algorithms use various measures to evaluate candidate feature subsets, they fail to eliminate irrelevant features. The procedure described in this paper focuses on an enhanced feature selection process based on random subset feature selection (RSFS), which uses the random forest (RF) algorithm for better feature reduction. Through extensive testing of this procedure on several scientific datasets of differing geometries, we aim to show that an optimal subset of features can be derived by eliminating features whose scores lie more than two standard deviations from the mean. In many real-world applications involving scientific data (e.g., cancer detection, diabetes, and medical diagnosis), removing irrelevant features increases detection accuracy at lower cost and time. This helps domain experts by reducing the feature set and saving valuable diagnosis time.
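The procedure the abstract outlines can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper pairs RSFS with a random forest, but here a simple 1-nearest-neighbour classifier stands in for brevity, the dataset is synthetic, and the assumption is that "two standard deviations from the mean" refers to a cut on per-feature relevance scores accumulated over random subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 2 informative features plus 8 pure-noise features.
n = 200
informative = rng.normal(size=(n, 2))
y = (informative[:, 0] + informative[:, 1] > 0).astype(int)
X = np.hstack([informative, rng.normal(size=(n, 8))])
n_feats = X.shape[1]

idx = rng.permutation(n)
tr, te = idx[:150], idx[150:]

def nn_accuracy(X_tr, y_tr, X_te, y_te):
    """Classification accuracy of a 1-nearest-neighbour classifier."""
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    return (y_tr[d.argmin(axis=1)] == y_te).mean()

# RSFS-style scoring: evaluate many random feature subsets and credit
# every participating feature with the accuracy that subset achieved.
relevance = np.zeros(n_feats)
counts = np.zeros(n_feats)
for _ in range(300):
    subset = rng.choice(n_feats, size=3, replace=False)
    acc = nn_accuracy(X[tr][:, subset], y[tr], X[te][:, subset], y[te])
    relevance[subset] += acc
    counts[subset] += 1
relevance /= np.maximum(counts, 1)

# Variance-based pruning: discard any feature whose mean score lies
# more than two standard deviations below the average relevance.
mu, sigma = relevance.mean(), relevance.std()
selected = np.flatnonzero(relevance > mu - 2 * sigma)
```

With enough random subsets, the two informative features accumulate higher average accuracy than the noise features, so the variance-based cut retains them while pruning clear outliers on the low side.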




Corresponding author

Correspondence to B. Vishnuvardhan.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lakshmi Padmaja, D., Vishnuvardhan, B. (2019). Variance-Based Feature Selection for Enhanced Classification Performance. In: Satapathy, S., Bhateja, V., Somanah, R., Yang, XS., Senkerik, R. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 862. Springer, Singapore. https://doi.org/10.1007/978-981-13-3329-3_51
