Abstract

Feature selection reduces the number of features in applications where the data has hundreds of attributes or more. Identifying the essential or relevant attributes has become a vital task for applying data mining algorithms efficiently to real-world problems. Current feature selection techniques concentrate primarily on obtaining relevant attributes. This paper presents the notions of feature relevance, redundancy, and evaluation criteria, together with a literature survey of the feature selection approaches that researchers have applied in different areas. It helps readers choose a feature selection technique without requiring detailed knowledge of every algorithm.
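To make the notions of relevance and redundancy concrete, the following is a minimal sketch of a filter-style selector. It is not a method taken from the paper: it assumes Pearson correlation as the evaluation criterion, and the function names, the dictionary layout of the data, and both thresholds are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Greedy filter: keep relevant features that are not redundant.

    columns: dict mapping feature name -> list of values (assumed layout).
    The two thresholds are illustrative assumptions, not values from the paper.
    """
    # Relevance: absolute correlation between each feature and the target.
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every remaining feature is even less relevant
        # Redundancy: strong correlation with a feature that is already kept.
        if all(abs(pearson(columns[name], columns[kept])) <= max_redundancy
               for kept in selected):
            selected.append(name)
    return selected


# Toy data, invented for this sketch.
data = {
    "f1": [2, 4, 6, 8, 10, 13],
    "f1_near_copy": [1, 2, 3, 4, 5, 7],  # almost a rescaled copy of f1
    "f2": [1, 3, 2, 5, 3, 4],            # moderately related to the target
    "noise": [1, 3, 1, 1, 3, 1],         # uncorrelated with the target
}
label = [1, 2, 3, 4, 5, 6]

print(select_features(data, label))
```

On this toy data the selector keeps `f1` and `f2`: `f1_near_copy` is relevant but redundant with `f1`, and `noise` is not relevant. Wrapper and embedded methods, which several of the surveyed works use, would replace the correlation score with the performance of a learning algorithm.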

Abbreviations

AA: Average Accuracy
AIC: Akaike Information Criterion
ANN: Artificial Neural Network
AUC: Area Under the Curve
BWO: Binary Wolf Optimization
CART: Classification and Regression Tree
CFA: Cuttlefish Algorithm
CFS: Correlation-based Feature Selection
CS: Chi-Square
DM: Data Mining
F: F-Score
FCBF: Fast Correlation-based Feature Selection
FP: False Positive
GA: Genetic Algorithm
GR: Gain Ratio
IG: Information Gain
K-NN: K-Nearest Neighbor
LMT: Logistic Model Tree
MI: Mutual Information
MLP: Multi-Layer Perceptron
NB: Naïve Bayes
OA: Overall Accuracy
P: Precision
PCA: Principal Component Analysis
PSO: Particle Swarm Optimization
R: Recall
RBF: Radial Basis Function
ROC: Receiver Operating Characteristic
SVM: Support Vector Machine
TP: True Positive
TV: Term Variance
WOA: Whale Optimization Algorithm

© 2020 Springer Nature Switzerland AG

Cite this paper

Durairaj, M., Poornappriya, T.S. (2020). Why Feature Selection in Data Mining Is Prominent? A Survey. In: Kumar, L., Jayashree, L., Manimegalai, R. (eds) Proceedings of International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. AISGSC 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-24051-6_88

  • DOI: https://doi.org/10.1007/978-3-030-24051-6_88

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24050-9

  • Online ISBN: 978-3-030-24051-6

  • eBook Packages: Engineering, Engineering (R0)
