Why Feature Selection in Data Mining Is Prominent? A Survey

  • M. Durairaj
  • T. S. Poornappriya
Conference paper

Abstract

Feature selection is employed to reduce the number of features in applications where the data contain hundreds or more attributes. Identifying essential or relevant attributes has become a vital task for applying data mining algorithms efficiently in real-world situations. Current feature selection techniques primarily concentrate on obtaining relevant attributes. This paper presents the notions of feature relevance, redundancy, and evaluation criteria, and surveys feature selection approaches proposed by researchers in different application areas. The survey helps practitioners choose a feature selection technique without requiring detailed knowledge of every algorithm.
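The relevance and redundancy notions above can be made concrete with a small filter-style sketch (not the authors' method; the dataset, scorer, and threshold are illustrative assumptions): features are ranked by mutual information with the class label (relevance) and then pruned by pairwise correlation (redundancy), in the spirit of CFS/FCBF-type filters.

```python
# Minimal filter-style sketch: relevance via mutual information, redundancy via
# pairwise correlation. Illustrative only -- not the method surveyed in this paper.
import numpy as np
from sklearn.datasets import load_breast_cancer      # placeholder dataset
from sklearn.feature_selection import mutual_info_classif

def filter_select(X, y, k=5, redundancy_threshold=0.9):
    """Pick up to k features with high MI to y and low correlation to each other."""
    relevance = mutual_info_classif(X, y, random_state=0)   # relevance score per feature
    corr = np.abs(np.corrcoef(X, rowvar=False))             # feature-feature correlation
    selected = []
    for j in np.argsort(relevance)[::-1]:                   # most relevant first
        if all(corr[j, s] < redundancy_threshold for s in selected):
            selected.append(j)                               # keep only non-redundant features
        if len(selected) == k:
            break
    return selected

data = load_breast_cancer()
idx = filter_select(data.data, data.target, k=5)
print([data.feature_names[i] for i in idx])
```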

Keywords

Relevance · Redundancy · Feature selection · Filter-based approach · Wrapper-based approach · Classification techniques
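As a hedged illustration of the filter-based versus wrapper-based distinction in the keywords, the wrapper sketch below lets the classifier itself judge feature subsets: scikit-learn's SequentialFeatureSelector greedily adds the feature that most improves cross-validated K-NN accuracy. The dataset, estimator, and subset size are placeholder assumptions, not choices from the paper.

```python
# Wrapper-style sketch: candidate feature subsets are scored by the learner
# (K-NN with 5-fold cross-validated accuracy), unlike a filter, which scores
# features independently of any classifier. Illustrative assumptions throughout.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Forward selection: greedily add the feature that most improves CV accuracy.
sfs = SequentialFeatureSelector(knn, n_features_to_select=5, direction="forward",
                                scoring="accuracy", cv=5)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))
```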

Abbreviations

  • AA: Average Accuracy
  • AIC: Akaike Information Criterion
  • ANN: Artificial Neural Network
  • AUC: Area Under the Curve
  • BWO: Binary Wolf Optimization
  • CART: Classification and Regression Tree
  • CFA: Cuttlefish Algorithm
  • CFS: Correlation-based Feature Selection
  • CS: Chi-Square
  • DM: Data Mining
  • F: F-Score
  • FCBF: Fast Correlation-based Feature Selection
  • FP: False Positive
  • GA: Genetic Algorithm
  • GR: Gain Ratio
  • IG: Information Gain
  • K-NN: K-Nearest Neighbor
  • LMT: Logistic Model Tree
  • MI: Mutual Information
  • MLP: Multi-Layer Perceptron
  • NB: Naïve Bayes
  • OA: Overall Accuracy
  • P: Precision
  • PCA: Principal Component Analysis
  • PSO: Particle Swarm Optimization
  • R: Recall
  • RBF: Radial Basis Function
  • ROC: Receiver Operating Characteristic
  • SVM: Support Vector Machine
  • TP: True Positive
  • TV: Term Variance
  • WOA: Whale Optimization Algorithm

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • M. Durairaj (1)
  • T. S. Poornappriya (1)
  1. School of Computer Science, Engineering and Applications, Bharathidasan University, Tiruchirappalli, India