Hybrid Feature Selection Method Based on the Genetic Algorithm and Pearson Correlation Coefficient

Part of the book series: Studies in Computational Intelligence (SCI, volume 801)

Abstract

Feature selection is a robust technique for data reduction and an essential step in successful machine learning applications. Many feature selection methods have been introduced to select a relevant subset of features. Because each dimension reduction method relies on a different aspect of feature relevance, different methods yield different feature subsets for the same data set. Hybrid approaches have therefore received considerable attention, since they combine several aspects of feature relevance when selecting a feature subset. Several combination schemes have been proposed in the literature, such as union, intersection, and modified union. The union approach can sometimes inflate the total number of selected features, while the intersection approach can discard important features. To take advantage of each approach while lessening the deficiency of the other, an integration approach called modified union is used: it applies union to the selected features and intersection to the remaining feature subsets. In this work, we introduce a feature selection method that combines the Genetic Algorithm (GA) and the Pearson Correlation Coefficient (PCC). The experimental results show that the proposed method can improve the performance of feature selection.
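The three combination schemes named above (union, intersection, and modified union) can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: it assumes that each selection method outputs a ranked list of feature indices (best first), and the function names, the cut-off `k` separating "selected" from "remaining" features, and the example rankings are all hypothetical.

```python
from typing import List, Sequence


def union_select(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Keep every feature chosen by either method."""
    return sorted(set(a) | set(b))


def intersection_select(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Keep only the features chosen by both methods."""
    return sorted(set(a) & set(b))


def modified_union(a: Sequence[int], b: Sequence[int], k: int) -> List[int]:
    """Union of the top-k features of each ranking, plus the
    intersection of the lower-ranked (remaining) features."""
    top = set(a[:k]) | set(b[:k])
    rest = set(a[k:]) & set(b[k:])
    return sorted(top | rest)


# Hypothetical rankings (best feature first), e.g. one from a GA and one from PCC.
ga_ranking = [3, 1, 4, 0, 5]
pcc_ranking = [1, 2, 5, 4, 6]

print(union_select(ga_ranking, pcc_ranking))         # [0, 1, 2, 3, 4, 5, 6]
print(intersection_select(ga_ranking, pcc_ranking))  # [1, 4, 5]
print(modified_union(ga_ranking, pcc_ranking, k=2))  # [1, 2, 3, 4, 5]
```

In this toy example, union keeps all seven features, and intersection drops features 3 and 2 even though each is ranked in the top two by one of the methods. Modified union keeps both of them while still discarding features 0 and 6, which only one method selected and ranked low.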

Notes

  1. The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between the values predicted by a model or estimator (sample or population values) and the values actually observed.
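For n paired values, RMSE = sqrt((1/n) · Σᵢ (ŷᵢ − yᵢ)²), where ŷᵢ is the predicted value and yᵢ the observed value. A minimal Python sketch of this formula follows; the function name `rmse` and the sample values are illustrative and are not taken from the chapter.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and observations."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average them, then take the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Errors are -0.5, 0.5, 0.0 and 1.0, so RMSE = sqrt(1.5 / 4), roughly 0.612.
print(rmse([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0]))
```

Because the errors are squared before averaging, RMSE penalizes large deviations more heavily than the mean absolute error does.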

Author information

Correspondence to Rania Saidi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Saidi, R., Bouaguel, W., Essoussi, N. (2019). Hybrid Feature Selection Method Based on the Genetic Algorithm and Pearson Correlation Coefficient. In: Hassanien, A. (ed.) Machine Learning Paradigms: Theory and Application. Studies in Computational Intelligence, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-030-02357-7_1
