Feature selection by recursive binary gravitational search algorithm optimization for cancer classification

Abstract

DNA microarray technology has become a promising tool for cancer classification. However, DNA microarray datasets typically contain a very large number of genes (usually tens of thousands or more) and few samples (often fewer than one hundred). This makes it necessary to identify the most relevant genes before cancer classification. In this paper, we propose a two-phase feature selection method for cancer classification. The method selects a low-dimensional set of genes to classify biological samples of binary and multi-class cancers by integrating ReliefF with a recursive binary gravitational search algorithm (RBGSA). The proposed RBGSA refines the gene space from a very coarse level to a fine-grained one at each recursive step of the algorithm without degrading the accuracy. We evaluate our method by comparing it with state-of-the-art methods on 11 benchmark microarray datasets of different cancer types. The comparison shows that our method selects only a small number of genes while yielding substantial improvements in accuracy over other methods. In particular, it achieved up to 100% classification accuracy on 7 of the 11 datasets, with a very small gene subset (less than 1.5% of the genes) for all 11 datasets.
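The filter-then-recursive-wrapper structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so every name here (`relief_scores`, `loo_1nn_accuracy`, `search_subset`, `two_phase_select`) and every parameter (`top_k`, `max_depth`, `iters`) is hypothetical. The filter is basic Relief with one nearest hit and one nearest miss per sample, not full ReliefF; a random bit-flip hill climber stands in for the binary gravitational search; and leave-one-out 1-nearest-neighbour accuracy is an assumed fitness function.

```python
import numpy as np

def relief_scores(X, y):
    """Basic Relief weights (one nearest hit, one nearest miss per sample).
    Simplified stand-in for ReliefF; assumes every class has >= 2 samples."""
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                  # constant genes: avoid dividing by zero
    Xs = (X - X.min(axis=0)) / span        # scale each gene to [0, 1]
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                   # a sample is not its own neighbour
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (assumed fitness)."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D, np.inf)
    return float((y[D.argmin(axis=1)] == y).mean())

def search_subset(X, y, rng, iters=60):
    """Placeholder wrapper search: random bit-flip hill climbing, NOT a real BGSA.
    Starts from the full mask, so accuracy never drops below the starting value."""
    d = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return (-1.0, 0)               # an empty subset is never acceptable
        # Higher accuracy first, then fewer genes.
        return (loo_1nn_accuracy(X[:, mask], y), -int(mask.sum()))

    best = np.ones(d, dtype=bool)
    best_fit = fitness(best)
    for _ in range(iters):
        cand = best.copy()
        flip = rng.random(d) < 0.2         # flip roughly 20% of the bits
        cand[flip] = ~cand[flip]
        f = fitness(cand)
        if f >= best_fit:
            best, best_fit = cand, f
    return best, best_fit[0]

def two_phase_select(X, y, top_k=20, max_depth=5, seed=0):
    rng = np.random.default_rng(seed)
    # Phase 1 (filter): keep the top_k genes by Relief weight.
    keep = np.argsort(relief_scores(X, y))[::-1][:top_k]
    acc = loo_1nn_accuracy(X[:, keep], y)
    # Phase 2 (recursive wrapper): re-run the search on the surviving genes
    # until a pass removes nothing or the depth limit is reached.
    for _ in range(max_depth):
        mask, new_acc = search_subset(X[:, keep], y, rng)
        if mask.all():
            break
        keep, acc = keep[mask], new_acc
    return np.sort(keep), acc

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = np.repeat([0, 1], 20)              # synthetic two-class labels
    X = rng.normal(size=(40, 200))         # 40 samples, 200 synthetic "genes"
    X[:, 3] += 4.0 * y                     # two genes made informative
    X[:, 7] -= 4.0 * y
    genes, acc = two_phase_select(X, y)
    print("selected genes:", genes, "LOO accuracy:", acc)
```

Because each search pass starts from the full surviving mask and only accepts candidates that are at least as accurate, the sketch mirrors the coarse-to-fine narrowing without loss of accuracy that the abstract describes, though only on the training samples used for the fitness estimate.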

Funding

This study was funded by Shanxi Natural Science Foundation (201801D121136), National Natural Science Foundation of China (61772358) and International Cooperation Project of Shanxi Province of China (201603D421014).

Author information

Correspondence to Xiaohong Han.

Ethics declarations

Conflict of interest

All the authors in this study declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

About this article

Cite this article

Han, X., Li, D., Liu, P. et al. Feature selection by recursive binary gravitational search algorithm optimization for cancer classification. Soft Comput 24, 4407–4425 (2020). https://doi.org/10.1007/s00500-019-04203-z

Keywords

  • Gene selection
  • Cancer classification
  • Gravitational search algorithm