Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Identifying a small subset of informative genes from a gene expression dataset is an important step in sample classification in bioinformatics and machine learning. The process has two objectives: first, to minimize the number of selected genes, and second, to maximize the classification accuracy of the classifier used. In this paper, a hybrid machine learning framework based on the nature-inspired cuckoo search (CS) algorithm is proposed to address this problem. The framework is obtained by incorporating the CS algorithm with an artificial bee colony (ABC) algorithm in the exploitation and exploration of the genetic algorithm (GA). These strategies maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms during the search. In preprocessing, independent component analysis (ICA) extracts the important genes from the dataset. Then, the proposed gene selection algorithms, together with the Naive Bayes (NB) classifier and leave-one-out cross-validation (LOOCV), are applied to find a small set of informative genes that maximizes classification accuracy. For a comprehensive performance study, the proposed algorithms are applied to six benchmark gene expression datasets. The experimental comparison shows that the proposed framework (ICA and CS-based hybrid algorithm with NB classifier) performs a deeper search in the iterative process, which can avoid premature convergence and produce better results than the previously published feature selection algorithm for the NB classifier.
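
As a rough illustration of the evaluation step described above, the sketch below scores a candidate gene subset by the LOOCV accuracy of a Gaussian Naive Bayes classifier. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`gaussian_nb_predict`, `loocv_accuracy`, `fitness`), the Gaussian likelihood, the variance floor `eps`, and the lexicographic ranking of the two objectives are all hypothetical choices, and the ICA preprocessing and the CS, ABC, and GA search operators are omitted.

```python
import math


def gaussian_nb_predict(train_X, train_y, x, eps=1e-9):
    """Predict the label of x with a Gaussian Naive Bayes model fit on train_X."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        # Log prior of the class.
        score = math.log(len(rows) / len(train_X))
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # eps is an assumed variance floor to avoid division by zero.
            var = sum((v - mean) ** 2 for v in col) / len(col) + eps
            # Log of the Gaussian density for feature j.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loocv_accuracy(X, y, mask):
    """Leave-one-out accuracy using only the genes (columns) where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(Xs)):
        # Hold out sample i, train on the rest, and test on the held-out sample.
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += gaussian_nb_predict(train_X, train_y, Xs[i]) == y[i]
    return hits / len(Xs)


def fitness(X, y, mask):
    """Hypothetical ranking: higher accuracy first, then fewer selected genes."""
    return (loocv_accuracy(X, y, mask), -sum(mask))


if __name__ == "__main__":
    # Toy data: gene 0 separates the two classes, gene 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [1.0, 4.9], [1.1, 1.2], [0.9, 3.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(fitness(X, y, [1, 0]))
    print(fitness(X, y, [1, 1]))
```

A metaheuristic such as CS would call a function like `fitness` on each candidate binary mask and keep the better-ranked ones. Comparing the tuples lexicographically is only one way to combine the two objectives stated in the abstract.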

Author information

Corresponding author

Correspondence to Rabia Musheer Aziz.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Aziz, R.M. Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data. Med Biol Eng Comput 60, 1627–1646 (2022). https://doi.org/10.1007/s11517-022-02555-7

  • DOI: https://doi.org/10.1007/s11517-022-02555-7