Cancer molecular subtype classification from hypervolume-based discrete evolutionary optimization

Abstract

The high dimensionality and sample imbalance of gene expression data call for effective classification algorithms. To better distinguish the subtypes present in gene expression data, we propose a hypervolume-based discrete evolutionary optimization algorithm (HYBDEOA). Four objectives, namely the number of genes, the accuracy, the relevance, and the redundancy, are optimized simultaneously to guide the evolution. First, binary encoding is used to select a subset of features, projecting the data onto different subspaces. Next, a discrete neighborhood operation generates a new binary-mapped population. The new population is then combined with the current population, and a hypervolume-based mechanism selects the Pareto solutions. Finally, a discrete mutation method is proposed to find promising solutions in the binary search space. We evaluate HYBDEOA on 55 synthetic datasets and 35 cancer gene expression datasets, and conduct extensive experiments to assess its effectiveness and efficiency. The experimental results demonstrate that the proposed method is a parameter-less and robust algorithm that can group gene expression data into a finer and more informative classification.
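To make the selection step more concrete, the following Python sketch illustrates the general idea of a binary gene mask and hypervolume-based survivor selection. It is not the authors' implementation: the function names (`hypervolume`, `select_by_hypervolume`, `project`), the reference point, the greedy removal rule, and the assumption that all four objectives are expressed as quantities to minimize are hypothetical choices made for illustration. The hypervolume is computed exactly by inclusion-exclusion, which is practical only for small candidate sets.

```python
from itertools import combinations
from math import prod


def hypervolume(points, ref):
    """Exact hypervolume for minimization, by inclusion-exclusion (small sets only)."""
    # Only points strictly better than the reference in every objective count.
    pts = [p for p in points if all(a < r for a, r in zip(p, ref))]
    total = 0.0
    for k in range(1, len(pts) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(pts, k):
            # The boxes [p, ref] of a subset intersect in [componentwise max, ref].
            corner = [max(col) for col in zip(*subset)]
            total += sign * prod(r - c for r, c in zip(ref, corner))
    return total


def select_by_hypervolume(points, ref, keep):
    """Illustrative rule: repeatedly drop the point with the smallest contribution."""
    survivors = list(points)
    while len(survivors) > keep:
        full = hypervolume(survivors, ref)
        contributions = [
            full - hypervolume(survivors[:i] + survivors[i + 1:], ref)
            for i in range(len(survivors))
        ]
        survivors.pop(contributions.index(min(contributions)))
    return survivors


def project(samples, mask):
    """Apply a binary gene mask: keep column j only where mask[j] is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in samples]


if __name__ == "__main__":
    # Hypothetical objective vectors, all minimized (an assumption):
    # (fraction of genes kept, 1 - accuracy, 1 - relevance, redundancy).
    candidates = [
        (0.10, 0.20, 0.30, 0.40),
        (0.30, 0.10, 0.20, 0.50),
        (0.35, 0.25, 0.35, 0.55),  # dominated by both vectors above
    ]
    reference = (1.0, 1.0, 1.0, 1.0)
    print(select_by_hypervolume(candidates, reference, keep=2))
    print(project([[5.1, 0.3, 2.2]], [1, 0, 1]))
```

A dominated candidate adds no hypervolume, so the greedy rule discards it first. In practice an approximate or faster exact hypervolume routine would replace the inclusion-exclusion sum once the population grows beyond a handful of solutions.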




Acknowledgements

This research was supported by the National Natural Science Foundation of China under Grant No. 61603087, by the Natural Science Foundation of Jilin Province under Grant No. 20190103006JH, and by the Science and Technology Development Planning of Jilin Province under Grant No. 20160204043GX. The work described in this paper was substantially supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 11203217] and [CityU 11200218] and by funding from the Hong Kong Institute for Data Science (HKIDS) at City University of Hong Kong. It was also partially supported by a grant from City University of Hong Kong (CityU 11202219).

Author information

Corresponding author

Correspondence to Xiangtao Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, Y., Li, S., Wang, L. et al. Cancer molecular subtype classification from hypervolume-based discrete evolutionary optimization. Neural Comput & Applic 32, 15489–15502 (2020). https://doi.org/10.1007/s00521-020-04846-2


Keywords

  • Classification
  • Multiobjective optimization
  • Animal migration optimization algorithm
  • Gene expression data