Gene selection of non-small cell lung cancer data for adjuvant chemotherapy decision using cell separation algorithm


Because chemotherapy is the recommended treatment for non-small cell lung cancer (NSCLC) after surgery, predicting at an early stage whether adjuvant chemotherapy (ACT) will be effective or futile is important for treatment decisions. NSCLC gene expression data are classified to predict the effectiveness or futility of ACT. Selecting genes that are highly correlated with the class attribute affects classification accuracy. In this paper, a new cell separation algorithm (CSA) is proposed that imitates cell separation by differential centrifugation, a process involving multiple centrifugation steps with the rotor speed increased in each step. The CSA applies centrifugal force to separate solutions according to their objective function values over several steps, with the velocity increased in each step. The CSA provides an automatic trade-off between exploration and exploitation by controlling the selection rate during the search process. To examine the CSA, it was first evaluated on 25 test functions and then applied to predict the effectiveness or futility of ACT. The number of genes in candidate subsets is handled by increasing the subset size when the fitness of the subset has not improved after a certain number of iterations. This reduces time consumption and memory usage. The experiment uses NSCLC data containing 280 samples collected from four institutes. As a result, a minimum subset of five genes is obtained, with a dependency degree equal to one and a classification accuracy higher than 94% for the SVM, KNN and MLP classifiers.
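The abstract reports a "dependency degree equal to one" without defining the measure. Assuming it refers to the standard rough-set dependency degree, that is, the fraction of samples whose values on the selected genes determine the class unambiguously, a minimal sketch could look as follows. The function name, the toy data, and the assumption that expression values are already discretized are illustrative and not taken from the paper.

```python
from collections import defaultdict


def dependency_degree(samples, labels, gene_subset):
    """Rough-set dependency degree of the class on a gene subset (assumed definition).

    samples: list of rows, each a sequence of discretized expression values
    labels: class label per row (e.g. ACT effective / futile)
    gene_subset: column indices of the selected genes
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Group samples into equivalence classes by their values on the chosen genes.
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(samples, labels):
        key = tuple(row[g] for g in gene_subset)
        seen_labels[key].add(label)
        sizes[key] += 1

    # Positive region: samples in equivalence classes that carry a single label.
    positive = sum(sizes[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return positive / len(samples)


# Hypothetical toy data: 4 samples, 3 discretized genes.
toy_samples = [
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 2, 1],
]
toy_labels = ["effective", "futile", "effective", "futile"]

print(dependency_degree(toy_samples, toy_labels, [0]))  # gene 0 never separates the classes
print(dependency_degree(toy_samples, toy_labels, [1]))  # gene 1 separates half of the samples
print(dependency_degree(toy_samples, toy_labels, [2]))  # gene 2 separates every sample
```

Under this reading, a dependency degree of one means that no two samples with identical values on the selected genes belong to different classes, which is why a small subset reaching that value is a natural stopping target for the search.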




Author information



Corresponding author

Correspondence to Mohammad Saniee Abadeh.


About this article


Cite this article

Jaddi, N.S., Saniee Abadeh, M. Gene selection of non-small cell lung cancer data for adjuvant chemotherapy decision using cell separation algorithm. Appl Intell (2020).



  • Gene selection
  • Classification
  • Non-small cell lung cancer
  • Adjuvant chemotherapy decision
  • Cell separation algorithm