A machine learning evolutionary algorithm-based formula to assess tumor markers and predict lung cancer in cytologically negative pleural effusions


Malignant pleural effusion is diagnostically challenging in the presence of negative cytology. The assessment of tumor markers in serum has become a standard tool in cancer diagnosis, whereas their assessment in pleural fluid has not reached universal consensus. Evaluating a panel of markers in both serum and pleural fluid may be crucial to improving diagnostic accuracy. Using a machine learning-based approach, we provide a mathematical formula capable of expressing the complex relation between the markers expressed in serum and pleural effusion and the presence of lung cancer. The formula indicates CEA and CYFRA21-1 in pleural fluid as the best diagnostic markers, with 97% accuracy, 98% sensitivity, 95% specificity, 96% area under the curve, 98% positive predictive value, and 92% MCC (Matthews correlation coefficient).
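For readers less familiar with the performance measures quoted in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, positive predictive value, and MCC are conventionally derived from a binary confusion matrix. The function name `diagnostic_metrics` and the counts used are hypothetical and chosen only for illustration; they are not the study's data, and this is not the authors' implementation. The area under the curve is omitted because it requires continuous classifier scores rather than counts.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    tp/fn: malignant cases classified as malignant / benign.
    tn/fp: benign cases classified as benign / malignant.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the malignant and benign groups differ in size.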




CA 15-3:

Carbohydrate antigen 15-3

CA 19-9:

Carbohydrate antigen 19-9

CA 125:

Carbohydrate antigen 125

CYFRA 21-1:

Cytokeratin fragment 21-1

CEA:

Carcinoembryonic antigen

NSE:

Neuron-specific enolase




This study was partially funded by the Italian Ministry of Education, University and Research as a National Interest Research Project (PRIN) No. 20083YAR35 granted to the University of Rome Tor Vergata.

Author information




GD, AD and SE conceived the study and wrote the paper. RM, RS and CC provided the tumor marker concentration data. AD performed the tumor marker assays. RS and SE performed the statistical analysis and the feature selection. GD conceived and implemented the genetic programming-based classifier, carried out all the genetic programming-based experiments, and conceived the final formula. GD, FP, AD and SE contributed to the interpretation of the results and to writing the Discussion and Conclusion sections. GH and FP revised the final draft of the manuscript before submission.

Corresponding author

Correspondence to Stefano Elia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Furthermore, the study design was carried out according to the protocol “Tumor marker in pleural effusion and biopsies” approved by the ethical committee of Fondazione PTV Policlinico Tor Vergata, Rome, Italy, on November 6, 2018 (authorization Nr.171/18), and all patients gave their informed consent.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 462 KB)


About this article


Cite this article

Elia, S., D’Angelo, G., Palmieri, F. et al. A machine learning evolutionary algorithm-based formula to assess tumor markers and predict lung cancer in cytologically negative pleural effusions. Soft Comput 24, 7281–7293 (2020). https://doi.org/10.1007/s00500-019-04344-1



  • Machine learning
  • Genetic programming
  • Genetic algorithm
  • Evolutionary algorithm
  • Pleural effusion
  • Biochemical tumor marker
  • Thoracentesis
  • Thoracoscopy
  • Video-assisted thoracic surgery