Automatic detection of lung cancer from biomedical data set using discrete AdaBoost optimized ensemble learning generalized neural networks

  • P. Mohamed Shakeel
  • Amr Tolba
  • Zafer Al-Makhadmeh
  • Mustafa Musa Jaber
Intelligent Biomedical Data Analysis and Processing


Lung cancer affects a large number of people today, mainly because of genetic changes in lung tissue. Other factors, such as smoking, alcohol, and exposure to dangerous gases, are also contributory causes. Because of the serious consequences of lung cancer, medical associations have been striving to diagnose it at an early stage of growth using computer-aided diagnosis (CAD). Although CAD systems at healthcare centers can diagnose lung cancer at an early stage, high detection accuracy is difficult to achieve, mainly because of overfitting of the lung cancer features and the dimensionality of the feature set. This paper therefore introduces effective, optimized neural computing and soft computing techniques to reduce these difficulties in the feature set. Initially, lung biomedical data were collected from the ELVIRA Biomedical Data Set Repository. Noise in the data was eliminated by applying a bin smoothing normalization process. Minimum repetition and Wolf heuristic features were then selected to reduce the dimensionality and complexity of the feature set. The selected lung features were analyzed using discrete AdaBoost optimized ensemble learning generalized neural networks, which classified the normal and abnormal features effectively. The efficiency of the system was then evaluated in a MATLAB experimental setup in terms of error rate, precision, recall, G-mean, F-measure, and prediction rate.
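The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB setup the authors used, and it is not their code. It assumes a binary normal/abnormal labelling, the usual confusion-matrix definitions, and the common definition of G-mean as the square root of sensitivity times specificity. The function name `binary_metrics` and the counts in the usage line are hypothetical. Prediction rate is left out because the abstract does not define it.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Derive evaluation measures from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts, with
    "positive" assumed to mean the abnormal (cancer) class.
    """
    total = tp + fp + tn + fn
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {
        "error_rate": (fp + fn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

G-mean is useful alongside the error rate because it drops toward zero when either class is poorly recognized, which matters when normal and abnormal samples are unevenly represented.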


Computer-aided diagnosis; Neural computing; Biomedical; ELVIRA Biomedical Data Set Repository; Minimum repetition and Wolf heuristic features; Discrete AdaBoost optimized ensemble learning generalized neural networks



The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through Research Group No. RG-1438-027.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Malaysia
  2. Computer Science Department, Community College, King Saud University, Riyadh, Saudi Arabia
  3. Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Shebin El-Kom, Egypt
  4. Department of Computer Science, Dijlah University College, Baghdad, Iraq
