
Accurate Identification of Tomograms of Lung Nodules Using CNN: Influence of the Optimizer, Preprocessing and Segmentation

  • Cecilia Irene Loeza Mejía
  • R. R. Biswal
  • Eduardo Rodriguez-Tello
  • Gilberto Ochoa-Ruiz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12088)

Abstract

The diagnosis of pulmonary nodules plays an important role in the treatment of lung cancer, so improving that diagnosis is a primary concern. This article compares how well a convolutional neural network (CNN) identifies computed tomography scans with pulmonary nodules when trained with two different optimizers (Adam and Nadam), and thoroughly explores the effect of preprocessing and segmentation techniques. The publicly available Lung TIME dataset was used. When neither preprocessing nor segmentation was applied, a training accuracy of at least 90.24% and a test accuracy of at least 86.8% were obtained. When segmentation was applied without preprocessing, a training accuracy of at least 97.19% and a test accuracy of at least 95.07% were reached. When both preprocessing and segmentation were applied, a training accuracy of at least 96.41% and a test accuracy of at least 94.71% were achieved. On average, the Adam optimizer scored a training accuracy of 96.17% and a test accuracy of 95.23%, whereas the Nadam optimizer obtained 96.25% and 95.2%, respectively. We conclude that the CNN performs well even on noisy images. The network performed similarly with preprocessing and segmentation as with segmentation alone. It can also be inferred that applying preprocessing and segmentation is an excellent option when higher CNN accuracy is required.

Keywords

Lung nodule, CNN, Lung TIME, Tomograms, Adam optimizer, Nadam optimizer

1 Introduction

In recent years, the use of machine learning techniques in medical research has grown rapidly. These techniques are applied mainly to genetics [1], disease detection, biomedical image segmentation [2, 3] and classification, and they have shown their efficacy in clinical decision-making and monitoring systems [4]. Convolutional neural networks (CNNs) have helped automate the detection of various diseases, particularly through the processing of biomedical images and clinical data. Recent CNN research on lung cancer has focused on automatic cancer diagnosis [5, 6], lung segmentation [7, 8, 9], segmentation of pulmonary nodules [10, 11, 12, 13], lung nodule detection [14, 15], cancer classification [16], nodule categorization [17] and nodule malignancy assessment [18, 19, 20, 21, 22, 23, 24, 25, 26]. Several studies of lung nodules report how data augmentation [14, 24, 26], the number of input channels [20] and the use of dropout [8, 14, 18, 20, 21, 24, 26] help improve network accuracy and avoid overfitting. Other researchers report the influence of the number of parameters [20, 23] and of training time [20]. Nonetheless, the use of preprocessing and segmentation has been little explored, and the same applies to the choice among the available optimizers. The main goal of this investigation is to evaluate how the optimizer (Adam [27] and Nadam [28]), preprocessing and segmentation influence a CNN for the accurate identification of tomograms with pulmonary nodules. The evaluation considers both accuracy and training time. The experiments were carried out on the publicly available Lung TIME dataset [29]. The rest of the paper is organized as follows: Sect. 2 describes the materials and methods, Sect. 3 discusses the results obtained and Sect. 4 presents the conclusions and future work.

Fig. 1. Pipeline of the methodology (color figure online)

2 Materials and Methods

Three scenarios were considered for identifying tomograms with lung nodules using a convolutional neural network (see Fig. 1, where yellow marks the first scenario, blue the second and green the third):
(i) the tomograms are rescaled to \(96\times 96\) pixels and passed directly to the CNN;
(ii) the tomograms are segmented to obtain the pulmonary regions, then rescaled and passed to the CNN;
(iii) the tomograms are preprocessed with median and Gaussian filters, the preprocessed images are binarized (segmented), and the results are rescaled and passed to the CNN.
The tomograms were downsampled in order to reduce training time.
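Every scenario ends by rescaling each tomogram to \(96\times 96\) pixels. The paper does not state which resizing routine or original slice size was used. The sketch below therefore makes two assumptions: it uses scikit-image's resize function (scikit-image is the library the paper cites for segmentation), and it takes a 512×512 slice, a common CT matrix size. The function name downsample_slice is hypothetical. Under these assumptions, downsampling cuts the number of pixels per slice by a factor of about 28, which is consistent with the stated aim of reducing training time.

```python
import numpy as np
from skimage.transform import resize


def downsample_slice(slice_2d: np.ndarray, size: int = 96) -> np.ndarray:
    """Rescale one CT slice to size x size pixels (96 x 96 in the paper)."""
    # preserve_range keeps the original intensity scale (e.g. HU) instead of
    # mapping it to [0, 1]; anti_aliasing smooths the image before shrinking it.
    return resize(slice_2d, (size, size), anti_aliasing=True, preserve_range=True)


# Hypothetical 512 x 512 slice with random HU-like values (assumed matrix size).
slice_512 = np.random.default_rng(0).normal(-500.0, 300.0, size=(512, 512))
small = downsample_slice(slice_512)
print(small.shape)                  # (96, 96)
print(slice_512.size / small.size)  # about 28.4 times fewer pixels per slice
```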

2.1 Dataset Used in the Study

This study used CT thorax scans from the Lung TIME dataset [29]. The scans are stored in DICOM format, and the pulmonary nodules are annotated in XML format. A total of 62 CT thorax scans were chosen, containing 2003 tomograms with nodules and 12934 without nodules. To validate the results, 70% of the tomograms were randomly selected for training and the remaining 30% were used for testing.
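The random 70/30 split of tomograms can be sketched with NumPy. The function name split_indices and the fixed seed are hypothetical. The paper does not say whether the split was stratified by class, so this version applies a plain random permutation to all tomograms of the largest configuration (2003 with nodules plus 12934 without).

```python
import numpy as np


def split_indices(n_items: int, train_fraction: float = 0.7, seed: int = 0):
    """Randomly assign item indices to a training set and a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)          # shuffled indices 0 .. n_items-1
    n_train = int(round(train_fraction * n_items))
    return order[:n_train], order[n_train:]


# Label 1 = tomogram with nodules, 0 = without nodules (counts from Sect. 2.1).
labels = np.concatenate([np.ones(2003, dtype=int), np.zeros(12934, dtype=int)])
train_idx, test_idx = split_indices(labels.size)
print(len(train_idx), len(test_idx))  # 10456 4481
```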

2.2 Preprocessing

To improve the quality of the tomograms, a median filter followed by a Gaussian filter was applied, as discussed in [31], to remove salt-and-pepper noise and mottled noise from the images. The median filter used a \(5\times 5\) pixel mask, and the standard deviation of the Gaussian kernel was \(\sigma = 2\).
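The two filters can be sketched with scipy.ndimage. The paper names SciPy for preprocessing, but the specific functions used here, the function name denoise_slice and the synthetic test image are assumptions. The example shows why the median filter comes first: an isolated outlier never becomes the median of its 5×5 window, so it is removed completely before the Gaussian smoothing.

```python
import numpy as np
from scipy import ndimage


def denoise_slice(slice_2d: np.ndarray) -> np.ndarray:
    """Median filter with a 5 x 5 window, then a Gaussian filter with sigma = 2."""
    # Each pixel becomes the median of its 5 x 5 neighbourhood, which discards
    # isolated salt-and-pepper outliers.
    median = ndimage.median_filter(slice_2d, size=5)
    # The Gaussian filter then smooths the remaining mottled noise.
    return ndimage.gaussian_filter(median, sigma=2)


# Synthetic example: a flat image with three isolated salt-and-pepper pixels.
img = np.zeros((32, 32))
img[5, 5], img[20, 10], img[12, 25] = 1000.0, -1000.0, 1000.0
clean = denoise_slice(img)
print(np.abs(clean).max())  # 0.0: the isolated outliers are gone
```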

2.3 Segmentation

Segmentation was performed by thresholding, a simple and efficient technique for partitioning an image into foreground and background [30]. According to Alakwaa et al. [16], thresholding produces better lung segmentation than clustering techniques (K-means and Mean Shift) and Watershed. Binarization used a threshold of −350 HU, as suggested by Pulagam et al. [32], to separate the pulmonary region from the rest of the tomogram. Finally, the connected components touching the border of the binarized image were removed.
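Thresholding followed by border clearing maps onto a boolean comparison and scikit-image's clear_border function. This sketch makes two assumptions. First, the input slice is already in Hounsfield units; raw DICOM pixel values usually need the RescaleSlope/RescaleIntercept conversion first. Second, the lungs are the pixels below −350 HU. The function name segment_lungs and the phantom image are hypothetical.

```python
import numpy as np
from skimage.segmentation import clear_border


def segment_lungs(slice_hu: np.ndarray, threshold_hu: float = -350.0) -> np.ndarray:
    """Binary lung mask: pixels below the HU threshold, minus border-touching regions."""
    # Air-filled lung tissue has much lower HU values than soft tissue and bone,
    # so the threshold keeps the lungs and also the air surrounding the patient.
    binary = slice_hu < threshold_hu
    # The outside air touches the image edge; clear_border removes every connected
    # component that touches the edge, leaving only the lungs.
    return clear_border(binary)


# Hypothetical phantom in HU: air around a square "body" containing two "lungs".
phantom = np.full((64, 64), -1000.0)  # air outside the body
phantom[8:56, 8:56] = 40.0            # soft tissue
phantom[16:48, 14:28] = -800.0        # left lung, 32 x 14 pixels
phantom[16:48, 36:50] = -800.0        # right lung, 32 x 14 pixels
mask = segment_lungs(phantom)
print(mask.sum())  # 896 = 2 x 32 x 14: only the two lungs remain
```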

Table 1. CNN architecture

| Layers |
|---|
| Conv2D |
| MaxPooling2D |
| Conv2D |
| Conv2D |
| MaxPooling2D |
| Flatten |
| Dense |
| Softmax |

2.4 CNN

Table 1 lists the layers of the CNN architecture: convolutional layers with ReLU activation, max pooling, flatten and dense layers, and a final fully connected softmax layer that classifies each tomogram as with or without nodules. Table 2 shows the same architecture with an added Dropout layer, which randomly ignores (drops) neurons during training to reduce overfitting [33]. Both architectures were tested with the Adam [27] and Nadam [28] optimizers, using a batch size of 32, 5 epochs and the sparse categorical cross-entropy loss function [34].
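For reference, the update rules of the two optimizers can be written as follows. Here \(g_t\) is the gradient at step \(t\), \(\alpha\) the learning rate, \(\beta_1, \beta_2\) the decay rates and \(\epsilon\) a small constant. The Nadam line uses the simplified form with a constant \(\beta_1\) that is commonly derived from [28]. The original formulation also applies a momentum schedule, which is omitted here.

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t},\\
\text{Adam:}\quad \theta_t &= \theta_{t-1} - \frac{\alpha}{\sqrt{\hat v_t}+\epsilon}\,\hat m_t,\\
\text{Nadam:}\quad \theta_t &= \theta_{t-1} - \frac{\alpha}{\sqrt{\hat v_t}+\epsilon}\left(\beta_1 \hat m_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}\right).
\end{aligned}
\]

The two rules differ only in the Nesterov-style term. Nadam adds the current gradient \(g_t\) directly into the step, whereas Adam relies solely on the accumulated momentum \(\hat m_t\).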

Table 2. CNN architecture with Dropout

| Layers |
|---|
| Conv2D |
| MaxPooling2D |
| Conv2D |
| Conv2D |
| MaxPooling2D |
| Flatten |
| Dense |
| Dropout (rate 0.0002) |
| Softmax |
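The layer sequence of Tables 1 and 2 can be sketched in tf.keras, the API of the TensorFlow 2.0 release used in this work. The paper does not report filter counts, kernel sizes, the width of the Dense layer or optimizer hyperparameters. The values below (32/64/64 filters, 3×3 kernels, 64 Dense units with ReLU, Keras default learning rates) are therefore hypothetical. The "Softmax" row is read as a final two-unit fully connected layer with softmax activation, as the text describes. The function name build_cnn is also hypothetical.

```python
from typing import Optional

import tensorflow as tf


def build_cnn(optimizer: str = "adam", dropout_rate: Optional[float] = None) -> tf.keras.Model:
    """Layer order of Tables 1 and 2; sizes marked 'hypothetical' are not from the paper."""
    layers = [
        tf.keras.Input(shape=(96, 96, 1)),                 # one 96 x 96 grayscale slice
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # hypothetical: 32 filters, 3 x 3
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),  # hypothetical
        tf.keras.layers.Conv2D(64, 3, activation="relu"),  # hypothetical
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),      # hypothetical width
    ]
    if dropout_rate is not None:
        layers.append(tf.keras.layers.Dropout(dropout_rate))  # Table 2 variant only
    # Fully connected softmax output: tomogram with nodules vs. without nodules.
    layers.append(tf.keras.layers.Dense(2, activation="softmax"))
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=optimizer,                     # "adam" or "nadam", Keras defaults
        loss="sparse_categorical_crossentropy",  # integer class labels 0 / 1
        metrics=["accuracy"],
    )
    return model


model = build_cnn("nadam", dropout_rate=0.0002)
print(model(tf.zeros((1, 96, 96, 1))).shape)  # (1, 2)
# Training with the settings given in the text (x_train: N x 96 x 96 x 1 floats,
# y_train: N integer labels) would be:
# model.fit(x_train, y_train, batch_size=32, epochs=5)
```

Passing the optimizer as a string keeps the two runs identical apart from the update rule, which mirrors the paper's comparison.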

2.5 Computer Equipment

The CNN was implemented with TensorFlow 2.0 in Python 3.7. The imageio [35] library was used to read the DICOM images, SciPy [36] for preprocessing and scikit-image [37] for segmentation of the tomograms. The tests were run on a machine with the following characteristics:
  • Operating System: Windows 10 Home 64-bit (Build 18362)

  • Processor: Intel(R) Core(TM) i3-5015U CPU @ 2.10 GHz

  • Memory: 6 GB

Fig. 2. (a, b) Original slice images; (c, d) images obtained after applying the filters

3 Experimental Results and Analysis

Figure 2 shows an example of applying the filters to the tomograms (first the median filter, then the Gaussian filter). Preprocessing visibly improves image quality, reducing both salt-and-pepper and mottled noise.

Figure 3 shows examples of binarized tomograms. Segmentation isolates the pulmonary region, which improved the performance of the CNN.

Fig. 3. (a, b) Original slice images; (c, d) binarized slices

Table 3. Results obtained from the comparison experiments without Dropout

| # tomograms with nodules | # tomograms without nodules | Preprocessing | Segmentation | Optimizer | Training ACC | Test ACC | Runtime (min) |
|---|---|---|---|---|---|---|---|
| 2003 | 2003 | No | No | Adam | 0.9472 | 0.9567 | 10.13 |
| 2003 | 2003 | No | No | Nadam | 0.9704 | 0.9642 | 9.98 |
| 2003 | 2003 | No | Yes | Adam | 0.9843 | 0.9734 | 13.19 |
| 2003 | 2003 | No | Yes | Nadam | 0.9847 | 0.9676 | 12.18 |
| 2003 | 2003 | Yes | Yes | Adam | 0.9800 | 0.9617 | 24.43 |
| 2003 | 2003 | Yes | Yes | Nadam | 0.9861 | 0.9709 | 24.40 |
| 2003 | 4158 | No | No | Adam | 0.9024 | 0.8680 | 14.53 |
| 2003 | 4158 | No | No | Nadam | 0.9685 | 0.9676 | 12.53 |
| 2003 | 4158 | No | Yes | Adam | 0.9819 | 0.9697 | 18.85 |
| 2003 | 4158 | No | Yes | Nadam | 0.9863 | 0.9659 | 22.17 |
| 2003 | 4158 | Yes | Yes | Adam | 0.9826 | 0.9708 | 33.05 |
| 2003 | 4158 | Yes | Yes | Nadam | 0.9814 | 0.9762 | 35.68 |
| 2003 | 12934 | No | No | Adam | 0.9269 | 0.9346 | 37.60 |
| 2003 | 12934 | No | No | Nadam | 0.9415 | 0.9449 | 38.60 |
| 2003 | 12934 | No | Yes | Adam | 0.9753 | 0.9610 | 45.78 |
| 2003 | 12934 | No | Yes | Nadam | 0.9763 | 0.9589 | 44.12 |
| 2003 | 12934 | Yes | Yes | Adam | 0.9652 | 0.9525 | 77.24 |
| 2003 | 12934 | Yes | Yes | Nadam | 0.9682 | 0.9485 | 78.49 |

Table 3 summarizes the experiments performed without Dropout, and Table 4 reports the experiments carried out with a Dropout rate of 0.0002. In both cases, tests were performed with and without preprocessing, with and without segmentation and with different numbers of tomograms, and the Adam and Nadam optimizers were compared. Segmentation gave better results, but it generally increased the execution time. In most tests without preprocessing (both with and without the Dropout layer), the Nadam optimizer gave better results and a shorter runtime. When Dropout was not applied and preprocessing was performed, the Nadam optimizer increased the runtime in some cases compared with Adam. Therefore, when the Dropout layer is not used, we recommend the Nadam optimizer for images that have not been preprocessed and the Adam optimizer for images that have been preprocessed.
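The claim about runs without preprocessing can be checked directly against the tables. The sketch below copies the test accuracy and runtime of the twelve Adam/Nadam pairs without preprocessing from Tables 3 and 4 and counts how often Nadam comes out ahead. Reading "better results" as "higher test accuracy" is an assumption.

```python
# (test ACC, runtime in minutes) for every configuration without preprocessing.
# Key: (table number, tomograms without nodules, segmentation applied?)
runs = {
    (3, 2003, "No"):   {"Adam": (0.9567, 10.13), "Nadam": (0.9642, 9.98)},
    (3, 2003, "Yes"):  {"Adam": (0.9734, 13.19), "Nadam": (0.9676, 12.18)},
    (3, 4158, "No"):   {"Adam": (0.8680, 14.53), "Nadam": (0.9676, 12.53)},
    (3, 4158, "Yes"):  {"Adam": (0.9697, 18.85), "Nadam": (0.9659, 22.17)},
    (3, 12934, "No"):  {"Adam": (0.9346, 37.60), "Nadam": (0.9449, 38.60)},
    (3, 12934, "Yes"): {"Adam": (0.9610, 45.78), "Nadam": (0.9589, 44.12)},
    (4, 2003, "No"):   {"Adam": (0.9434, 13.93), "Nadam": (0.9676, 9.48)},
    (4, 2003, "Yes"):  {"Adam": (0.9667, 11.86), "Nadam": (0.9692, 12.25)},
    (4, 4158, "No"):   {"Adam": (0.9562, 17.35), "Nadam": (0.9605, 14.35)},
    (4, 4158, "Yes"):  {"Adam": (0.9719, 19.80), "Nadam": (0.9735, 18.92)},
    (4, 12934, "No"):  {"Adam": (0.9141, 45.74), "Nadam": (0.9460, 44.44)},
    (4, 12934, "Yes"): {"Adam": (0.9507, 42.50), "Nadam": (0.9589, 46.27)},
}

# Booleans sum as 0/1, so these count the configurations Nadam wins.
higher_acc = sum(r["Nadam"][0] > r["Adam"][0] for r in runs.values())
faster = sum(r["Nadam"][1] < r["Adam"][1] for r in runs.values())
print(f"Nadam: higher test ACC in {higher_acc}/12, shorter runtime in {faster}/12")
```

It prints "Nadam: higher test ACC in 9/12, shorter runtime in 8/12". Nadam is therefore ahead on each criterion in most, but not all, of these configurations.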

Table 4. Results obtained from the comparison experiments using Dropout

| # tomograms with nodules | # tomograms without nodules | Preprocessing | Segmentation | Optimizer | Training ACC | Test ACC | Runtime (min) |
|---|---|---|---|---|---|---|---|
| 2003 | 2003 | No | No | Adam | 0.9419 | 0.9434 | 13.93 |
| 2003 | 2003 | No | No | Nadam | 0.9693 | 0.9676 | 9.48 |
| 2003 | 2003 | No | Yes | Adam | 0.9832 | 0.9667 | 11.86 |
| 2003 | 2003 | No | Yes | Nadam | 0.9850 | 0.9692 | 12.25 |
| 2003 | 2003 | Yes | Yes | Adam | 0.9807 | 0.9676 | 24.74 |
| 2003 | 2003 | Yes | Yes | Nadam | 0.9843 | 0.9709 | 23.77 |
| 2003 | 4158 | No | No | Adam | 0.9522 | 0.9562 | 17.35 |
| 2003 | 4158 | No | No | Nadam | 0.9817 | 0.9605 | 14.35 |
| 2003 | 4158 | No | Yes | Adam | 0.9840 | 0.9719 | 19.80 |
| 2003 | 4158 | No | Yes | Nadam | 0.9835 | 0.9735 | 18.92 |
| 2003 | 4158 | Yes | Yes | Adam | 0.9789 | 0.9713 | 37.40 |
| 2003 | 4158 | Yes | Yes | Nadam | 0.9824 | 0.9703 | 35.04 |
| 2003 | 12934 | No | No | Adam | 0.9079 | 0.9141 | 45.74 |
| 2003 | 12934 | No | No | Nadam | 0.9473 | 0.9460 | 44.44 |
| 2003 | 12934 | No | Yes | Adam | 0.9719 | 0.9507 | 42.50 |
| 2003 | 12934 | No | Yes | Nadam | 0.9747 | 0.9589 | 46.27 |
| 2003 | 12934 | Yes | Yes | Adam | 0.9641 | 0.9520 | 87.01 |
| 2003 | 12934 | Yes | Yes | Nadam | 0.9688 | 0.9471 | 86.14 |

Fig. 4. Influence of the optimizer, preprocessing and segmentation on the accurate identification of tomograms of lung nodules

Figure 4 shows the average training and test accuracy of the experiments performed. On average, with \(96\times 96\) pixel images, the Adam optimizer obtained a training accuracy of 96.17%, a test accuracy of 95.23% and a training time of 31.95 min, whereas the Nadam optimizer obtained 96.25%, 95.2% and 33.23 min, respectively. Nadam thus gave a slightly higher training accuracy than Adam, with essentially the same test accuracy. In addition, accuracy with segmentation alone was better than with segmentation combined with preprocessing.
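The per-optimizer averages can be recomputed from Tables 3 and 4. The sketch assumes that Fig. 4 weights each of the 18 runs per optimizer (nine per table) equally; the text does not state the averaging scheme.

```python
from statistics import mean

# (training ACC, test ACC, runtime in minutes), in the row order of Table 3
# followed by Table 4.
adam = [
    (0.9472, 0.9567, 10.13), (0.9843, 0.9734, 13.19), (0.9800, 0.9617, 24.43),
    (0.9024, 0.8680, 14.53), (0.9819, 0.9697, 18.85), (0.9826, 0.9708, 33.05),
    (0.9269, 0.9346, 37.60), (0.9753, 0.9610, 45.78), (0.9652, 0.9525, 77.24),
    (0.9419, 0.9434, 13.93), (0.9832, 0.9667, 11.86), (0.9807, 0.9676, 24.74),
    (0.9522, 0.9562, 17.35), (0.9840, 0.9719, 19.80), (0.9789, 0.9713, 37.40),
    (0.9079, 0.9141, 45.74), (0.9719, 0.9507, 42.50), (0.9641, 0.9520, 87.01),
]
nadam = [
    (0.9704, 0.9642, 9.98), (0.9847, 0.9676, 12.18), (0.9861, 0.9709, 24.40),
    (0.9685, 0.9676, 12.53), (0.9863, 0.9659, 22.17), (0.9814, 0.9762, 35.68),
    (0.9415, 0.9449, 38.60), (0.9763, 0.9589, 44.12), (0.9682, 0.9485, 78.49),
    (0.9693, 0.9676, 9.48), (0.9850, 0.9692, 12.25), (0.9843, 0.9709, 23.77),
    (0.9817, 0.9605, 14.35), (0.9835, 0.9735, 18.92), (0.9824, 0.9703, 35.04),
    (0.9473, 0.9460, 44.44), (0.9747, 0.9589, 46.27), (0.9688, 0.9471, 86.14),
]

for name, runs in (("Adam", adam), ("Nadam", nadam)):
    # zip(*runs) regroups the rows into three columns; each column is averaged.
    train_acc, test_acc, minutes = (mean(col) for col in zip(*runs))
    print(f"{name}: train {100 * train_acc:.2f}%, test {100 * test_acc:.2f}%, {minutes:.2f} min")
```

With equal weighting, the Adam means agree with the quoted 96.17%, 95.23% and 31.95 min to within rounding. Comparing the Nadam output with the quoted Nadam figures shows whether Fig. 4 used the same scheme for that optimizer.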

4 Conclusions

An experimental analysis of preprocessing, segmentation and optimizer choice was performed on images of the Lung TIME dataset resized to \(96\times 96\) pixels. We conclude that convolutional neural networks perform very well in identifying tomograms with nodules, with a training accuracy of at least 90.24% and a test accuracy of at least 86.8%, even on noisy images. For CT thorax scans, we suggest applying segmentation only, without preprocessing. This gave better results (training accuracy of at least 97.19% and test accuracy of at least 95.07%) than combining preprocessing and segmentation (training accuracy of at least 96.41% and test accuracy of at least 94.71%). In addition, preprocessing substantially increases runtime: compared with segmentation alone, it multiplied the runtime by roughly 1.6 to 2.1 in Tables 3 and 4. On average, the Adam optimizer obtained a training accuracy of 96.17%, a test accuracy of 95.23% and a training time of 31.95 min, whereas the Nadam optimizer obtained 96.25%, 95.2% and 33.23 min, respectively. When Dropout is not applied and preprocessing is performed, the Adam optimizer is recommended. When the tomograms are not preprocessed, the Nadam optimizer is recommended. Applying segmentation is an excellent option when accurate results are required. The resulting model could be used as part of a computer-assisted diagnosis system in lung cancer research. As future work, we propose locating the nodules within the identified tomograms. It would also be interesting to compare the performance of different preprocessing techniques.

References

  1. Holder, L.B., Haque, M.M., Skinner, M.K.: Machine learning for epigenetics and future medical applications. Epigenetics 12(7), 505–514 (2017)
  2. Lenchik, L., et al.: Automated segmentation of tissues using CT and MRI: a systematic review. Acad. Radiol. 26(12), 1695–1706 (2019)
  3. Rizwan-i-Haque, I., Neubert, J.: Deep learning approaches to biomedical image segmentation. Inf. Med. Unlocked 18, 100297 (2020)
  4. Zhang, Z., Sejdić, E.: Radiological images and machine learning: trends, perspectives, and prospects. Comput. Biol. Med. 108, 354–370 (2019)
  5. Polat, H., Mehr, H.: Classification of pulmonary CT images by using hybrid 3D-deep convolutional neural network architecture. Appl. Sci. 9(5), 940 (2019)
  6. Simie, E., Kaur, M.: Lung cancer detection using convolutional neural network (CNN). Int. J. Adv. Res. Ideas Innov. Technol. 5(4), 284–292 (2019)
  7. Zhu, J., Zhang, J., Qiu, B., Liu, Y., Liu, X., Chen, L.: Comparison of the automatic segmentation of multiple organs at risk in CT images of lung cancer between deep convolutional neural network based and atlas-based techniques. Acta Oncol. 58(2), 257–264 (2019)
  8. Abdullah-Al-Zubaer, I., Hatamizadeh, A., Ananth, S.P., Ding, X., Tajbakhsh, N., Terzopoulos, D.: Fast and automatic segmentation of pulmonary lobes from chest CT using a progressive dense V-network. Comput. Methods Biomech. Biomed. Eng. Imaging Vis., 1–10 (2019)
  9. Geng, L., Zhang, S., Tong, J., Xiao, Z.: Lung segmentation method with dilated convolution based on VGG-16 network. Comput. Assist. Surg. 24(S2), 27–33 (2019)
  10. Hamidian, S., Sahiner, B., Petrick, N., Pezeshk, A.: 3D convolutional neural network for automatic detection of lung nodules in chest CT. In: Proceedings SPIE International Society for Optical Engineering (2017)
  11. Dey, R., Lu, Z., Hong, Y.: Diagnostic classification of lung nodules using 3D neural networks. In: IEEE International Symposium on Biomedical Imaging (2018)
  12. Tong, G., Li, Y., Chen, H., Zhang, Q., Jiang, H.: Improved U-NET network for pulmonary nodules segmentation. Optik - Int. J. Light Electron Opt. 174, 460–469 (2018)
  13. Huang, X., Sun, W., Tseng, T., Li, C., Qian, W.: Fast and fully-automated detection and segmentation of pulmonary nodules in thoracic CT scans using deep convolutional neural networks. Comput. Med. Imaging Graph. 74, 25–36 (2019)
  14. Setio, A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016)
  15. Xie, H., Yang, D., Sun, N., Chen, Z., Zhang, Y.: Automated pulmonary nodule detection in CT images using deep convolutional neural networks. Pattern Recogn. 85, 109–119 (2019)
  16. Alakwaa, W., Nassef, M., Badr, A.: Lung cancer detection and classification with 3D convolutional neural network (3D-CNN). Int. J. Adv. Comput. Sci. Appl. (IJACSA) 8(8), 99–110 (2017)
  17. Tu, X., et al.: Automatic categorization and scoring of solid, part-solid and non-solid pulmonary nodules in CT images with convolutional neural network. Sci. Rep. 7, 8533 (2017)
  18. Tajbakhsh, N., Suzuki, K.: Comparing two classes of end-to-end machine-learning models in lung nodule detection and classification: MTANNs vs CNNs. Pattern Recogn. 63, 476–486 (2017)
  19. Yan, X., et al.: Classification of lung nodule malignancy risk on computed tomography images using convolutional neural network: a comparison between 2D and 3D strategies. In: ACCV 2016. LNCS, vol. 10118, pp. 91–101. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-319-54526-4_7
  20. Kang, G., Liu, K., Hou, B., Zhang, N.: 3D multi-view convolutional neural networks for lung nodule classification. PLoS ONE 12(11), e0188290 (2017)
  21. Zhao, X., Liu, L., Qi, S., Teng, Y., Li, J., Qian, W.: Agile convolutional neural network for pulmonary nodule classification using CT images. Int. J. Comput. Assist. Radiol. Surg. 13(4), 585–595 (2018). https://doi.org/10.1007/s11548-017-1696-0
  22. Causey, J., et al.: Highly accurate model for prediction of lung nodule malignancy with CT scans. Sci. Rep. 8, 9286 (2018)
  23. Liu, Y., Hao, P., Zhang, P., Xu, X., Wu, J., Chen, W.: Dense convolutional binary-tree networks for lung nodule classification. IEEE Access 6, 49080–49088 (2018)
  24. Gruetzemacher, R., Gupta, A., Paradice, D.: 3D deep learning for detecting pulmonary nodules in CT scans. J. Am. Med. Inf. Assoc. 25(10), 1301–1310 (2018)
  25. Zia, M.B., Juan, Z.J., Rehman, Z.U., Javed, K., Rauf, S.A., Khan, A.: The utilization of consignable multi-model in detection and classification of pulmonary nodules. Int. J. Comput. Appl. 177(27), 0975–8887 (2019)
  26. Onishi, Y., et al.: Automated pulmonary nodule classification in computed tomography images using a deep convolutional neural network trained by generative adversarial networks. J. 2(5), 99–110 (2019)
  27. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, San Diego (2015)
  28. Dozat, T.: Incorporating Nesterov momentum into Adam. In: International Conference on Learning Representations (2016)
  29. Dolejsi, M., Kybic, J., Polovincak, M., Tuma, S.: The Lung TIME: annotated lung nodule dataset and nodule detection framework. In: Proceedings SPIE 7260, Medical Imaging 2009: Computer-Aided Diagnosis, vol. 7260 (2009)
  30. Khan, S., Hussain, S., Yang, S., Iqbal, K.: Effective and reliable framework for lung nodules detection from CT scan images. Sci. Rep. 9, 1–4 (2019)
  31. Makaju, S., Prasad, P., Alsadoon, A., Singh, A., Elchouemi, A.: Lung cancer detection using CT scan images. Proc. Comput. Sci. 125, 107–114 (2018)
  32. Pulagam, A., Rao, V., Inampudi, R.: Automated pulmonary lung nodule detection using an optimal manifold statistical based feature descriptor and SVM classifier. Biomedical & Pharmacology Journal 10(3), 1311–1324 (2017)
  33. Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
  34.
  35. Imageio. https://imageio.readthedocs.io/en/stable/. Accessed 21 Feb 2019
  36. Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020)
  37. Van der Walt, S., et al.: scikit-image: image processing in Python. PeerJ 2, e453 (2014)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Departamento de Posgrado, Instituto Tecnológico Superior de Misantla, Misantla, Mexico
  2. Tecnologico de Monterrey, Escuela de Ingenieria y Ciencias, Zapopan, Mexico
