
1 Introduction

Approximately 3.6 billion diagnostic radiological examinations, such as radiographs (x-rays), are performed globally every year [1]. Chest radiographs are performed to evaluate the lungs, heart and thoracic viscera, and are crucial for diagnosing various lung disorders at all levels of health care. Computer-aided diagnostic (CAD) tools play an important role in assisting radiologists with the growing number of chest radiographs. Accurate segmentation of anatomical structures in chest radiographs is essential for many analysis tasks in CAD. For example, segmentation of the lungs field can help detect lung diseases and shape irregularities; segmentation of the heart outline can help predict cardiomegaly; and segmentation of the clavicles can improve the diagnosis of pathologies near the apex of the lung.

Evaluating a chest radiograph is a challenging task due to the high variability between patients, unclear and overlapping organ borders, and image artifacts; a clear, high-quality radiograph is not easy to acquire. Over the years, this challenge has drawn many researchers to improving the segmentation of anatomical structures in chest radiographs [2,3,4,5]. An open benchmark dataset provided by van Ginneken et al. [6] has facilitated an objective comparison between the different segmentation methods. Classic approaches include active shape and appearance models, pixel classification methods, hybrid models and landmark-based models. More recently, deep learning approaches have been suggested [2, 3], building on the successful employment of convolutional neural networks (CNNs) in various detection and segmentation tasks in the medical imaging domain [7].

CNN architectures for semantic segmentation usually incorporate encoder and decoder networks [8, 9]: the encoder reduces the resolution of the image to capture its most salient features, and the decoder then restores the original resolution. Another approach is to maintain the resolution throughout the network by incorporating dilated convolutions [10], which enlarge the global receptive field of the CNN to capture larger context. In both approaches, the CNN can output single-class or multi-class segmentation masks, with the output mask at the same resolution as the input radiograph. The training process of each CNN is affected by several choices. One is the selection of the loss function that guides the optimization during training, with different loss functions affecting the final segmentation performance differently. Another is the initialization of the network weights: random initialization, or weights transferred from a network trained on a different task (transfer learning).

In this paper, we explore the segmentation of anatomical structures in chest radiographs, namely the lungs field, the heart and the clavicles, using a set of the most advanced CNN architectures for multi-class semantic segmentation. We propose an improved encoder-decoder style CNN with pre-trained weights in the encoder network and show its superiority over other state-of-the-art CNN architectures. We further examine the use of multiple loss functions for training the best selected network, and the effect of multi-class vs. single-class training. We present qualitative and quantitative comparisons on a common benchmark dataset based on the JSRT database [11]. Our best performing model, the U-Net with an ImageNet pre-trained encoder, outperformed the current state-of-the-art segmentation methods for all anatomical structures.

2 Methods

2.1 Fully Convolutional Neural Network Architectures

Fully convolutional networks (FCNs) are extensively used for semantic segmentation tasks. In this study, four different state-of-the-art architectures were tested, as follows:

FCN - The first FCN architecture that we used in this work is based on the FCN-8s net built on the VGG-16 layer net [9, 12]. The VGG-16 net is converted into an FCN by decapitating the final classification layer and converting the fully connected layers into convolutions. Deconvolution layers are then used to upsample the coarse outputs to pixel-dense outputs. Skip connections merge outputs from earlier pooling layers in the network, which was shown to improve segmentation quality [9].

Fully Convolutional DenseNet - The second network architecture tested is based on the fully convolutional DenseNet presented in [13]. The DenseNet architecture [14] proposes intensive layer fusion: each dense block consists of a set of convolution layers operating at the same scale, where each convolution layer processes the concatenation of all its previous layers, thus enabling the fusion of numerous representation levels. For the fully convolutional DenseNet, a decoding path is added to generate the segmentation output. The fusion between different layers consists of fusion of layers within each dense block, as well as concatenation of the preceding high-level feature maps with those coming from the encoding block at the same scale.

Dilated Residual Networks - The dilated residual network (DRN) [10] uses dilated convolutions [15] to increase the resolution of the output feature maps without reducing the receptive field of individual neurons. It was shown to improve performance compared to the standard residual networks presented in [16]. We implemented the DRN-C-26 as described in [10].

U-Net with VGG-16 Encoder - The U-Net architecture [8] has been extensively used for image-to-image tasks in computer vision, with a major contribution to the image segmentation task. The U-Net includes a contracting path (the encoder) with several layers of convolution and pooling for down-sampling. The second half of the network is an expansion path (the decoder) that applies up-sampling and convolution layers sequentially to generate an output of the same size as the input image. Additionally, the U-Net combines encoder features with decoder features at different levels of the network using skip connections. Iglovikov et al. [17] proposed using a VGG-11 [12] encoder pre-trained on the ImageNet dataset [18] and showed that it can improve the standard U-Net performance in binary segmentation of buildings in aerial images. A similar concept was used in the current study with the more advanced VGG-16 [12] as the encoder. Figure 1 shows a diagram of our proposed network. The chest X-ray image is duplicated to obtain a 3-channel input, similar to the RGB images used as input to the VGG-16 net (which serves as the encoder in the proposed architecture).

Fig. 1. The proposed U-Net architecture with a VGG-16 based encoder.
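
The exact decoder configuration is given in Fig. 1; as a rough illustration, a minimal PyTorch sketch of the described idea might look as follows, reusing the ImageNet pre-trained VGG-16 convolutional stages as the encoder and attaching a decoder with skip connections. The decoder channel widths and the bilinear up-sampling are our illustrative assumptions, not the paper's exact specification.

```python
import torch
import torch.nn as nn
from torchvision import models

class DecoderBlock(nn.Module):
    """Up-sample, concatenate the encoder skip features, then convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

class UNetVGG16(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # ImageNet pre-trained VGG-16 (torchvision >= 0.13 weights API).
        feats = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        # The five convolutional stages of VGG-16; skips are taken before each pool.
        self.enc1 = feats[:4]     # -> 64 channels,  224x224
        self.enc2 = feats[4:9]    # -> 128 channels, 112x112
        self.enc3 = feats[9:16]   # -> 256 channels, 56x56
        self.enc4 = feats[16:23]  # -> 512 channels, 28x28
        self.enc5 = feats[23:30]  # -> 512 channels, 14x14 (bottleneck)
        self.dec4 = DecoderBlock(512, 512, 256)
        self.dec3 = DecoderBlock(256, 256, 128)
        self.dec2 = DecoderBlock(128, 128, 64)
        self.dec1 = DecoderBlock(64, 64, 32)
        self.head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):
        if x.shape[1] == 1:              # grayscale X-ray duplicated to 3 channels
            x = x.repeat(1, 3, 1, 1)
        s1 = self.enc1(x)
        s2 = self.enc2(s1)
        s3 = self.enc3(s2)
        s4 = self.enc4(s3)
        x = self.enc5(s4)
        x = self.dec4(x, s4)
        x = self.dec3(x, s3)
        x = self.dec2(x, s2)
        x = self.dec1(x, s1)
        return torch.sigmoid(self.head(x))  # one score map per class
```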

2.2 Objective Loss Functions

The loss function guides the training process of a convolutional network by measuring the compatibility between the network prediction and the ground-truth label. Let us denote S as the estimated segmentation mask and G as the ground-truth mask. In a multi-class semantic segmentation task with \(C = \{c_1,...,c_m\}\) classes, the total loss (TL) between S and G is defined as the sum of the losses over all classes:

$$\begin{aligned} TL(S,G)=\sum _{c=1}^{m}L_c(S,G) \end{aligned}$$
(1)

In this study we explore the influence of different loss functions on the FCN training process. The Dice similarity coefficient (DSC) and the Jaccard similarity coefficient (JSC) are two well-known segmentation measures that can also serve as objective loss functions in training. These measures between S and G are defined as:

$$\begin{aligned} DSC(S,G) = 2\frac{|SG|}{|S|+|G|} \end{aligned}$$
(2)
$$\begin{aligned} JSC(S,G) = \frac{|SG|}{|S|+|G|-|SG|} \end{aligned}$$
(3)

When used as losses in training, both measures weight false positive (FP) and false negative (FN) detections equally. The Tversky loss [19] introduces weighting into the loss function for highly imbalanced data, where we want to segment small objects. The Tversky index is defined as:

$$\begin{aligned} Tversky(S,G;\alpha ,\beta )=\frac{|SG|}{|SG|+\alpha |S \setminus G|+\beta |G \setminus S|} \end{aligned}$$
(4)

where \(\alpha \) and \(\beta \) control the magnitude of penalties for FPs and FNs, respectively. In our study we used \(\alpha =0.3\) and \(\beta =0.7\).

An additional loss function tested is the binary cross-entropy (BCE), calculated separately for each class segmentation map. For each pixel \(s_i\in S\) and pixel \(g_i\in G\) sharing the same position i, the loss is averaged over all N pixels as follows:

$$\begin{aligned} BCE(S,G)=-\frac{1}{N}\sum _{i=1}^N \left[ g_i\log (s_i) + (1-g_i)\log (1-s_i)\right] \end{aligned}$$
(5)
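
For concreteness, the losses of Eqs. (1)-(5) can be written over soft network predictions as in the following PyTorch sketch. Here s and g are tensors of shape (batch, classes, H, W), and the eps smoothing term is an added numerical-stability assumption, not part of the definitions above.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(s, g, eps=1e-6):
    """1 - DSC, per Eq. (2); |SG| becomes the sum of the elementwise product."""
    inter = (s * g).sum(dim=(-2, -1))
    return 1 - (2 * inter + eps) / (s.sum(dim=(-2, -1)) + g.sum(dim=(-2, -1)) + eps)

def tversky_loss(s, g, alpha=0.3, beta=0.7, eps=1e-6):
    """1 - Tversky index, per Eq. (4); alpha penalizes FPs, beta penalizes FNs."""
    inter = (s * g).sum(dim=(-2, -1))
    fp = (s * (1 - g)).sum(dim=(-2, -1))  # soft |S \ G|
    fn = ((1 - s) * g).sum(dim=(-2, -1))  # soft |G \ S|
    return 1 - (inter + eps) / (inter + alpha * fp + beta * fn + eps)

def bce_loss(s, g):
    """Eq. (5): BCE averaged over all pixels of each class map."""
    return F.binary_cross_entropy(s, g, reduction='none').mean(dim=(-2, -1))

def total_loss(s, g, per_class_loss=soft_dice_loss):
    """Eq. (1): sum of the per-class losses, averaged over the batch."""
    return per_class_loss(s, g).sum(dim=1).mean()
```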

3 Segmentation of Anatomical Structures

3.1 Dataset

Evaluation of the chest anatomical structure segmentation was done on chest radiographs from the JSRT database [11]. This public database includes 247 posterior-anterior (PA) chest radiograph images of size \(2048\times 2048\) pixels, with 0.175 mm pixel spacing and 12-bit gray levels. Van Ginneken et al. [6] published the Segmentation in Chest Radiographs (SCR) database, a benchmark set of segmentation masks for the lungs field, heart and clavicles (see Fig. 2). The annotations were made by two human observers and a radiologist consultant. The segmentations of the first observer serve as the ground-truth masks; those of the second observer serve as the human-observer reference results. The benchmark data is split into two folds of 124 and 123 cases, each containing an equal number of normal cases and cases with lung nodules. Following the suggested instructions for comparison between segmentation results, images from one fold were used for training and images from the other fold for testing, and vice versa. The final evaluation is defined as the average performance over the two folds.

Fig. 2. Data sample from [6]: (a) chest radiograph image; (b) clavicles segmentation mask; (c) lung segmentation mask; (d) heart segmentation mask.

For training, we resize the images to \(224\times 224\) pixels and normalize each image by its mean and standard deviation. The networks are trained using the Adam optimizer with an initial learning rate of \(10^{-5}\) and default parameters for 100 epochs. We use augmentations of scaling, translation and small rotations. In testing, we threshold the output score maps at 0.25 to generate a binary segmentation mask for each anatomical structure.
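
A minimal training and inference sketch matching this setup, reusing the UNetVGG16 and total_loss sketches above; the random tensors stand in for the preprocessed JSRT images and their three-channel ground-truth masks, and augmentation is omitted for brevity.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: replace with the resized, normalized JSRT images and
# their 3-channel (lungs field / heart / clavicles) ground-truth masks.
images = torch.randn(8, 1, 224, 224)
masks = torch.randint(0, 2, (8, 3, 224, 224)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=4, shuffle=True)

model = UNetVGG16(num_classes=3)                           # sketch from Sect. 2.1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # default betas

model.train()
for epoch in range(100):
    for x, y in loader:          # scale/translate/rotate augmentation would go here
        optimizer.zero_grad()
        loss = total_loss(model(x), y)   # multi-class Dice loss, Eq. (1)
        loss.backward()
        optimizer.step()

# Testing: threshold the per-class score maps at 0.25 to obtain binary masks.
model.eval()
with torch.no_grad():
    binary_masks = (model(images) > 0.25).float()
```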

3.2 Performance Measures

To measure the performance of the proposed architectures and compare with state-of-the-art results, we use well-accepted segmentation metrics: the Dice similarity coefficient, the Jaccard index (also known as intersection over union) and the mean absolute contour distance (MACD). MACD measures the distance between two contours. For each point on contour A, the distance to the closest point on contour B is computed as the Euclidean distance \(d(a_{i},B) = \min _{b_{j}\in B}||b_{j} - a_{i}||\). The distance values are then averaged over all points. Since the distances from A to B are not the same as from B to A, we take the average of the two directed means as follows:

$$\begin{aligned} MACD(A,B)=\frac{1}{2}(\frac{\sum _{i=1}^{n} d(a_{i},B)}{n} + \frac{\sum _{i=1}^{m} d(b_{i},A)}{m}) \end{aligned}$$
(6)

Because the MACD is given in millimeters, we multiply the original pixel spacing by a factor of 2048/224 to match the resized image resolution.
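
One way to compute MACD on binary masks is via the Euclidean distance transform, which gives every pixel its distance to the nearest contour pixel of the other mask. Below is a sketch assuming boolean NumPy masks at the \(224\times 224\) working resolution; the spacing factor follows the rescaling just described.

```python
from scipy.ndimage import binary_erosion, distance_transform_edt

def contour(mask):
    """One-pixel-thick boundary of a binary mask."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)

def macd(mask_a, mask_b, spacing_mm=0.175 * 2048 / 224):
    """Eq. (6): symmetric mean absolute contour distance, in millimeters."""
    ca, cb = contour(mask_a), contour(mask_b)
    # Distance transform of the complement = distance to the nearest contour pixel.
    dist_to_b = distance_transform_edt(~cb)
    dist_to_a = distance_transform_edt(~ca)
    d_ab = dist_to_b[ca].mean()   # mean over the n points of contour A
    d_ba = dist_to_a[cb].mean()   # mean over the m points of contour B
    return 0.5 * (d_ab + d_ba) * spacing_mm
```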

3.3 Experimental Results

Table 1 compares the segmentation performance of the four state-of-the-art fully convolutional networks for semantic segmentation listed in Sect. 2.1. All models are trained for multi-class segmentation into three classes: lungs field, heart and clavicles. We use the sigmoid activation function after the last layer of each network, with Dice as the loss function. An additional column in Table 1 indicates whether the network is fine-tuned (FT) from a pre-trained network.

The results show that the best performing architecture for the segmentation of all anatomical structures in chest radiographs is the U-Net with the VGG-16 encoder pre-trained on ImageNet. This architecture achieved the highest segmentation overlap scores (Jaccard) of 0.961, 0.906 and 0.855 for the lungs field, heart and clavicles, respectively. Notably, among all four architectures, the fine-tuned networks performed better than the networks trained from scratch.

Table 1. Segmentation results of the four compared architectures trained with multi-class Dice loss, showing the Dice (D), Jaccard (J) and MACD metrics. Fine-tuned (FT) architectures include a pre-trained VGG-16 as the initial encoder.

For the top performing architecture, the U-Net based network, we further analyzed several training features. Table 2 summarizes the multi-class segmentation performance using different objective loss functions. It is evident that structures with a smaller pixel area, like the clavicles, benefit from loss metrics with pixel weighting, such as the Tversky loss function. We also tested the performance of training a single-class network for each of the three classes vs. multi-class training. For the lungs, single-class training did not result in a significant improvement. However, for the heart and clavicles, the Dice and Jaccard scores with single-class training each improved by 1% in comparison to multi-class training. A final improvement in multi-class segmentation performance was achieved using post-processing consisting of small-object removal and hole filling (sketched below). While the Dice and Jaccard metrics did not improve, the MACD improved from 1.121, 2.569 and 0.871 mm for the lungs, heart and clavicles to 1.019, 2.549 and 0.856 mm, respectively. Figure 3 shows a few segmentation examples of our best performing model. A comparison of our U-Net based model trained with multi-class Dice loss to existing state-of-the-art methods and a human observer, validated on the same benchmark of chest radiographs, is presented in Table 3.
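
The post-processing is described here only as small-object removal and hole filling; one common implementation uses scikit-image and SciPy morphology, as in the following sketch (the min_size threshold and per-channel application are our assumptions).

```python
from scipy.ndimage import binary_fill_holes
from skimage.morphology import remove_small_objects

def postprocess(mask, min_size=256):
    """Remove small spurious components, then fill internal holes."""
    cleaned = remove_small_objects(mask.astype(bool), min_size=min_size)
    return binary_fill_holes(cleaned)

# Applied independently to each thresholded class map, e.g.:
# lungs, heart, clavicles = (postprocess(m) for m in binary_masks)
```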

Table 2. Multi-class segmentation results using different loss functions: DSC, JSC, Tversky and BCE (rows). The Dice (D), Jaccard (J) and MACD metrics (columns) are reported for each anatomical structure.
Fig. 3. Segmentation results of our best performing architecture, with the Jaccard score above each image for the lungs (L), heart (H) and clavicles (C). Ground-truth segmentation is shown in blue, CNN segmentation in red, and the overlap (true detections) in green. (Color figure online)

Table 3. Our best performing architecture compared to state-of-the-art models; “-” means the score was not reported; (*) used a different data split than suggested in the SCR benchmark.

4 Discussion and Conclusion

Segmentation of anatomical structures in chest radiographs is a challenging task that has attracted considerable interest over the years. The advantages of newly introduced CNN architectures, together with the public benchmark dataset provided in [6] for the JSRT images, motivated further studies in this field. Some recent studies focused only on the problem of lung segmentation, and a few have also dealt with heart and clavicle segmentation. In this paper, we evaluated the segmentation performance of four top FCN architectures for semantic segmentation [9, 10, 13, 17] on all three anatomical structures, using multi-class Dice loss.

The network architectures presented in this study are well known and have shown promising results in many computer vision semantic segmentation tasks. The FCN [9] and the U-Net [8] are considered classical approaches, while the FC-DenseNet and the DRN are more recent approaches for semantic segmentation. Hence, it was interesting to see in Table 1 that the classic U-Net and FCN showed superior segmentation performance over the more recent approaches. The advantage of using pre-trained networks for medical imaging tasks has already been shown in several studies [7]; even though only the encoder part of the FCN and U-Net (VGG-16 encoder) networks was pre-trained on the ImageNet database in our case, it still proved advantageous. The best segmentation performance was obtained using the proposed U-Net based architecture with the pre-trained VGG-16 encoder (Table 1).

Next, we explored the effect of training the multi-class segmentation model with different loss functions (Table 2). We demonstrated that small structures such as the clavicles can benefit from weighted loss functions such as the Tversky loss, while the larger structures (lungs and heart) achieved the best segmentation results using the Dice or binary cross-entropy loss functions. Applying additional minor post-processing resulted in a further decrease of the MACD measure, with cleaner and more precise segmentations for all three structures, as displayed in Fig. 3.

Table 3 presents the final comparison between our top selected model, the multi-class U-Net VGG-16 with Dice loss, and state-of-the-art methods [2,3,4,5,6] as well as human observer segmentations [6]. Our model outperformed all state-of-the-art methods tested in this study, and the human observer, for the lungs and heart segmentation. For clavicle segmentation, fewer studies have been conducted. Novikov et al. [2] reported results on a different data split than the benchmark recommendation, so it is not an objective comparison; nevertheless, our proposed network outperformed an additional top reported method [6].

In conclusion, we presented an experimental study in which four top segmentation architectures and several loss functions were compared for the task of segmenting anatomical structures in chest X-ray images. Results were evaluated quantitatively, with qualitative examples from our best performing model. Improving the segmentation of the lungs field, heart and clavicles lays the foundation for better CAD tools and for the development of new applications in medical thoracic image analysis.