1 Introduction

Diabetic retinopathy (DR) is one of the major ocular health problems worldwide and the leading cause of visual impairment among the working-age population in developed countries [1]. Early diagnosis of DR, in which retinal vessel segmentation plays an important role, is critical for optimal patient care.

A number of automated retinal vessel segmentation algorithms have been published in the literature. Many of them formulate segmentation as a retinal pixel classification problem, in which various visual features are extracted to characterize each pixel. As a typical example, Lupascu et al. [2] jointly employed filters with different scales and directions to extract 41-dimensional visual features and applied AdaBoosted decision trees to those features for vessel segmentation. Meanwhile, prior knowledge about retinal vessels is indispensable for addressing the difficulties caused by intensity inhomogeneity and low contrast. For instance, Staal et al. [3] introduced vessel centerlines and other heuristics into the segmentation process, and Lam et al. [4] incorporated a shape prior into vessel detection.

Recent years have witnessed the success of deep learning in medical image processing, including retinal vessel segmentation. Liskowski et al. [5] trained a deep neural network on augmented blood vessel data at variable scales for vessel segmentation. Li et al. [6] adopted an auto-encoder to initialize a neural network for vessel segmentation without preprocessing the retinal images. Ronneberger et al. [7] proposed a fully convolutional network (FCN), called U-Net, which has been widely applied to retinal vessel segmentation.

Despite their success, these algorithms still produce less accurate segmentations of retinal vessels, particularly of the capillaries, which have smaller diameters and lower contrast than the major arteries and veins in retinal images. Since regions that contain vessels of different diameters have diverse visual appearances (see Fig. 1), we suggest applying multiscale models to segment multi-width retinal vessels. Furthermore, although deep convolutional neural networks (DCNNs) have a strong ability to learn image representations, they can hardly incorporate the spatial information of pixels into the pixel classification process, resulting in poor connectedness of the segmented retinal vessels. Traditionally, a conditional random field (CRF) can be applied to the pixel features learned by a DCNN to address this drawback, but this makes it impossible to learn the pixel features and the classifier in a unified network.

Fig. 1. A fundus retinal image: (upper right) the macular area, (bottom right) the optic disc region, (upper left) a low-contrast patch and (bottom left) a high-contrast patch.

Therefore, we propose a multiscale network followed network (MS-NFN) model for blood vessel segmentation in color retinal images. The main novelties of this model are: (1) it contains an ‘up-pool’ NFN submodel, in which up-sampling layers precede max-pooling layers, and a ‘pool-up’ NFN submodel, in which max-pooling layers precede up-sampling layers; (2) each NFN submodel consists of two identical multiscale networks: one, with an auxiliary loss, converts an image patch into a probabilistic retinal vessel map, and the other further refines that map. Each multiscale network has a U-Net structure, in which up-sampling and max-pooling layers generate multiscale feature maps. The refined probabilistic retinal vessel maps produced by both NFNs are averaged to construct the segmentation result. We evaluated our MS-NFN model on the digital retinal images for vessel extraction (DRIVE) dataset [3] against eight algorithms and on the child heart and health study (CHASE_DB1) dataset [8, 9] against five algorithms, and achieved the current state-of-the-art performance.

2 Datasets

The DRIVE dataset comes from a diabetic retinopathy screening program initiated in the Netherlands. It consists of 20 training and 20 test fundus retinal color images of size \(584\times 565\). These images were acquired with an optical camera from 400 diabetic subjects aged 25–90 years. Among the 40 images, 33 show no pathological manifestations and the rest show signs of mild early diabetic retinopathy. Each image is accompanied by a binary mask and the ground truth obtained from the manual segmentations of two experts.

The CHASE_DB1 dataset consists of 28 retinal images taken from both eyes of 14 school children. Usually, the first 20 images are used for training and the remaining 8 for testing [6]. Each image has a size of \(999\times 960\). The binary field of view (FOV) mask and the segmentation ground truth were obtained manually [10].

3 Method

The proposed MS-NFN model (see Fig. 2) can be applied to retinal vessel segmentation in five steps: (1) retinal image preprocessing, (2) patch extraction, (3) feeding each patch simultaneously into the ‘up-pool’ NFN and ‘pool-up’ NFN for segmentation, (4) averaging the output of both NFNs, and (5) segmentation result reconstruction. We now delve into the major steps.
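The five steps above can be sketched as a single driver routine. All component names here (`preprocess`, `extract_patches`, the two NFN callables, and `reconstruct`) are hypothetical stand-ins for the components described in the following subsections, passed in as functions:

```python
import numpy as np

def segment_image(image, up_pool_nfn, pool_up_nfn,
                  preprocess, extract_patches, reconstruct):
    """Hypothetical driver for the five-step MS-NFN pipeline."""
    gray = preprocess(image)                       # step 1: pre-processing
    patches, coords = extract_patches(gray)        # step 2: patch extraction
    maps_a = up_pool_nfn(patches)                  # step 3: 'up-pool' NFN
    maps_b = pool_up_nfn(patches)                  #         'pool-up' NFN
    fused = (maps_a + maps_b) / 2.0                # step 4: average outputs
    return reconstruct(fused, coords, gray.shape)  # step 5: rebuild full map
```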

3.1 Image Pre-processing and Patch Extraction

Each color retinal image is converted into an intensity image to avoid the impact of hue and saturation. Then, the contrast limited adaptive histogram equalization (CLAHE) algorithm [11] and a gamma adjustment are applied to improve image contrast and suppress noise. Next, partly overlapping patches of size \(48\times 48\) are randomly extracted from each image, yielding 190,000 training patches from the DRIVE dataset and 400,000 from the CHASE_DB1 dataset.
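This pre-processing can be sketched in plain NumPy. As a simplification, global histogram equalization stands in for CLAHE (in practice one would use a tile-based, clip-limited implementation such as OpenCV's `cv2.createCLAHEM`-style API); the gamma value of 1.2 is illustrative, not taken from the paper:

```python
import numpy as np

def preprocess(rgb, gamma=1.2):
    """Grayscale conversion + histogram equalization + gamma adjustment.
    Plain global equalization is a simplified stand-in for CLAHE [11]."""
    gray = rgb.mean(axis=2)                # drop hue and saturation
    gray = gray / gray.max()
    hist, bins = np.histogram(gray.ravel(), bins=256, range=(0.0, 1.0))
    cdf = hist.cumsum() / hist.sum()       # equalization mapping
    eq = np.interp(gray.ravel(), bins[:-1], cdf).reshape(gray.shape)
    return eq ** gamma                     # gamma adjustment

def random_patches(img, n, size=48, rng=None):
    """Randomly extract n partly-overlapping size x size patches."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape
    ys = rng.integers(0, h - size + 1, n)
    xs = rng.integers(0, w - size + 1, n)
    return np.stack([img[y:y + size, x:x + size] for y, x in zip(ys, xs)])
```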

Fig. 2. Illustration of the MS-NFN model-based retinal vessel segmentation.

3.2 Training Two NFN Models

Extracted patches are fed into the two NFN submodels for independent training. Each NFN (see Fig. 3) consists of two identical multiscale networks: the first takes an extracted patch as input and outputs a probabilistic map of retinal vessels; the second takes this probabilistic vessel map as input and outputs a refined probabilistic vessel map. The NFN is trained in an ‘end-to-end’ manner to minimize the cross-entropy loss L\(_{1}\). The first network also has an auxiliary cross-entropy loss L\(_{2}\), which is added to the back-propagated error with a weight of \(\lambda \) = 0.8. The mini-batch stochastic gradient descent (mini-batch SGD) algorithm with a batch size of 32 is adopted as the optimizer. The maximum number of iterations is empirically set to 100 and the learning rate to 0.01.
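The combined objective, L\(_{1}\) on the refined map plus \(\lambda\) times the auxiliary loss L\(_{2}\) on the first network's map, can be written out as pixel-wise binary cross entropy. This is a NumPy sketch of the loss computation only; the actual optimization uses mini-batch SGD within the deep learning framework:

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross entropy."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def nfn_loss(refined, intermediate, target, lam=0.8):
    """Total NFN objective: main loss L1 on the refined map plus the
    auxiliary loss L2 on the first network's map, weighted by lambda."""
    return cross_entropy(refined, target) + lam * cross_entropy(intermediate, target)
```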

Fig. 3. Illustration of the NFN submodel-based patch segmentation.

The ‘up-pool’ NFN consists of two multiscale ‘up-pool’ networks (see Fig. 4 (top)), each having a symmetrical architecture and containing, in sequence, an up-sampling and max-pooling module, a U-Net module [7], and another up-sampling and max-pooling module. Similarly, the ‘pool-up’ NFN consists of two multiscale ‘pool-up’ networks (see Fig. 4 (bottom)), each containing, in sequence, a max-pooling and up-sampling module, a U-Net module, and another max-pooling and up-sampling module.
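The two module orderings can be illustrated with nearest-neighbour up-sampling and \(2\times 2\) max-pooling. In this sketch the learned U-Net body is abstracted as an arbitrary patch-to-patch function (identity by default), which is a deliberate simplification:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def maxpool2x(x):
    """2x2 max-pooling with stride 2."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def up_pool(x, body=lambda t: t):
    """'up-pool' ordering: up-sample, apply the (U-Net) body, pool back."""
    return maxpool2x(body(upsample2x(x)))

def pool_up(x, body=lambda t: t):
    """'pool-up' ordering: pool, apply the (U-Net) body, up-sample back."""
    return upsample2x(body(maxpool2x(x)))
```

Note that both orderings return a map of the same size as the input patch, which is what allows the refined maps of the two NFNs to be averaged pixel by pixel.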

3.3 Testing the MS-NFN Model

Each test image is first pre-processed using the method described in Subsect. 3.1. Then, partly overlapping patches of size \(48\times 48\) are extracted with a stride of 5 along both the horizontal and vertical directions. Next, each patch is fed into the two NFN submodels for segmentation, and the obtained probabilistic vessel maps are averaged. Since the patches heavily overlap, each pixel may appear in multiple patches, and its probabilistic values across these patches are further averaged. Finally, the averaged probabilistic vessel map is binarized with a threshold of 0.5 to form the retinal vessel segmentation result.
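The stride-based extraction and overlap-averaged reconstruction can be sketched as follows; the paper uses a patch size of 48 and a stride of 5, while the helper below takes both as parameters:

```python
import numpy as np

def grid_coords(h, w, size, stride):
    """Top-left corners of partly-overlapping patches on a regular grid."""
    ys = range(0, h - size + 1, stride)
    xs = range(0, w - size + 1, stride)
    return [(y, x) for y in ys for x in xs]

def reconstruct(prob_patches, coords, shape, size, threshold=0.5):
    """Average the probability of every pixel over all patches that
    contain it, then binarize at the given threshold."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (y, x) in zip(prob_patches, coords):
        acc[y:y + size, x:x + size] += p
        cnt[y:y + size, x:x + size] += 1
    return (acc / np.maximum(cnt, 1)) >= threshold
```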

Fig. 4. Architecture of the ‘up-pool’ network (top) and the ‘pool-up’ network (bottom).

4 Results

Performance Gain Caused by NFN: To demonstrate the performance gain produced by the NFN structure, we implemented an MS model, which has a similar architecture to MS-NFN except that each NFN is replaced with a single multiscale network. Figures 5 and 6 show an example test image from DRIVE and CHASE_DB1, respectively, together with the segmentation results obtained by the MS model and the proposed MS-NFN model, and the ground truth. They reveal that our MS-NFN model, with the NFN structure, can detect more retinal vessels than the MS model.

Fig. 5. A test image from the DRIVE dataset (1\(^{st}\) column), the segmentation results obtained by the MS model (2\(^{nd}\) column) and our MS-NFN model (3\(^{rd}\) column), and the ground truth (4\(^{th}\) column).

Table 1. Performance of nine segmentation methods on the DRIVE dataset.

Comparison to Existing Methods: Tables 1 and 2 list the average accuracy, specificity, sensitivity and area under the ROC curve (AUC) obtained by applying several existing retinal vessel segmentation methods, the MS model, and the proposed MS-NFN model to the DRIVE and CHASE_DB1 datasets, respectively. They show that the overall performance of our MS-NFN model is superior to that of the competing methods on both datasets. In particular, our method achieved a substantially improved AUC (i.e., 0.60% higher than the second-best AUC on DRIVE and 1.09% higher than the second-best AUC on CHASE_DB1). Considering the size of each retinal image, such an improvement corresponds to a large number of additional retinal vessel pixels being correctly classified.
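The reported metrics can be computed directly from the probabilistic map and the ground truth. This is a minimal sketch: accuracy, sensitivity and specificity come from the thresholded confusion counts, and AUC uses the rank-based (Mann-Whitney) formulation, ignoring tie handling for simplicity:

```python
import numpy as np

def segmentation_metrics(prob, truth, threshold=0.5):
    """Accuracy, sensitivity, specificity at a threshold, plus AUC of the
    probabilistic map via the rank (Mann-Whitney) formulation.
    `truth` is a boolean array; ties in `prob` are not specially handled."""
    pred = prob >= threshold
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    acc = (tp + tn) / truth.size
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ranks = prob.ravel().argsort().argsort() + 1   # 1-based ranks
    pos = truth.ravel()
    n_pos, n_neg = pos.sum(), (~pos).sum()
    auc = (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return acc, sens, spec, auc
```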

Fig. 6. A test image from the CHASE_DB1 dataset (1\(^{st}\) column), the segmentation results obtained by the MS model (2\(^{nd}\) column) and our MS-NFN model (3\(^{rd}\) column), and the ground truth (4\(^{th}\) column).

Table 2. Performance of six segmentation methods on the CHASE_DB1 dataset.

Computational Complexity: It took more than 16 h to train the proposed MS-NFN model on the DRIVE dataset and more than 30 h to train it on the CHASE_DB1 dataset (Intel Xeon E5-2640 V4 CPU, NVIDIA Titan Xp GPU, 512 GB memory, and Keras 1.1.0). However, applying our MS-NFN model to retinal vessel segmentation is relatively fast, taking less than 10 s on average to segment a \(584\times 565\) retinal image.

5 Conclusions

We proposed the MS-NFN model for retinal vessel segmentation and evaluated it on the DRIVE and CHASE_DB1 datasets. Our results indicate that the NFN structure we designed produces a clear performance gain, and that the proposed MS-NFN model achieved, to our knowledge, the most accurate retinal vessel segmentation on both datasets.