1 Introduction

Anterior segment optical coherence tomography (AS-OCT) can assist in the diagnosis of many eye diseases, such as glaucoma and cataract [2]. The measurement is non-contact and carries a low risk of infection. The role of AS-OCT in research and clinical care continues to expand [3].

AS-OCT nuclear density measurement is a repeatable, reliable and objective cataract grading method that correlates with Lens Opacity Classification System Version III (LOCS III) grading [5, 8]. The nuclear density is obtained by delineating the lens nucleus and computing the average pixel intensity within it. If the nucleus can be segmented automatically, cataract grading can also be performed automatically. To the best of our knowledge, no previous work has focused on automatically segmenting the nucleus in AS-OCT images.
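The nuclear-density measurement described above can be sketched as a function that averages pixel intensities inside a binary nucleus mask. This is a minimal sketch only. The function name `nuclear_density` is hypothetical, and so is the representation of the image and mask as equally sized 2-D lists. The paper does not specify how the measurement is implemented.

```python
def nuclear_density(image, mask):
    """Average pixel intensity inside a binary nucleus mask (illustrative sketch).

    image: 2-D list of pixel intensities.
    mask:  2-D list of 0/1 values with the same shape; 1 marks nucleus pixels.
    """
    total = 0.0
    count = 0
    for img_row, mask_row in zip(image, mask):
        for value, inside in zip(img_row, mask_row):
            if inside:  # only pixels delineated as nucleus contribute
                total += value
                count += 1
    if count == 0:
        raise ValueError("empty nucleus mask")
    return total / count
```

For example, `nuclear_density([[10, 20], [30, 40]], [[0, 1], [1, 1]])` averages 20, 30 and 40 and returns 30.0.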

In this paper, we propose a pipeline to automatically segment the cortex and nucleus in AS-OCT images. The pipeline consists of a U-shaped network followed by a shape template. The U-shaped network predicts a preliminary mask for the cortex and nucleus. However, the predicted nucleus boundary is irregular because the boundary between the cortex and nucleus is weak. To solve this problem, we design a shape template based on the physiological structure of the nucleus and use it to refine the nucleus boundary. The basic idea of the refinement is to find the training-set template closest to the predicted boundary and use it to replace that boundary. After refinement, the nucleus boundary is consistent with the physiological structure of the nucleus.

We summarize the contributions of this work as follows:

  • We propose a simple and effective pipeline that segments the cortex and nucleus using a U-shaped network and a shape template. The method integrates both structural and appearance information.

  • We design a shape template that imitates the intrinsic concentric-layer structure of the nucleus. Refining the nucleus boundary with this template makes the final prediction consistent with the physiological structure of the nucleus.

Fig. 1. The proposed pipeline. 1: AS-OCT image. 2: Region of interest (ROI); we locate the lens and divide it into three sub-regions according to the ground-truth annotations (blue lines): cortex (pink), nucleus (yellow) and capsule (cyan). 3: A U-shaped network with side-outputs predicts a mask for each region. 4: The shape template refines the nucleus boundary. (Color figure online)

2 Proposed Method

The proposed pipeline is shown in Fig. 1. First, we locate the lens region using the Canny edge detector and divide it into three sub-regions: the capsule, cortex and nucleus regions. We train a U-shaped network to predict a mask for each region. However, the network output has no regular shape, especially for the nucleus, whose boundary is weak. We therefore design a shape template that models the structure of the nucleus and use it to refine the nucleus boundary.

2.1 Network Architecture

Motivated by U-Net [6], we design a U-shaped network to predict a preliminary mask for the capsule, cortex and nucleus. The U-shaped network produces a high-resolution mask with a clear boundary because its skip connections recover the information lost in the pooling layers.

Our network consists of two modules: an encoding module and a decoding module. The encoding module consists of six blocks. Each block contains two or three convolutional layers; each convolutional layer is followed by a rectified linear unit (ReLU), and each block ends with a 2\(\,\times \,\)2 max pooling operation with stride 2. The decoding module is also composed of six blocks. Each decoding block consists of a spatial upsampling of the feature maps by a factor of 2, a concatenation with the corresponding feature maps from the contracting path, and two convolutions followed by a ReLU layer. At the final layer, a 1\(\,\times \,\)1 convolution maps each 24-dimensional feature vector to the desired number of classes (here 4: the background and the three lens sub-regions).
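The final 1\(\,\times \,\)1 convolution applies one shared linear map independently at every pixel. The plain-Python sketch below illustrates this. It assumes a softmax follows the convolution, which would produce the class probabilities \(y_{i}\) used in the cross-entropy loss of Eq. (2). The function name `conv1x1_softmax`, its parameters and the nested-list layout are hypothetical. A real network would be implemented in a deep-learning framework.

```python
import math


def conv1x1_softmax(features, weights, bias):
    """Apply a 1x1 convolution and a softmax at every pixel (illustrative sketch).

    features: H x W grid (nested lists) of C-dim feature vectors (C = 24 in the text).
    weights: num_classes x C matrix; bias: num_classes values (hypothetical params).
    Returns an H x W grid of per-class probability vectors.
    """
    probs = []
    for row in features:
        prob_row = []
        for vec in row:
            # A 1x1 convolution is one linear map shared by all pixels.
            logits = [sum(w * f for w, f in zip(w_row, vec)) + b
                      for w_row, b in zip(weights, bias)]
            # Numerically stable softmax (assumed; the text only refers to probabilities).
            top = max(logits)
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            prob_row.append([e / total for e in exps])
        probs.append(prob_row)
    return probs
```

With C = 24 and four classes, `weights` would be a 4 x 24 matrix.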

Motivated by M-Net [1], we add side-output layers. Each acts as a classifier that produces a companion local output for early layers, and together they integrate information from different levels. A cross-entropy loss is applied to each side-output layer. With M side-outputs in the network, the side-output loss is:

$$\begin{aligned} L_{side-output} = \frac{1}{M}\sum _{m=1}^{M}L_{cross-entropy}(y^{(m)},y'), \end{aligned}$$
(1)

where \(y^{(m)}\) is the prediction of the m-th side-output, \(y'\) is the ground truth, and \(L_{cross-entropy}\) is the cross-entropy loss:

$$\begin{aligned} L_{cross-entropy}=-\sum _{i}( y'_{i}\log (y_{i})), \end{aligned}$$
(2)

\(y_{i}\) is the predicted probability value for class i and \(y'_{i}\) is the true probability for that class. The overall structure of the U-shaped network with side-outputs is shown in Fig. 1.
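Equations (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' training code. Predictions and targets are assumed to be lists of per-pixel probability vectors. Per-pixel losses are assumed to be averaged over pixels, and a small `eps` avoids log(0). The function names are hypothetical.

```python
import math


def cross_entropy(pred, target, eps=1e-12):
    """Eq. (2) for one pixel: -sum_i y'_i * log(y_i)."""
    # eps avoids log(0); it is an implementation detail of this sketch.
    return -sum(t * math.log(max(p, eps)) for p, t in zip(pred, target))


def side_output_loss(side_preds, target):
    """Eq. (1): average the cross-entropy over the M side-outputs.

    side_preds: M predictions, each a list of per-pixel probability vectors.
    target: list of per-pixel one-hot ground-truth vectors.
    """
    losses = []
    for pred in side_preds:
        # Averaging over pixels is an assumption; the text gives a per-pixel loss.
        pixel_losses = [cross_entropy(p, t) for p, t in zip(pred, target)]
        losses.append(sum(pixel_losses) / len(pixel_losses))
    return sum(losses) / len(losses)
```

Take one pixel of class 1 and two side-outputs predicting [0.5, 0.5] and [0.0, 1.0]. The two cross-entropies are log 2 and 0, so the side-output loss is (log 2)/2.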

2.2 Shape Template

Because the boundary between the nucleus and cortex is weak (Fig. 2, left), the U-shaped network tends to misclassify regions that resemble the nucleus as nucleus (Fig. 2, middle). To solve this problem, we design a shape template based on the physiological structure of the nucleus and use it to refine the nucleus boundary. The basic idea is to find the closest nucleus shape in the training set and use it to replace the predicted boundary.

Fig. 2. Blue lines are ground-truth annotations. Left: AS-OCT image; the region in the white rectangle shows a weak-contrast boundary. Middle: prediction of our U-shaped network, which misclassifies regions similar to the nucleus as nucleus (white rectangle). Right: the shape template refines the nucleus boundary. (Color figure online)

Fig. 3. Left: structure of the lens; the lens fibers are arranged in concentric layers. Right: the shape template of the nucleus; \(p_{middle}\) is the intersection between the boundary and the alignment axis, \(b_{dis}\) is the distance from the center point c to \(p_{middle}\), and the middle part of the template is shown in green. (Color figure online)

Lens fibers form the bulk of the lens. They stretch lengthwise from the posterior to the anterior pole and are arranged in concentric layers, much like the layers of an onion cut horizontally [4], as shown in Fig. 3 (left). Motivated by the ray feature [7], the concentric-layer structure can be represented by a center point and the distance from the center point to the nearest boundary point in each direction, as shown in Fig. 3 (right). Different layers share the same center point but lie at different distances from it.

Fig. 4. Three nucleus templates of the top part (left) and the bottom part (right). Different templates are shown in different colors.

The boundary of the nucleus is represented by n points \(S = \{(x_{i},y_{i})\}_{i=1}^{n}\). The center point of the nucleus is defined as their centroid, \(c=\left(\frac{1}{n}\sum _{i=1}^{n}x_{i},\frac{1}{n}\sum _{i=1}^{n}y_{i}\right)\). The boundary is then encoded by c and the distance from c to the nearest boundary point \(p_{\theta }\) in direction \(\theta \):

$$\begin{aligned} f(I,c,\theta ) = \frac{\Vert c-p_{\theta }\Vert }{z}, \end{aligned}$$
(3)

where \(z=\Vert c-p_{middle}\Vert =b_{dis}\) is a normalization factor and \(p_{middle}\) is the intersection of the alignment axis with the nucleus boundary, as shown in Fig. 3 (right). The position of \(p_{middle}\) differs between images because of the eye movement between AS-OCT images noted in [9]. Normalization not only eases the rotational misalignment problem but also eliminates scale effects. We sample \(\theta \) every 5 degrees, so every shape is encoded by K points \(\{(\theta _{k},f(I,c,\theta _{k}))\}_{k=1}^{K}\).
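The ray encoding of Eq. (3) can be sketched in four steps:

1. Compute the centroid c of the boundary points.
2. Cast a ray from c every 5 degrees.
3. Take the distance along each ray to the nearest crossing of the boundary polygon.
4. Divide each distance by the distance along the alignment axis.

This is an illustrative sketch. The paper does not state how the alignment axis is obtained, so its direction is assumed to be given as a hypothetical angle `align_deg`. The function names are also hypothetical.

```python
import math


def _cross(ax, ay, bx, by):
    """z-component of the 2-D cross product."""
    return ax * by - ay * bx


def ray_distance(cx, cy, theta, polygon):
    """Distance from (cx, cy) to the nearest polygon crossing in direction theta."""
    dx, dy = math.cos(theta), math.sin(theta)
    best = math.inf
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        denom = _cross(dx, dy, ex, ey)
        if abs(denom) < 1e-12:
            continue  # ray is parallel to this edge
        wx, wy = ax - cx, ay - cy
        t = _cross(wx, wy, ex, ey) / denom  # distance along the (unit) ray
        u = _cross(wx, wy, dx, dy) / denom  # fraction along the edge
        if t >= 0 and 0 <= u <= 1:
            best = min(best, t)
    return best


def encode_shape(boundary, align_deg, step_deg=5):
    """Encode a closed boundary as normalized ray distances, as in Eq. (3)."""
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n  # centroid c
    cy = sum(y for _, y in boundary) / n
    # z = ||c - p_middle||: ray distance along the (assumed) alignment axis.
    z = ray_distance(cx, cy, math.radians(align_deg), boundary)
    return [ray_distance(cx, cy, math.radians(a), boundary) / z
            for a in range(0, 360, step_deg)]
```

Consider an ellipse with semi-axes 20 (horizontal) and 10 (vertical) and a vertical alignment axis. The encoded value at \(\theta =0\) is approximately 2, and the value on the alignment axis is exactly 1, which illustrates the scale normalization.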

The training procedure is shown in Algorithm 1. Its purpose is to learn a representative set of nucleus shapes. Each shape in the training set is encoded into K points \(\{(\theta _{k},f(I,c,\theta _{k}))\}\), and the encoded shapes are clustered into v templates. In the experiments, we learn only the middle part of the shape, because it is relatively stable, and use quadratic curve fitting to obtain the remaining part. The middle part is divided into a top part and a bottom part, as shown in Fig. 3 (right). For each part, we cluster the corresponding shapes into \(v'\) templates using K-means. Figure 4 shows three normalized templates for the top part and three for the bottom part.
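A minimal sketch of the clustering step in Algorithm 1 is shown below. It assumes each training shape (the top or bottom middle part) has already been encoded as a fixed-length list of normalized ray distances. The sketch uses plain Lloyd's K-means. The deterministic farthest-point initialization and the name `kmeans_templates` are choices of this sketch, not details from the paper. In the experiments, k would correspond to \(v'=40\) for each part.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_templates(shapes, k, iters=100):
    """Cluster encoded shapes (equal-length lists) into k template vectors."""
    # Deterministic farthest-point initialization (a choice of this sketch).
    centers = [list(shapes[0])]
    while len(centers) < k:
        far = max(shapes, key=lambda s: min(_sq_dist(s, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        # Assignment step: each shape joins its nearest template.
        groups = [[] for _ in range(k)]
        for s in shapes:
            j = min(range(k), key=lambda i: _sq_dist(s, centers[i]))
            groups[j].append(s)
        # Update step: each template becomes the mean of its shapes;
        # an empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # templates have stopped changing
        centers = new_centers
    return centers
```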


The refinement procedure is shown in Algorithm 2, where s is the nucleus boundary predicted by the U-shaped network. We first compute the center point \(c'\), the alignment axis and \(p'_{middle}\) of s, and then search for the template closest to s. Each template \(T_{v}\) is scaled by \(z=\Vert c'-p'_{middle}\Vert \), so that \(p_{middle}\) of \(T_{v}\) coincides with \(p'_{middle}\) of s. An offset is then added to the scaled template; a positive offset moves the template toward an outer layer. For each transformed template B, we compute its similarity to s, and we replace s with the most similar template \(B'\). B and s are both represented by K points, \(\{(\theta _{k},f_{B}(I,c,\theta _{k}))\}\) and \(\{(\theta _{k},f_{s}(I,c,\theta _{k}))\}\), and their similarity is defined as follows, where a smaller value indicates a closer match:

$$\begin{aligned} Similarity(B,s) = \sum _{k=1}^{K}\left| f_{B}(I,c,\theta _{k})-f_{s}(I,c,\theta _{k})\right| . \end{aligned}$$
(4)
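The template search in Algorithm 2 can be sketched as an exhaustive search over templates and offsets, scored with Eq. (4). This is an illustrative sketch under the following assumptions:

- Shapes are lists of K ray distances sampled at the same angles.
- The prediction s is expressed in pixels.
- Offsets are integer pixel steps in \([-10,10]\), the range used in the experiments.
- The function names are hypothetical.

```python
def similarity(b, s):
    """Eq. (4): sum of absolute differences; a smaller value means more similar."""
    return sum(abs(fb - fs) for fb, fs in zip(b, s))


def refine(templates, s, z, offsets=range(-10, 11)):
    """Return the transformed template B' closest to the predicted shape s.

    templates: normalized templates T_v, each a list of K ray distances.
    s: K ray distances of the predicted boundary (assumed to be in pixels).
    z: ||c' - p'_middle|| of the prediction.
    offsets: candidate offsets (assumed integer pixel steps).
    """
    best, best_score = None, float("inf")
    for t in templates:
        scaled = [z * f for f in t]  # p_middle of T_v now coincides with p'_middle
        for off in offsets:
            b = [f + off for f in scaled]  # positive offset -> outer layer
            score = similarity(b, s)
            if score < best_score:
                best, best_score = b, score
    return best
```

Because Eq. (4) sums absolute differences, the search keeps the candidate with the smallest score rather than the largest.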

3 Performance Evaluation and Discussion

We acquired the data with a CASIA-2000 anterior segment swept-source OCT (SS-OCT) device produced by Tomey Co. Ltd. The dataset contains 20 eyes from 10 people, with 8 images per eye. We select 7 people (120 images) for training and 3 people (40 images) for testing. All images were annotated by one experienced ophthalmologist. Accuracy is measured by the normalized mean squared error (NMSE) between the predicted shape \(S_{p} = \{(x_{i},y_{i})\}\) and the ground truth \(S_{g} = \{(x_{j},y_{j})\}\), defined as

$$\begin{aligned} NMSE = \frac{1}{n_{g}}\sum _{j=1}^{n_{g}}\sqrt{(x_{i}-x_{j})^{2}+(y_{i}-y_{j})^{2}}, \end{aligned}$$
(5)

where \(n_{g}\) is the number of annotation points and \((x_{i},y_{i})\) is the predicted point corresponding to the annotation point \((x_{j},y_{j})\). The results are shown in Table 1. "Nucleus top" and "nucleus bottom" denote the top and bottom boundaries of the nucleus, and the cortex boundaries are evaluated in the same way.
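Eq. (5) amounts to the mean Euclidean distance between corresponding predicted and annotated points. The sketch below assumes the correspondence is given by list position, so that `pred[j]` pairs with `gt[j]`; the paper does not specify this. The function name `nmse` is hypothetical.

```python
import math


def nmse(pred, gt):
    """Eq. (5): mean Euclidean distance over the n_g annotation points.

    pred[j] is assumed to be the predicted point corresponding to gt[j].
    """
    if not gt or len(pred) != len(gt):
        raise ValueError("expected equal-length, non-empty point lists")
    total = sum(math.hypot(xp - xg, yp - yg)
                for (xp, yp), (xg, yg) in zip(pred, gt))
    return total / len(gt)
```

For example, `nmse([(0, 0), (3, 4)], [(0, 0), (0, 0)])` averages the distances 0 and 5 and returns 2.5.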

Table 1. Accuracy of the segmentation
Fig. 5. Example results. From left to right: input image, ground truth, prediction of the U-shaped network, prediction after adding the side-output layer, and prediction after the refinement. Blue lines are ground-truth annotations. (Color figure online)

The U-shaped network is trained from scratch for 70 epochs with an initial learning rate of 0.001. The original image size is 2130\(\,\times \,\)1864; we resize each image to 1024\(\,\times \,\)1024. For the shape template, we learn 40 templates for the nucleus top and 40 templates for the nucleus bottom from the training set, and the offset range of the template is set to \([-10,10]\).

As shown in Table 1, our U-shaped network outperforms U-Net [6] and M-Net [1], which we attribute to its larger receptive field. The side-output layer adds supervision to the middle layers, which makes the network easier to train. It also integrates information from different scales and clearly improves the results. However, a multi-scale structure such as M-Net shows no improvement. The shape template then refines the nucleus boundary predicted by the U-shaped network with side-outputs. The entire refinement can be viewed as finding the most similar template in the training set to replace the predicted nucleus boundary. Refining the nucleus boundary with the shape template not only improves performance but also makes the prediction consistent with the physiological structure of the nucleus. More results are shown in Fig. 5.

4 Conclusions

In this paper, we use a U-shaped network to produce a preliminary mask and design a shape template to refine it. The experiments show the effectiveness of our method, and after refinement the result is consistent with the physiological structure of the lens.