1 Introduction

Since its introduction in 1991, Optical Coherence Tomography (OCT) has established itself as an invaluable diagnostic tool. Early publications focused on segmenting scans of healthy retinas, and the variety of approaches devised renders this problem more or less solved. Consequently, recent publications have shifted their attention to the pathological case, with the aim of detecting, assessing and monitoring these diseases as accurately as possible.

Related work. AMD, a frequently addressed pathology (e.g. [1]), is characterized by cellular debris called drusen, which accumulates between the retinal pigment epithelium (RPE) and the underlying choroid; the size and number of drusen indicate the stage of the disease (example in Fig. 1(b)). While AMD leads to considerable deformations of the retina, the cell layers themselves stay more or less intact in the early and intermediate stages considered in the segmentation literature.

Two recent works addressed the more involved problem of segmenting individual layers in the presence of DME [2, 3]. Here, fluid leaking from damaged capillaries causes a swelling of the macula and can destroy some retinal layers (Fig. 1(c)). The dataset used in both publications is very challenging in that regard and constitutes a harder problem than intermediate AMD. Both approaches rely on ground truth, either to (a) train a classifier that finds fluid regions and excludes them from the segmentation [2], or to (b) train DME-specific appearance models [3].

Contribution. We present an approach that adapts to different pathologies without requiring pathology-specific ground truth. To this end, we extend a graphical model trained on healthy data with locally adaptive shape modifications. Based on a sum-product network (SPN) that enables tractable globally optimal inference, we find the optimal combination of regions and modifications and merge them into a final segmentation. We demonstrate the capabilities of our approach by segmenting three different pathologies. Besides obtaining accurate segmentations, we are also able to localize the pathological regions. To our knowledge, this is the first time that a single approach has been tested and evaluated on more than one pathology. The basis of our approach is sketched in Sect. 2; we introduce our extension in Sect. 3 and discuss results in Sect. 4.

Fig. 1.

Three pathologies of different difficulty all segmented by the same approach, by adding pathology-specific shape information to a model for healthy scans.

2 Probabilistic Graphical Model

Following our previous approach [4], we model an OCT scan \(y \in \mathbb {R}^{M \times N}\) (N A-Scans with M pixels each) together with its segmentation in two forms: \(c \in \mathbb {N}^{K\cdot N}\) (K boundaries) denotes the discretized version of the continuous boundary vector \(b \in \mathbb {R}^{K \cdot N}\) and serves as the link between the discrete pixel domain of y and the continuous boundary domain of b. The graphical model is given by

$$\begin{aligned} p(y,c,b) = p(y|c) p(c|b) p(b). \end{aligned}$$
(1)

We will briefly discuss each component, and refer to [4] for more details.

Fig. 2.

Workflow: 1. We segment many local graphical models (introduced in Sect. 2), either modified (red) or unmodified (green) (see (8)), for various subregions of the B-Scan. 2. We then find the globally optimal combination \(\hat{L}_{1,n}(\varTheta )\) (10) using SPNs (Sect. 3, first part). 3. Finally, we fuse the local models of \(\hat{L}_{1,n}(\varTheta )\) into a smooth segmentation (Sect. 3, second part).

Appearance \(\varvec{p(y|c)}\). The appearance of boundaries and layers is modeled via local class-specific Gaussian densities: the probability of pixel \(y_{i,j}\) belonging to a class \(x_{i,j} \in \{l_1, \ldots , l_n, t_1, \ldots , t_{n-1}\}\) (see Fig. 3(a)) is modeled as Gaussian,

$$\begin{aligned} p(y|c) = \prod _{i=1}^M \prod _{j=1}^N p(y_{i,j}|c), \qquad p(y_{i,j}|c) = \mathcal {N}(\tilde{y}_{i,j}| \mu _{x_{i,j}}, \varSigma _{x_{i,j}}), \end{aligned}$$
(2)

where the class-label \(x_{i,j}\) is determined by the boundary configuration c and \(\tilde{y}_{i,j}\) is a patch around pixel \(y_{i,j}\).
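As a minimal sketch of the appearance term (2), the per-pixel log-likelihood can be computed by summing univariate Gaussian log densities over a patch, here assuming a diagonal covariance; the patch values and class parameters below are hypothetical toy numbers, not learned model values.

```python
import math

def gaussian_logpdf(x, mu, var):
    """Log density of a univariate Gaussian N(mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def appearance_loglik(patch, mu, var):
    """Log p(y_ij | c) for one patch under a diagonal-covariance
    class-specific Gaussian (cf. Eq. 2), summing over patch entries."""
    return sum(gaussian_logpdf(p, m, v) for p, m, v in zip(patch, mu, var))

# Toy patch around a pixel and hypothetical parameters of one layer class:
patch = [0.8, 0.7, 0.9]
mu    = [0.8, 0.8, 0.8]
var   = [0.04, 0.04, 0.04]
ll = appearance_loglik(patch, mu, var)
```

In the full model the class label selecting \((\mu, \varSigma)\) would be determined by the boundary configuration c, and the covariance need not be diagonal.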

Shape \(\varvec{p(b)}\). The global shape prior captures typical variations of cell layer boundaries. The shape vector b is determined by a linear Gaussian model

$$\begin{aligned} b = Ws + \mu + \epsilon , \quad s \sim \mathcal {N}(0,I), \quad \epsilon \sim \mathcal {N}(0,\sigma ^2). \end{aligned}$$
(3)

The matrix \(W \in \mathbb {R}^{K\cdot N \times m}\) maps the low-dimensional vector \(s \in \mathbb {R}^m\) onto b. Each column of W represents a shape variation that gets added to the mean shape \(\mu \). Given n training segmentations \(X \in \mathbb {R}^{n \times N\cdot K}\), the columns of W are obtained as the first m eigenvectors of \(\text {cov}(X)\), weighted by the corresponding eigenvalues, and \(\mu \) is the training mean \(\overline{X}\). The marginal distribution of b can then be shown to be

$$\begin{aligned} p(b) = \mathcal {N}(b; \mu ,\varSigma = WW^T + \sigma ^2I). \end{aligned}$$
(4)
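The marginal covariance in (4) follows directly from the linear Gaussian model (3). A minimal sketch, using plain lists for a tiny toy problem (one shape mode over three boundary positions; the numbers are illustrative, not learned modes):

```python
def shape_covariance(W, sigma2):
    """Marginal covariance of b (cf. Eq. 4): Sigma = W W^T + sigma^2 I."""
    d = len(W)                       # W stored row-wise: d x m
    m = len(W[0])
    return [[sum(W[i][k] * W[j][k] for k in range(m))
             + (sigma2 if i == j else 0.0)
             for j in range(d)] for i in range(d)]

# Toy example: one mode that raises position 0 while lowering position 2
W = [[1.0], [0.5], [-0.5]]
Sigma = shape_covariance(W, sigma2=0.1)
```

Off-diagonal entries of \(\varSigma\) encode how boundary positions co-vary through the shared modes; the isotropic \(\sigma^2 I\) term accounts for residual per-position noise.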

MRF Regularization \(\varvec{p(c|b)}\). Shape and appearance interact in a Markov random field over the discrete variable c. It is composed of column-wise chain models that allow for parallel inference (more details can be found in [4]):

$$\begin{aligned} p(c|b) = \prod _{j=1}^N p(c_j|b), \qquad p(c_j|b) = p(c_{1,j}|b) \prod _{k=2}^K p(c_{k,j}|c_{k-1,j},b). \end{aligned}$$
(5)

Inference. In [4] we proposed a variational scheme: design a tractable graphical model q(c, b) by adding conditional independences, then infer the full distribution q(c, b) by minimizing the Kullback-Leibler (KL) divergence to p(c, b|y). We decoupled the discrete and continuous model components, \(q(c,b) = q_c(c)q_b(b)\), while keeping the remaining structure intact: \(q_c(c)\) consists of column-wise MRFs as in (5) and \(q_b(b) = \mathcal {N}(b;\bar{\mu },\bar{\varSigma })\). Inferring q(c, b) then corresponds to solving the non-convex optimization problem

$$\begin{aligned} \min _{q_c,\bar{\mu },\bar{\varSigma }} \text {KL} (q(c,b)\Vert p(c,b|y)) = \int _b \sum _c q(c,b) \log \frac{q(c,b)}{p(c,b|y)}. \end{aligned}$$
(6)

Plugging in the definitions of q(c, b) and p(c, b|y), one can derive explicit update equations for the parameters of \(q_c\) and \(q_b\). Of interest for this work is the update step for \(\bar{\mu }\), which has the form

$$\begin{aligned} A(\bar{\mu }-\mu ) = \mathbb {E}_{q_c}[c]-\mu \quad \Longrightarrow \quad A\bar{\mu } = (A - I) \mu + \mathbb {E}_{q_c}[c]. \end{aligned}$$
(7)

It links the mean \(\mathbb {E}_{q_c}[c]\) of \(q_c\) to the mean \(\bar{\mu }\) of \(q_b\) via the linear mapping A, which is determined by \(\varSigma \). We will return to this equation at the end of the next section, see (12). The optimization alternates between solving the MRF \(q_c\) and updating the parameters of \(q_b\) until convergence.
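The update step (7) is a linear solve for \(\bar{\mu}\). A minimal two-dimensional sketch, with a hypothetical coupling matrix A and toy means (not values from the actual model):

```python
def solve2(A, rhs):
    """Solve a 2x2 linear system A x = rhs by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det
    x1 = (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det
    return [x0, x1]

def update_mu_bar(A, mu, Ec):
    """Update step (7): solve A mu_bar = (A - I) mu + E_qc[c]."""
    rhs = [sum(A[i][j] * mu[j] for j in range(2)) - mu[i] + Ec[i]
           for i in range(2)]
    return solve2(A, rhs)

A  = [[2.0, 0.5], [0.5, 1.0]]   # toy coupling matrix derived from Sigma
mu = [10.0, 12.0]               # prior mean of the shape model
Ec = [11.0, 12.5]               # current MRF mean E_qc[c]
mu_bar = update_mu_bar(A, mu, Ec)
```

By construction, \(\bar{\mu}\) satisfies \(A(\bar{\mu} - \mu) = \mathbb{E}_{q_c}[c] - \mu\): the posterior mean is pulled from \(\mu\) towards the MRF mean, with A weighting the pull.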

3 Locally Adaptive Priors

The model described above, when trained on healthy data, is not sufficiently flexible to adapt to unseen pathologies with large deformations. We address this problem by finding a globally optimal combination of locally modified submodels, using the principle of maximum likelihood and dynamic programming.

Sum-Product Networks. We assume that models of pathological structure are translation invariant, local and approximately independent. Independence and locality allow us to factorize the full distribution p(y, b, c) into local distributions, an assumption necessary for SPNs. Translation invariance implies that the pathology can appear at any horizontal position in the image.

Recall that W in (3) contains typical shape variations of healthy retinal layers. We adapt the graphical model towards an illness by adding translation-invariant pathology-specific modes \(W^{\text {ill}}\) to W:

$$\begin{aligned} \theta ^{\text {ill}}_{m,n} := \begin{pmatrix}W_{m,n}&W_{m,n}^{\text {ill}}\end{pmatrix}, \qquad \theta ^{\text {healthy}}_{m,n} := W_{m,n}. \end{aligned}$$
(8)

Here the subscripts m, n denote the restriction of W to the region [m, n]. Segmenting several such models for different regions [m, n], with ill and healthy parameters, constitutes step 1 of our workflow (Fig. 2).
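Equation (8) amounts to slicing the healthy modes to a region and appending pathological columns. A minimal sketch, assuming W is stored row-wise over boundary positions (this storage convention and all numbers are illustrative):

```python
def prune(W, m, n):
    """Restrict shape modes to the boundary positions of region [m, n]."""
    return [row[:] for row in W[m:n]]

def theta_ill(W, W_ill, m, n):
    """Eq. (8): concatenate healthy and pathology-specific modes
    for region [m, n]; theta_healthy would just be prune(W, m, n)."""
    return [h + i for h, i in zip(prune(W, m, n), prune(W_ill, m, n))]

W     = [[1.0], [0.5], [-0.5], [0.2]]   # healthy modes (one column)
W_ill = [[0.0], [1.0], [1.0], [0.0]]    # e.g. a bump pushing layers up
theta = theta_ill(W, W_ill, 1, 3)       # region covering positions 1-2
```

The modified parameter has one extra column per pathological mode, so during inference the latent vector s gains one extra coefficient controlling that mode's strength and sign.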

Let \(L_{m,n}(\theta ^z_{m,n})\) be the log-likelihood of the segmentation for region [m, n]:

$$\begin{aligned} L_{m,n}(\theta ^z_{m,n}) := \log q\left( c_{m,n},b_{m,n}|y_{m,n},\theta ^z_{m,n}\right) , \qquad z \in \{\text {healthy},\text {ill}\}. \end{aligned}$$
(9)

Now let \(X=\left\{ 1,x_{1},x_{2},...,x_{H},N\right\} \) denote the division of y into \(H+1\) regions and let \(\varTheta =\left\{ \theta ^{z_1}_{1,x_{1}},\theta ^{z_2}_{x_{1},x_{2}},...,\theta ^{z_{H+1}}_{x_{H},N}\right\} \) denote the corresponding set of shape modifications. We want to find the combination of submodels with maximal total log-likelihood

$$\begin{aligned} \hat{L}_{1,N}(\varTheta ) = \mathop {\text {argmax}}\limits _{H,X,\varTheta } \;L_{1,x_{1}}(\theta ^{z_1}_{1,x_{1}}) + \ldots + L_{x_{H},N}(\theta ^{z_{H+1}}_{x_H,N}). \end{aligned}$$
(10)

The global optimum of this combinatorial problem can be found with dynamic programming. Let \(\hat{L}_{m,n}\) denote the optimal selection of X and \(\varTheta \) in region [m, n]. It can be computed recursively as:

$$\begin{aligned} \hat{L}_{m,n}=\max \left( \max _{x \in \{m, m+1, \ldots ,n\}}\left( \hat{L}_{m,x} + \hat{L}_{x,n}\right) ,\max _{z \in \{\text {ill},\text {healthy}\}} L_{m,n}(\theta ^{z}_{m,n} )\right) , \end{aligned}$$
(11)

which is the maximum of the single best model over region [m, n] and the optimal factorization into two adjacent areas. To compute \(\hat{L}_{m,n}\) for regions of width w, we need the quantities \(\hat{L}_{m,x},\hat{L}_{x,n}\) for all regions of width \(<w\). Given \(\hat{L}_{m,x}\) and \(\hat{L}_{x,n}\), the complexity is dominated by evaluating \(L_{m,n}(\theta ^{z}_{m,n})\) (9).

Assuming a minimal width \(w_{min}\), this suggests an iterative algorithm: first compute \(\hat{L}_{m,n}\) for all regions of width \(w_{min}\), then recursively evaluate (11) for regions of increasing width w. We reduce the complexity further by growing and shifting windows with a fixed step size \(s > 1\). Due to the nature of dynamic programming, many terms \(\hat{L}_{m,n}\) are reused during the optimization. To favor more compact partitions, we add a regularization term to (10) that penalizes small submodels. This algorithm implements globally optimal MAP inference in an SPN [5] and constitutes step 2 of our workflow (Fig. 2).
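The recursion (11) can be sketched as a memoized interval dynamic program. The scoring function below is a toy stand-in for the model log-likelihoods \(L_{m,n}(\theta^z_{m,n})\), chosen so that the "ill" model wins only on one region; the window-growing and step-size refinements from the text are omitted for brevity.

```python
from functools import lru_cache

def best_partition(N, local_score, w_min=1):
    """Recursion (11): for each interval, take the maximum of the best
    single model over [m, n] and the best split into two adjacent
    subintervals; returns the optimal total log-likelihood."""
    @lru_cache(maxsize=None)
    def L_hat(m, n):
        best = max(local_score(m, n, z) for z in ("healthy", "ill"))
        for x in range(m + w_min, n - w_min + 1):
            best = max(best, L_hat(m, x) + L_hat(x, n))
        return best
    return L_hat(0, N)

# Toy score: the 'ill' model wins only on region [1, 2]; the base term
# acts like the regularization favoring fewer, larger regions.
def score(m, n, z):
    base = -(n - m)
    if z == "ill":
        return base + (3.0 if (m, n) == (1, 2) else -1.0)
    return base

total = best_partition(3, score)
```

The optimum here splits [0, 3] so that the ill model covers [1, 2] and healthy models cover the rest, mirroring how the SPN localizes a pathology.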

Combining Local Models. Because the submodels are found independently, they usually do not combine into a smooth segmentation of y. To obtain a smooth solution, we solve a modified version of the full graphical model p(y, b, c) that takes the optimal solution (10) into account, corresponding to step 3 in Fig. 2.

The MAP estimate (10) can be interpreted as a graphical model p(y, c, b) without coupling between subregions. This can be enforced by setting to zero all entries of \(\varSigma \) that couple boundary positions in two different regions. Furthermore, for subregions identified as ill, we use the modified shape modes \(\theta ^{\text {ill}}_{m,n}\) to calculate the corresponding submatrix of \(\varSigma \) via (4). Solving the full graphical model with such a modified covariance matrix would yield the same segmentation as \(\hat{L}_{1,N}(\varTheta )\).
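Zeroing the cross-region entries of \(\varSigma\) can be sketched as follows for a single cut between two subregions; the 3x3 matrix and cut position are illustrative toy values.

```python
def decouple(Sigma, cut):
    """Zero all covariance entries coupling boundary positions on
    different sides of 'cut', removing interaction between subregions."""
    d = len(Sigma)
    return [[0.0 if (i < cut) != (j < cut) else Sigma[i][j]
             for j in range(d)] for i in range(d)]

Sigma = [[1.0, 0.3, 0.2],
         [0.3, 1.0, 0.4],
         [0.2, 0.4, 1.0]]
S = decouple(Sigma, cut=2)   # subregions {0, 1} and {2}
```

The resulting block-diagonal covariance makes the shape prior factorize over subregions, which is exactly the independence the SPN solution assumed.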

Fig. 3.

(a) The names of the segmented retina layers. Surfaces 1–9 lie in between layers \(l_1, \ldots , l_{10}\). (b) A SPN estimate and its smoothed version (c). Note that Bruch’s membrane (surface 9) gets fitted in a post-processing step, described in the results section. (d)–(f) Example segmentations from the DME dataset.

To enforce smoothness while staying close to the SPN solution, we replace the system of linear equations (7) by the constrained least-squares problem:

$$\begin{aligned} \min _{\bar{\mu }} \Vert \tilde{A}\bar{\mu } - (A - I) \mu - \mathbb {E}_{q_c}[c] \Vert ^2, \qquad \text {subject to} \quad B\bar{\mu } \le \delta \mathbb {1}, \end{aligned}$$
(12)

where \(\mathbb {1}\) is a vector of ones. Each row of the constraint matrix B selects two neighboring entries of \(\bar{\mu }\) belonging to two different subregions and restricts their difference to be less than \(\delta \). This enforces a weak coupling between subregions. Solving the full graphical model with the sparse \(\varSigma \) and the modified update step for \(\bar{\mu }\) then yields a smooth segmentation, as Fig. 3(b) and (c) demonstrate.

4 Results

We demonstrate the flexibility of our approach by segmenting three different pathologies, ranging from minor deformations to severe distortions of the retinal structure. We use the same graphical model for all pathologies and only adapt the pathological shape modes we add. During inference, the graphical model can choose the strength and sign of each mode freely.

We trained the healthy model on the same 35 labeled volumes used in [4]. As prediction we used the expectation \(\mathbb {E}_{q_c}[c]\) of c. The error metric is the unsigned error between labels and prediction, averaged over all B-Scans in a volume and all A-Scans therein. We used \(\delta = 2\) pixels (see (12)) throughout our experiments. Table 1 summarizes all results.

Table 1. Unsigned error for all tested datasets in \(\mu m\) (1px = 3.87 \(\mu m\)). Surface numbers 1–9 correspond to Fig. 3(a). ‘–’ marks the absence of labels.

Diabetic Retinopathy. The dataset of [6] contains 10 subjects (5 B-Scans each) affected by mild non-proliferative diabetic retinopathy. As only small deformations occur, we used our graphical model of [4]. Since the dataset lacks the relative positions of B-Scans inside the volume, which we require to select a shape prior, we estimated the position as follows: for each B-Scan we tested all shape priors and used (a) the one with the largest model likelihood and (b) the one with the smallest error. This yields a lower and an upper bound on the error that would be attained were the position information available; we report their average as the final result.

AMD. We used an in-house dataset with 8 Spectralis volumes of early and intermediate AMD and labels for surfaces 1, 8 and 9 for all 19 B-Scans. Surface 8 was labeled by a physician. We added one mode consisting of the sine function evaluated between 0 and \(\pi \) for surfaces 6–9, simulating the effect of those layers being pushed up by a circular-shaped fluid deposit underneath. While Bruch's membrane (surface 9) is supposed to lie beneath the fluid region, better segmentations were obtained when it was included in the shape mode. The final segmentation for this surface was given by the conditional mean \(\mu _{a|b} = \mu _a - (K_{aa})^{-1}K_{ab}(x_b-\mu _b)\) of (4), where \(x_b\) denotes the part of the segmentation identified as healthy.
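The conditional-mean formula above (stated with precision blocks \(K_{aa}, K_{ab}\)) can be sketched in the scalar case; the precision values and surface positions below are hypothetical, not taken from the AMD model.

```python
def conditional_mean(mu_a, K_aa, K_ab, x_b, mu_b):
    """Gaussian conditional mean via precision blocks:
    mu_{a|b} = mu_a - K_aa^{-1} K_ab (x_b - mu_b); all scalars here."""
    return mu_a - (K_ab / K_aa) * (x_b - mu_b)

# Toy numbers: a negative off-diagonal precision (positive correlation)
# pulls the conditioned surface in the same direction as its neighbor.
m = conditional_mean(mu_a=50.0, K_aa=2.0, K_ab=-1.0, x_b=54.0, mu_b=50.0)
```

Observing the healthy neighbor 4 pixels below its mean shifts the conditioned surface by 2 pixels, i.e. by \(-K_{aa}^{-1}K_{ab} = 0.5\) of the observed deviation.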

DME. The dataset published by Chiu et al. [2] consists of 10 Spectralis volumes with 11 labeled B-Scans per volume. While volumes 6–10 are mild to intermediate cases, volumes 1–5 constitute advanced DME cases, with disappearing layers (Fig. 1(c)) and severe texture artifacts due to the highly reflective regions characteristic of DME (Fig. 3(c)).

To reduce sensitivity to the texture artifacts, we added patches of size \(7 \times 7\) and \(3 \times 3\) (besides the standard \(15\times 15\) patches). To deal with the disappearing layers, we dropped the segmentation in regions of low intensity whenever the difference between surfaces 1 and 9 exceeded a threshold. As pathology-specific modes we added a set of connected linear functions to boundaries 1–5, which could only be adjusted jointly. Furthermore, as DME can be accompanied by a swelling of the nerve fiber layer (NFL), we added linear functions to surfaces 1 and 2.

Karri et al. [3] also tested their approach on this dataset, but only published results for volumes 6–10, using the first 5 volumes for training. Using their published code (https://github.com/ultrai/Chap_1), we could reproduce their results for volumes 6–10, as well as swap training and test set to obtain results for volumes 1–5. Results are displayed in the lower half of Table 1. For a fair comparison, we also applied our mechanism for dropping segmentations.

Fig. 4.

Estimates of fluid regions due to the pathological modes \(W^{\text {ill}}_{m,n}\) used.

As expected, the less difficult volumes 6–10 yield lower errors for all approaches; Karri's approach and ours perform best. The situation changes for the more difficult volumes 1–5: here Chiu's approach and ours perform on par, outperforming that of Karri et al., which lacks sufficient shape regularization [3].

Pathology Hinting. Figure 4 demonstrates another benefit of using a shape prior. Given a segmentation b, one can calculate the latent variable s, which indicates how strongly each mode was utilized (3). The red surfaces indicate the usage of the pathological modes \(W_{m,n}^{\text {ill}}\), plotted below the lowest boundary affected.
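Recovering the latent coefficient of a single mode from a segmentation can be sketched as a least-squares fit of (3), ignoring the noise term; the bump-shaped mode and segmentation below are hypothetical toy values.

```python
def mode_usage(b, mu, w):
    """Least-squares estimate of the latent coefficient s for a single
    shape mode w in b = w*s + mu + noise (cf. Eq. 3), ignoring noise:
    s = w^T (b - mu) / (w^T w)."""
    num = sum(wi * (bi - mi) for wi, bi, mi in zip(w, b, mu))
    den = sum(wi * wi for wi in w)
    return num / den

w  = [0.0, 1.0, 1.0, 0.0]        # hypothetical pathological bump mode
mu = [10.0, 10.0, 10.0, 10.0]    # mean boundary positions
b  = [10.0, 13.0, 13.0, 10.0]    # segmentation pushed up mid-region
s  = mode_usage(b, mu, w)
```

A large magnitude of s for a pathological mode then hints at a pathological region at the positions where that mode is supported, which is what Fig. 4 visualizes.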

5 Discussion

We presented a method for the segmentation of pathological OCT data that combines a graphical model with sum-product networks. While our approach yields state-of-the-art performance, it does not require labeled pathological ground truth data. Furthermore, it can segment several pathologies; to our knowledge, this is a feature not yet demonstrated by any other approach. Last but not least, it can localize the pathological area, which could be valuable for practitioners. An evaluation of this feature will be part of our future work.

The current approach was evaluated in 2-D, requiring between 30 and 60 s per B-Scan. While all parts of our workflow naturally extend to 3-D, the number of submodels in step 1 grows exponentially, making a direct conversion too costly. Future work may include mechanisms to prune the SPN search, reducing the number of submodels that must be tested. This would benefit the current 2-D approach as well as any potential 3-D extension.