
1 Introduction

Children are sensitive to radiation dose, so the use of ionizing radiation such as x-rays on children requires extra care. In some settings, segmentation of the liver from x-ray images may be required. Conventional CT images from dedicated CT scanners are typically used because of their high image quality, but radiation dose is a concern. Combined PET-CT scanners generate CT images with a lower radiation dose, but the images have lower contrast. Automatic segmentation of organs from these low-contrast CT images is challenging. A review of liver segmentation techniques can be found in [1, 2].

Probabilistic atlases have been widely used for liver segmentation [3,4,5,6] and have produced good segmentation outcomes. In [3], Linguraru et al. investigated the use of a probabilistic atlas for liver segmentation in low-contrast CT images. Based on datasets from 20 patients (10 for training and 10 for testing), a Dice index of \(88.2\pm 3.7\) was achieved. Probabilistic atlas information can be augmented with other information. Li et al. [6] supplemented the probabilistic atlas with primary liver shape and localization obtained from the PET scans. The probabilistic atlas was built using 60 CT studies from dedicated CT scanners. The PET-guided probabilistic atlas approach was applied to 35 PET-CT studies and achieved a volume overlap percentage (VOP) of \(92.9\% \pm 2.1\). In this approach, a larger amount of training data leads to improved segmentation results [3, 6, 7].

In more recent studies, statistical shape models (SSMs) have attracted a lot of interest [8,9,10,11,12,13]. A review of statistical shape models for 3D medical image segmentation can be found in [14]. The results are promising, but these methods also require large datasets to build the SSM. For instance, in [13], 120 cases were used to develop the SSM. Annotation of the liver in such a large training set is laborious.

The Statistical Region Merging (SRM) technique [15] is founded on probability and statistical theory and was proposed for natural scene image segmentation. The technique merges pixels into statistically homogeneous regions (superpixels), which are then regrouped into target objects/organs. Lee et al. [16] employed the SRM method [15] for multi-organ segmentation on non-contrast CT images. The technique has also been extended to 3D-SRM [17] to exploit the spatial connectivity of volume CT data. Medical image segmentation based on the SRM method does not require a large dataset for developing a probabilistic atlas or a statistical shape model. It does employ prior knowledge of shape and location, but a primary segmentation of the liver on PET scans such as in [10] or the use of a simple model [18] suffices. In this paper, an adaptive kernel-based SRM (kernel-SRM) is proposed for segmentation of low-contrast CT images. The method uses a kernel regressor and employs regional statistics. Results are compared with those of the SRM method.

2 Proposed Method

2.1 Adaptive Kernel-SRM

Consider a gray level intensity image of size \(M\times N\)

$$ I: \{1,2,\ldots ,M\}\times \{1,2,\ldots ,N\}\rightarrow [0,255) $$

where \(I(m,n)=f(m,n)+\epsilon \), with f being the true intensity value and \(\epsilon \) the noise. The task is to estimate the unknown function f. In 1964 Nadaraya [19] (and also Watson [20]) proposed the following non-parametric estimator of the regression function

$$\begin{aligned} f(x,y) = E(I(X,Y)|(X,Y)=(x,y)), \end{aligned}$$
(1)

where E denotes (conditional) expectation, and \(((X,Y), I(X,Y))\) is the observed couple of random variables.

In order to estimate the regression function in Eq. (1), a non-parametric kernel-based estimator of Nadaraya-Watson type, which combines estimation and smoothing of the regression function, is commonly used. In this study, we consider a local version of the estimator defined for a given region in the image. The Nadaraya-Watson local estimator of the regression function \(f(m,n)\), for a given region \(\mathcal{R}\), can be defined as

$$\begin{aligned} \widehat{f}(m,n) = \sum _{{(m_i,n_i)\in \mathcal {R}}} w_i I(m_i,n_i), \end{aligned}$$
(2)
$$\begin{aligned} w_i = \frac{K\left( I^{(m,n)}_i\right) }{\sum _{{(m_i,n_i)\in \mathcal {R}}} K\left( I^{(m,n)}_i\right) }, \end{aligned}$$
(3)
$$\begin{aligned} I^{(m,n)}_i = \frac{I(m,n)-I(m_i,n_i)}{h_I}, \end{aligned}$$
(4)

where \((m_i,n_i)\in \{1,2,\ldots ,M\}\times \{1,2,\ldots ,N\}\), K is a kernel function and \(h_I\) is the smoothing parameter for a given image I. Observe that Eq. (3) gives the weighted contribution of \(I(m_i,n_i)\) to the estimated (true) value at \((m,n)\).

As an example, consider the normal distribution \(N_{\mu ,\sigma }(m,n)\) with mean \(\mu \) and standard deviation \(\sigma \)

$$\begin{aligned} N_{\mu ,\sigma }(m,n) = \frac{1}{\sigma \sqrt{2\pi }}\exp \left( -\frac{1}{2}\left( \frac{I(m,n)-\mu }{\sigma }\right) ^2\right) . \end{aligned}$$
(5)

The kernel is then defined as

$$\begin{aligned} K(I^{(m,n)}_i) = \frac{1}{\sigma \sqrt{2\pi }}\exp \left( -\frac{1}{2}\left( \frac{|I^{(m,n)}_i|-\mu }{\sigma }\right) ^2\right) \end{aligned}$$
(6)

where |.| denotes absolute value.

From the definition of the weights in Eq. (3), one can observe that, for a given pixel, the estimated (new) value is most influenced by those local pixels whose intensities differ from that of the given pixel by the expected value of the intensity across the region. In other words, pixels whose intensities are very different from that of the given pixel (with a difference significantly larger or smaller than the average intensity of the region), i.e. noise pixels, have little or no impact on the estimated intensity value (the value of the regression function). Hence, assuming that the noise pixels follow the distribution of the kernel function, Eq. (2) can be used effectively to reduce noise in the image.
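The local estimator of Eqs. (2)-(6) can be illustrated with a minimal Python sketch. It follows the equations literally, but several details are assumptions made for illustration only: the region \(\mathcal{R}\) is taken to be a square window of the given radius clipped at the image border, the image is a list of lists of intensities, and the function names `gaussian_kernel` and `nw_estimate` are hypothetical. The default values \(\mu = 24\), \(\sigma = 9\), \(h_I = 3\) and a radius of 2 pixels are those reported elsewhere in this paper.

```python
import math


def gaussian_kernel(u, mu=24.0, sigma=9.0):
    """Eq. (6): normal density evaluated at the absolute scaled difference |u|."""
    z = (abs(u) - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def nw_estimate(image, m, n, radius=2, h=3.0, mu=24.0, sigma=9.0):
    """Eqs. (2)-(4): kernel-weighted average of the intensities around (m, n).

    Assumption: the region R is a square window of the given radius,
    clipped at the image border.
    """
    rows, cols = len(image), len(image[0])
    weighted_sum = 0.0
    kernel_sum = 0.0
    for mi in range(max(0, m - radius), min(rows, m + radius + 1)):
        for ni in range(max(0, n - radius), min(cols, n + radius + 1)):
            u = (image[m][n] - image[mi][ni]) / h      # Eq. (4)
            k = gaussian_kernel(u, mu, sigma)          # Eq. (6)
            weighted_sum += k * image[mi][ni]          # numerator of Eq. (2)
            kernel_sum += k                            # denominator of Eq. (3)
    # The centre pixel always contributes a positive kernel value,
    # so kernel_sum is never zero.
    return weighted_sum / kernel_sum
```

Because the weights of Eq. (3) are non-negative and sum to one, the estimate is a convex combination of the window intensities and always lies between the smallest and largest intensity in the window.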

Using the notation in [15], the Statistical Region Merging (SRM) method allows merging of two regions \(R, R'\) if

$$\begin{aligned} |{\bar{R}}-{\bar{R}'}|\le \sqrt{b^2(R) + b^2(R')} \end{aligned}$$
(7)

where

$$\begin{aligned} b(R) = g\sqrt{\frac{1}{2Q|R|}\ln \frac{2}{\delta }}, \end{aligned}$$
(8)

|.| denotes cardinality of a set, \(\bar{R}\) denotes the average intensity across the region R, Q is a parameter which controls coarseness of the segmentation, \(\delta = \frac{1}{6\,|I|^2}\) and g is the number of image intensity levels.
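As an illustration, the merging predicate of Eqs. (7) and (8) can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in this study: the function names are hypothetical, regions are summarized only by their mean intensity and pixel count, and the defaults \(g = 256\) and \(Q = 256\) are the values reported in Sect. 3.4.

```python
import math


def srm_bound(region_size, image_size, Q=256.0, g=256.0):
    """Eq. (8): b(R) for a region with `region_size` pixels.

    `image_size` is |I|, the number of pixels in the image.
    """
    delta = 1.0 / (6.0 * image_size ** 2)
    return g * math.sqrt(math.log(2.0 / delta) / (2.0 * Q * region_size))


def can_merge(mean_a, size_a, mean_b, size_b, image_size, Q=256.0, g=256.0):
    """Eq. (7): two regions may merge if their mean intensities are close enough."""
    b_a = srm_bound(size_a, image_size, Q, g)
    b_b = srm_bound(size_b, image_size, Q, g)
    return abs(mean_a - mean_b) <= math.sqrt(b_a ** 2 + b_b ** 2)
```

The bound \(b(R)\) shrinks as the region grows, so large regions merge only when their means are very close, while a larger Q tightens the bound for all regions and yields a finer segmentation.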

By incorporating an appropriate kernel function into the regional expectation \({\bar{R}}\), one can alleviate the effect of noise. Each time two pixels are considered for merging, Eq. (2) is used to modify the intensities in the spatial neighbourhoods of these pixels. The radius of this neighbourhood was fixed to 2 pixels. The proposed method is called the Adaptive Kernel-Based Statistical Region Merging method, abbreviated as “kernel-SRM”.

2.2 Determination of the Kernel Function

As described in the previous section, in order to successfully alleviate noise the kernel function corresponding to the noise distribution has to be determined. This is achieved by defining a structure-free region outside the human body on the CT image for image noise estimation. The primary variation of the image intensity in this region is due to noise. The histograms are built using the long-standing Sturges’ rule [21] to estimate the number of bins \( k = 1+\log _2(n), \) where n is the number of data points.
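A small Python sketch of the histogram construction is given below. The function names are hypothetical, and two choices are assumptions not stated in the text: the bin count from Sturges' rule is rounded up to an integer, and equal-width bins spanning the observed intensity range are used.

```python
import math


def sturges_bins(n):
    """Sturges' rule k = 1 + log2(n); rounding up is an assumption."""
    return 1 + math.ceil(math.log2(n))


def density_histogram(values):
    """Return (bin_centres, densities) for equal-width bins chosen by Sturges' rule."""
    n = len(values)
    k = sturges_bins(n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0          # fall back to unit width for constant data
    counts = [0] * k
    for v in values:
        idx = min(int((v - lo) / width), k - 1)   # the maximum falls in the last bin
        counts[idx] += 1
    centres = [lo + (i + 0.5) * width for i in range(k)]
    densities = [c / (n * width) for c in counts]  # normalized so the area is 1
    return centres, densities
```

For example, a noise region containing 1024 pixels gives \(k = 1 + \log_2(1024) = 11\) bins.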

The best fit probability density function was selected from the following range of distributions: Rayleigh, normal, Poisson, gamma and the generalized extreme value (GEV) distributions. In determining the most appropriate distribution, the Mean Square Error (MSE) was calculated for each fit and the distribution with the smallest MSE value was selected as the best fit.
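The selection criterion can be sketched in Python as follows. This sketch is deliberately reduced and rests on several assumptions: only two of the five candidate distributions (normal and Rayleigh) are included, their parameters are obtained from closed-form maximum-likelihood estimates (the fitting procedure is not specified in the text), and the MSE is computed between the histogram densities and the fitted density evaluated at the bin centres. All function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def rayleigh_pdf(x, scale):
    if x < 0:
        return 0.0
    return (x / scale ** 2) * math.exp(-x * x / (2.0 * scale ** 2))


def density_histogram(samples):
    """Equal-width density histogram; bin count from Sturges' rule (rounded up).

    Assumes the samples are not all identical.
    """
    n = len(samples)
    k = 1 + math.ceil(math.log2(n))
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / k
    counts = [0] * k
    for v in samples:
        counts[min(int((v - lo) / width), k - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(k)]
    return centres, [c / (n * width) for c in counts]


def best_fit(samples):
    """Return (name of the candidate with the smallest MSE, MSE per candidate)."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in samples) / n)
    scale = math.sqrt(sum(v * v for v in samples) / (2.0 * n))  # Rayleigh MLE
    candidates = {
        "normal": lambda x: normal_pdf(x, mu, sigma),
        "rayleigh": lambda x: rayleigh_pdf(x, scale),
    }
    centres, densities = density_histogram(samples)
    mse = {
        name: sum((pdf(c) - d) ** 2 for c, d in zip(centres, densities)) / len(centres)
        for name, pdf in candidates.items()
    }
    return min(mse, key=mse.get), mse
```

Extending the sketch to the Poisson, gamma and GEV candidates would only require adding their fitted densities to the `candidates` dictionary.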

3 Experiments and Results

3.1 Data

Thirty-seven paediatric liver CT images acquired from a combined PET-CT scanner were included in this retrospective study. The images were de-identified and were obtained from a hospital in Sydney, Australia with ethics approval. The Siemens Emotion Duo scanner was used to acquire the images with pixel size \(0.98\times 0.98\) mm and slice separation 0.34 mm. Ground truths of the liver regions were delineated by an expert in human anatomy and physiology (MC). CT images acquired from combined PET-CT scanners have low image contrast and a high noise level when compared with CT images acquired from dedicated CT scanners.

3.2 Kernel-SRM Segmentation

Image Pre-processing. Adaptive kernel-based segmentation is a computationally demanding process. To decrease the processing time, each CT image was automatically cropped to the area of the patient body. The CT images were then subsampled by 2 using the nearest-neighbour method to further reduce computational time.
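The two pre-processing steps can be sketched in Python as follows. The cropping rule is an assumption: the text does not say how the body area was detected, so the sketch simply takes the bounding box of all pixels above a hypothetical intensity cut-off `threshold`. Subsampling by 2 with the nearest-neighbour method is shown as keeping every second row and column. Both function names are hypothetical.

```python
def crop_to_body(image, threshold):
    """Crop to the bounding box of pixels brighter than `threshold`.

    Assumption: a single intensity cut-off separates the body from the
    surrounding air; the actual cropping rule is not given in the text.
    """
    rows = [r for r, row in enumerate(image) if any(v > threshold for v in row)]
    cols = [c for c in range(len(image[0])) if any(row[c] > threshold for row in image)]
    if not rows:
        return image  # nothing above the cut-off, leave the image unchanged
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def subsample_by_2(image):
    """Nearest-neighbour subsampling by 2: keep every second row and column."""
    return [row[::2] for row in image[::2]]
```

Subsampling by 2 in each direction reduces the number of pixels, and hence the number of candidate merges, by roughly a factor of four.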

Kernel Function/Image Noise Distribution Estimation. The kernel function (Sect. 2.2) of each CT image was estimated by automatically analysing the noise distribution in that image. In each image, the region comprising the top 120 rows of pixels across the full width of the image was designated for noise distribution estimation. This region is outside the human body and is structure-free (anatomy-free). The histogram of the pixel intensities in this region was analysed (Sect. 2.2). Table 1 shows that in all but one case, the noise distribution was best estimated using a normal distribution. For the remaining single case, the image noise has a generalized extreme value (GEV) distribution with parameters \((\mu , \sigma , \xi ) = (-0.19, 8.80, 19.92)\). In the proposed kernel-SRM method, the kernel function for each CT image was determined based on the estimated noise distribution of that image. Further analysis of these noise distributions shows that, for the normal distributions, the average of the mean parameter is 24.6 (std 0.45; range 23–25.2) and the average of the standard deviation parameter is 8.9 (std 0.6; range 7.8–9.9). Guided by these results, the normal (Gaussian) distribution with \(\mu = 24\) and \(\sigma = 9\) was selected as the kernel function.

Table 1. Determination of the kernel function/image noise distribution estimation. The Rayleigh, normal, Poisson, gamma and GEV distributions were considered for the CT image noise estimation. The best fit, i.e. the one with the smallest Mean Square Error (MSE), for each individual CT image is shown below.

Kernel Bandwidth Optimization. The smoothing parameter \(h_I\) (Eq. 4), also known as the kernel bandwidth, was determined experimentally by searching over a wide range [1, 30]. Table 2 shows that the best segmentation result is achieved for \(h_I = 3\). In addition, for \(h_I \in \{2,3,4,5\}\), the segmentation outcomes are similarly good, with the average Dice index and average Hausdorff distance (in pixels) over all CT images being (0.84, 0.85, 0.84, 0.84) and (10.04, 8.77, 9.98, 10.48), respectively. This suggests that the performance of the proposed kernel-SRM is robust to small changes in the parameter \(h_I\). The last column in Table 2 reports the number of CT images in which the proposed kernel-SRM method failed to generate a segmentation of the liver (Sect. 3.3).

Table 2. Optimization of the kernel bandwidth \(h_I\). The parameter \(h_I\) was searched over the range [0.6, 30]. For each value of \(h_I\), the average Dice index and the average Hausdorff distance over all 37 kernel-SRM segmented livers are shown. The last column shows the number of CT images in which the kernel-SRM failed to segment the liver. The value of \(h_I\) producing the best results is boxed.

3.3 Segmentation Representation and Evaluation Measures

As in the Statistical Region Merging (SRM) method in [15], by choosing an appropriate value of the parameter Q (Eq. 8), the kernel-SRM was set to over-segment the CT images, thereby partitioning the images into statistically homogeneous regions (superpixels). These superpixels are non-overlapping, and their union gives the whole image.

Over-Segmentation and Eligible Superpixels. In a perfect segmentation, the segmented liver would ideally be represented by a single superpixel. However, anatomical structures on CT images are not homogeneous. As such, the representation of the liver (target organ/tissue) is relaxed so that the segmented liver is represented by the union of one or more statistically homogeneous regions (superpixels). For a superpixel to be included in the segmented liver, over \(50\%\) of the superpixel must overlap with the ground truth. In practice the ground truth is unknown, but it can be estimated using different approaches such as the model-based approach [18]. In order to evaluate the performance of the proposed kernel-SRM against that of the SRM method without the influence of ground truth estimation, the actual ground truth is used in this experiment. A superpixel satisfying this condition of \({>}50\%\) overlap is called an ‘eligible’ superpixel. Thus, the kernel-SRM/SRM segmented liver is the union of these ‘eligible’ superpixels. Although the number of eligible superpixels in the segmented liver is not directly associated with the accuracy of the segmentation, when two segmentation outcomes are of similar accuracy, the one with the smaller number of eligible superpixels is of lower complexity and is the preferred solution. Figures 1a and b show two examples of the kernel-SRM segmentation outcomes. The ground truth is outlined in black and the eligible superpixel(s) that contributed to the kernel-SRM segmentation results are shown as patch(es) of color (false color for visualization). In the first example (Fig. 1a), only one eligible superpixel, with the majority (\({>}50\%\)) of the superpixel inside the ground truth, was found. In the second example (Fig. 1b), three eligible superpixels were found. Though two of them are small, over \(50\%\) of each superpixel is inside the ground truth. The kernel-SRM segmented liver is the union of these superpixels.
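The eligibility rule can be sketched in Python as follows. The representation is an assumption made for illustration: the over-segmentation is given as a 2D label map (one integer label per pixel, one label per superpixel) and the ground truth as a binary mask of the same size. The function names are hypothetical.

```python
from collections import Counter


def eligible_superpixels(labels, truth):
    """Return the labels of superpixels with more than 50% of their pixels inside `truth`."""
    total = Counter()
    inside = Counter()
    for label_row, truth_row in zip(labels, truth):
        for label, in_truth in zip(label_row, truth_row):
            total[label] += 1
            if in_truth:
                inside[label] += 1
    # Strict inequality: a superpixel with exactly 50% overlap is not eligible.
    return {label for label in total if inside[label] / total[label] > 0.5}


def segmented_mask(labels, truth):
    """Binary mask of the segmented organ: the union of the eligible superpixels."""
    keep = eligible_superpixels(labels, truth)
    return [[label in keep for label in row] for row in labels]
```

An empty set returned by `eligible_superpixels` corresponds to a failed segmentation, in which no superpixel lies mostly inside the ground truth.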

Failure - Resulting in Empty Set. If the liver is present on a CT image but no eligible superpixel is found, the liver segmentation on that image has failed. The generated segmentation of the liver is an empty set and no segmentation result is presented. Figures 1e and f show the SRM segmentation outcomes in two examples. The SRM statistically homogeneous regions (superpixels) are presented in false color for visualization. In Fig. 1e, less than \(50\%\) of the green superpixel overlaps with the ground truth, whereas in Fig. 1f, less than \(50\%\) of the purple superpixel overlaps with the ground truth. Thus, the segmentation outputs are empty sets in both examples.

Segmentation Accuracy - Dice Index and Hausdorff Distance. For segmentation accuracy, Dice index and Hausdorff distance are measured on the segmentation outcome (union of the eligible superpixels), if eligible superpixel(s) is/are found. Dice index measures the agreement between the machine segmentation and the ground truth whereas Hausdorff distance measures the largest deviation between the two. If a segmentation outcome has no eligible superpixel, i.e. the segmentation failed, it follows that Dice index \(= 0\) and the Hausdorff distance cannot be calculated.
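Both measures can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: masks are 2D binary arrays given as lists of lists, the Hausdorff distance is computed by brute force between the full pixel sets of the two masks (the text does not state whether region pixels or boundary pixels were used), and distances are in pixels. Following the convention above, a failed (empty) segmentation gives a Dice index of 0 and an undefined Hausdorff distance, represented here by `None`. The function names are hypothetical.

```python
import math


def mask_to_points(mask):
    """Set of (row, column) coordinates of the foreground pixels of a binary mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_index(segmentation, truth):
    """Dice index 2|A ∩ B| / (|A| + |B|); 0 if either mask is empty."""
    a, b = mask_to_points(segmentation), mask_to_points(truth)
    if not a or not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(segmentation, truth):
    """Symmetric Hausdorff distance in pixels; None if either mask is empty."""
    a, b = mask_to_points(segmentation), mask_to_points(truth)
    if not a or not b:
        return None

    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))
```

The brute-force computation is quadratic in the number of foreground pixels, which is adequate for a sketch but would be slow on full-resolution slices.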

Fig. 1. Two examples. The ground truth is delineated in black in all panels. Example 1 is presented in (a, c, e) and Example 2 in (b, d, f). Using the kernel-SRM, (a) one eligible superpixel (red) overlapping the liver was found in Example 1 and (b) three eligible superpixels (grey, green and dark blue) were found in Example 2. The green and dark blue superpixels are small, but over \(50\%\) of each is inside the ground truth, making them eligible. (c, d) Kernel-SRM segmentation. The Dice index is 0.88 for (c) Example 1 and 0.83 for (d) Example 2. (e, f) SRM segmentation failed to produce any eligible superpixels in both examples, as less than \(50\%\) of the green superpixel in Example 1 and of the purple superpixel in Example 2 is inside the ground truth. Dice \(=0\) for both examples. (False color for visualization) (Color figure online)

3.4 Results and Discussions

Segmentation results of the proposed kernel-SRM and those of the original SRM are compared in this section. The results were generated with a normal distribution with mean \(\mu =24\) and standard deviation \(\sigma =9\) as the kernel function (Sect. 3.2), the optimal bandwidth \(h_I=3\) (Sect. 3.2), \(g=256\) for images with 256 grayscale levels, and \(Q=256\) (Eq. 8), determined empirically. Table 3 shows the average Dice index, the average Hausdorff distance and the number of failures (Sect. 3.3). Kernel-SRM performs better than SRM. The average Dice index and average Hausdorff distance over all 37 CT images were 0.79 and 19.06 for SRM and 0.85 and 8.77 for kernel-SRM, respectively. Moreover, SRM failed to segment (produced empty sets) in 5 cases, while kernel-SRM segmented all images successfully (no empty sets). Figures 2 and 3 show the detailed comparison of SRM and kernel-SRM in terms of Dice index and Hausdorff distance, respectively. The five failed SRM examples are identified by a Dice index \(=0\) in Fig. 2. The Hausdorff distance cannot be calculated for the five failures, which are shown as infinite lines in Fig. 3. Both Figs. 2 and 3 show that the kernel-SRM performs better in almost all cases.

Table 3. Comparison of SRM and kernel-SRM segmentation results.
Fig. 2. Dice index: SRM vs. kernel-SRM liver segmentation results.

Fig. 3. Hausdorff distance (in pixels): SRM vs. kernel-SRM results.

4 Conclusion

Segmentation of abdominal organs in low-contrast CT images generated by combined PET-CT scanners is challenging. This paper extended the well-founded statistical region merging (SRM) method with a built-in kernel that handles the high level of image noise adaptively for every pair of regions considered for merging. Results showed that the proposed adaptive kernel-based statistical region merging (kernel-SRM) performs better than the original SRM method, with a higher average Dice index (0.85 vs. 0.79), a lower average Hausdorff distance (8.77 vs. 19.06 pixels) and no failed segmentations. The results, however, were obtained using a small dataset. Future work is required to validate the results with a larger dataset.