1 Introduction

In many situations there is a strong need to detect the borders of objects that look like random textures. Such borders can hardly be detected by the human visual system when the textures differ only in their high-order statistics [1, 2]. Recently, some advanced methods for detecting hardly visible borders between random image textures have been suggested [1, 3]. Moreover, it has been shown experimentally that these methods, which capitalize on the so-called “generalized gradient”, are able to highlight borders that are completely invisible to the human eye [4, 5].

This paper addresses the problem of detecting the borders of malignant tumors in native lung CT images in the presence of atelectasis. The term atelectasis denotes the collapse of all or part of a lung due to bronchial plugging or to the chest cavity being opened to atmospheric pressure. This can happen when the vacuum between the lung and the chest wall is broken, allowing the lung to collapse within the chest (e.g., pneumothorax), when the lung is compressed by masses in the chest, or when an airway is blocked, so that the distal air is slowly absorbed into the blood without being replenished. In this work we dealt with bronchial compression caused by lung cancer tumors, the most common cause of atelectasis.

Computed tomography (CT) is the primary modality for imaging lung cancer patients. The problem, however, is that on CT scans lung regions with atelectasis and malignant tumors have quite similar attenuation values, so visually discriminating and separating atelectasis from tumor is hardly possible. Yet accurate tumor segmentation is strongly needed for the following two reasons. First, correct tumor localization, segmentation and precise measurement of the tumor diameter play a crucial role in therapy planning and in choosing a suitable surgical technique. Second, if radiation therapy is prescribed, an exact delineation of the tumor border is required to target the tumor precisely and to deliver the ionizing radiation dose to the tumor rather than to the surrounding tissues.

Thus, the purpose of this paper is to present the results of an experimental study of the ability of the generalized gradient method to highlight hardly visible object borders. The study was conducted on three groups of images: 3D synthetic images; CT images of a specially designed physical gelatin phantom made by the authors and scanned with a Siemens Somatom Definition AS scanner; and, finally, 3D CT images of 40 lung cancer patients, on which the utility of the method was examined for detecting borders between malignant lung tumors and atelectasis regions.

The first version of the generalized gradient method was introduced in [2] as the so-called classification gradient and was slightly improved afterwards. The classification gradient method makes use of the conventional technique of calculating the image gradient at each pixel position by comparing pixel/voxel values taken from the two halves of an appropriately sized sliding window, split along each of the orthogonal axes. However, unlike traditional approaches, where the gradient magnitude is computed simply as an intensity difference (estimated by convolution with one or another matrix of weights), the generalized gradient method treats the voxel values taken from the window halves as two samples to be compared in a suitable way. The value of the resulting dissimilarity measure is then treated as the “gradient” value at the current sliding window position for a given orientation X, Y or Z.
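The sliding-window idea can be sketched in a few lines of Python for a 2D image stored as a list of rows. This is a minimal sketch under our own assumptions: the window is a 2h × 2h square split into two halves per axis, positions where the window does not fit are left at 0, and all function names are hypothetical. With the default `mean_difference` the map behaves like a conventional gradient; passing any other two-sample comparison as `dissim` turns it into a generalized gradient.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Dissimilarity = Callable[[Sequence[float], Sequence[float]], float]


def mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Conventional-gradient analogue: difference of the two half-window means."""
    return mean(b) - mean(a)


def window_halves(img: Image, r: int, c: int, h: int, axis: str) -> Tuple[list, list]:
    """Split the 2h x 2h window spanning rows r-h..r+h-1 and columns c-h..c+h-1."""
    rows, cols = range(r - h, r + h), range(c - h, c + h)
    if axis == "x":  # left half vs right half
        first = [img[i][j] for i in rows for j in cols if j < c]
        second = [img[i][j] for i in rows for j in cols if j >= c]
    else:  # "y": top half vs bottom half
        first = [img[i][j] for i in rows if i < r for j in cols]
        second = [img[i][j] for i in rows if i >= r for j in cols]
    return first, second


def generalized_gradient(img: Image, h: int,
                         dissim: Dissimilarity = mean_difference) -> Image:
    """Gradient magnitude map; positions where the window does not fit stay 0.0."""
    n_rows, n_cols = len(img), len(img[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(h, n_rows - h + 1):
        for c in range(h, n_cols - h + 1):
            gx = dissim(*window_halves(img, r, c, h, "x"))
            gy = dissim(*window_halves(img, r, c, h, "y"))
            out[r][c] = math.hypot(gx, gy)
    return out
```

For an 8 × 8 image whose left four columns are 0 and right four columns are 10, `generalized_gradient(img, 2)` is 10 where the window straddles the edge and 0 where it lies entirely inside one region.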

One may prefer to employ a sophisticated technique for comparing the two voxel samples, such as a voxel classification procedure performed with an appropriate classifier [3]. In that case the resulting classification accuracy is treated as the local image gradient magnitude, which varies in the range 0–100%. Besides classifiers, the voxel sets may, for example, be compared statistically using the conventional t-test. In this case the resulting t-value is treated as the dissimilarity measure, that is, as the signed local “gradient” value.

It should be noted that although the t-test also compares the mean values of the two voxel samples, it does so in a more correct way, taking the variances of the two distributions into account. In addition, the t-test has an inherent significance threshold (|t| > 1.96 for p < 0.05 in large samples, somewhat higher for small ones), which is often very problematic to set up in conventional intensity convolutions.
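The point about variances can be made concrete. The sketch below assumes the pooled-variance (Student) form of the two-sample t-test and compares two pairs of samples with the same mean difference of 2: tightly clustered values give t = 2√3 ≈ 3.46, widely spread values give t = 1/√3 ≈ 0.58, whereas a plain mean difference reports 2 in both cases. Note that 1.96 is the large-sample critical value; with the 6 degrees of freedom used here, the two-tailed 5% threshold is about 2.45.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t: signed, positive when mean(b) > mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Same mean difference (2), very different spreads.
narrow_a, narrow_b = [9, 10, 11, 10], [11, 12, 13, 12]
wide_a, wide_b = [4, 10, 16, 10], [6, 12, 18, 12]

t_narrow = t_statistic(narrow_a, narrow_b)
t_wide = t_statistic(wide_a, wide_b)
```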

2 Materials

In this study we used three kinds of images containing regions with weak borders that are difficult for the human visual system to detect: synthetic 3D images, a CT image of the physical gelatin phantom, and chest CT images of 40 patients. The image regions did not form coherent spatial patterns but rather looked like random textures that differed in the probability density functions of their values.

2.1 Synthetic Images

For this experiment, we created a synthetic 3D image of 512 × 512 × 50 voxels. A parallelepiped was placed inside this volume at distances of 128, 128 and 12 voxels from the corresponding volume margins. The grey values of the voxels in the inner and outer regions were drawn from two Pearson distributions with different parameters, having the same mean μ = 200 and standard deviation σ = 20 but different skewness. The inner part was filled with values whose skewness \( \omega_{in} \), taken over all image slices, was as close as possible to 1, and the outer part was filled with values having a global skewness \( \omega_{out} = 2 \).
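One way to generate such data is sketched below. We assume Pearson type III, i.e. a shifted and scaled gamma law, which can reach any positive target skewness at a fixed mean and standard deviation; the text does not state which Pearson type was used. A gamma variable with shape k has skewness 2/√k, so skewness 1 and 2 correspond to k = 4 and k = 1. The helper names are hypothetical, and for a full 512 × 512 × 50 volume one would vectorise the sampling (e.g. with NumPy) rather than loop in pure Python.

```python
import math
import random

SHAPE = (512, 512, 50)    # volume size in voxels along x, y, z
MARGIN = (128, 128, 12)   # distance from each pair of volume faces to the box


def is_inner(x, y, z):
    """True if voxel (x, y, z) lies inside the central parallelepiped."""
    return all(m <= p < s - m for p, s, m in zip((x, y, z), SHAPE, MARGIN))


def pearson3_sample(n, mu, sigma, skew, rng):
    """n draws from a Pearson type III law with given mean, SD and skewness > 0."""
    k = (2.0 / skew) ** 2            # gamma shape giving the requested skewness
    scale = sigma / math.sqrt(k)     # gamma(k, 1) has mean k and SD sqrt(k)
    return [mu + scale * (rng.gammavariate(k, 1.0) - k) for _ in range(n)]


def sample_moments(x):
    """Sample mean, standard deviation and skewness (population formulas)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m, math.sqrt(m2), m3 / m2 ** 1.5


rng = random.Random(42)
inner = pearson3_sample(100_000, 200.0, 20.0, 1.0, rng)   # omega_in close to 1
outer = pearson3_sample(100_000, 200.0, 20.0, 2.0, rng)   # omega_out close to 2
```

As the next paragraph notes, `sample_moments` will only approximate the targets.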

It should be noted that, because the values were generated randomly, exact equality of their mean, standard deviation and skewness to the target values is hardly possible.

2.2 Physical Gelatin Phantom

The purpose of creating the physical phantom was to obtain a CT image of a real object consisting of several adjacent parts (layers) with low relative contrast. The phantom was meant to simulate the commonly encountered problem of objects with barely visible boundaries in radiological images.

To create such a phantom, we used a cylindrical container filled with several horizontal layers of gelatin. Different levels of CT brightness of the layers were obtained by dissolving certain pre-calculated amounts of the radiocontrast agent Omnipaque in liquid gelatin before its solidification. To control the amounts of radiocontrast agent, provisional measurements of the CT brightness of Omnipaque solutions were made (see Fig. 1(a) and (b)).

Fig. 1.

(a) General view of the installation; (b) cups with different amounts of dissolved Omnipaque solution at the calibration stage; (c) phantom scheme; (d) one slice of the phantom CT image.

The amounts of dissolved Omnipaque solution were chosen to increase the CT brightness of the different layers by 4, 8, 16 and 32 Hounsfield units (HU) relative to the reference layer of pure gelatin. The reference layer was located at the very bottom of the container. The brightest layer was placed next, followed by the others (see Fig. 1(c)).
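The provisional calibration can be turned into dose estimates with a straight-line fit. The sketch assumes, which the text does not state, that the CT-brightness increase is approximately linear in the amount of dissolved Omnipaque over this small range; the function names and the amount unit are hypothetical, and the input lists would hold the actual calibration-cup measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def amounts_for_targets(amounts, measured_hu, target_increase_hu=(4, 8, 16, 32)):
    """Agent amount expected to raise CT brightness by each target (in HU).

    amounts: Omnipaque amount in each calibration cup (any consistent unit);
    measured_hu: CT brightness measured for each cup.
    Under the linear assumption, an increase of dHU needs dHU / slope.
    """
    slope, _intercept = fit_line(amounts, measured_hu)
    return [dhu / slope for dhu in target_increase_hu]
```

A call such as `amounts_for_targets(cup_amounts, cup_hu)` then returns one amount per target layer; checking the residuals of the fit is a cheap way to see whether the linear assumption holds.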

In addition, a layer of water with dissolved Omnipaque solution was poured on top. This formed one more low-contrast border, between the uppermost gelatin layer and the liquid layer.

2.3 Malignant Lung Tumors

In this study, we used 40 thoracic CT images of patients with lung cancer and atelectasis of a portion of the lung, as diagnosed by a qualified radiologist and confirmed histologically. Thirty-three patients were male and the remaining seven were female. The patients were aged 41 to 80 years (mean 61.7 years, standard deviation 8.7 years). CT scanning was performed on a Siemens Volume Zoom multi-slice scanner with standard clinical kV and mA settings during a single breath-hold. The voxel size of 9 tomograms was in the range 0.65–0.74 mm in the axial image plane, with a slice thickness equal to the inter-slice distance of 1.5 mm. The voxel size of the remaining 31 tomograms was 0.68 mm in the axial image plane, with a slice thickness equal to the inter-slice distance of 5.0 mm. No intravenous contrast agent was administered before scanning, which is a significant detail of the present study. Typical examples of original CT image slices are shown in Fig. 2.

Fig. 2.

Example slices of typical lung CT images of two patients with atelectasis (ATL) and malignant tumor (TUM). Patient 1 (left image) has cancer of the middle bronchus with atelectasis of the right middle lobe. Patient 2 (right image) has cancer of the right upper bronchus with atelectasis of the posterior segment of the upper lobe.

3 Methods

The present study was performed in two main stages. The first, exploratory stage was dedicated to the experimental assessment of intensity differences between malignant tumor and atelectasis regions. In the second stage we examined the ability of generalized gradient techniques to highlight the borders between them.

3.1 Exploring the Intensity Differences

The approach followed in this stage was to randomly sub-sample image voxels from the two types of lung regions and to evaluate the significance of the intensity differences as a function of sample size (i.e., the number of voxels in each subset). To ease interpretation of the results, the sample sizes were selected to match the number of voxels in square image slice patches with sides of 3, 4, …, 10, 15, 20 and 30 voxels, that is, 9, 16, …, 100, 225, 400 and 900 voxels respectively. This does not mean, though, that the analysis methodology we are developing is 2D-oriented. In all cases image voxels were sampled at random from the atelectasis and tumor regions. All statistical and pattern recognition analyses described in this work were performed using R, a freely available language and environment for statistical computing.
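The sample-size grid and the random sub-sampling can be written down directly. In this sketch (names hypothetical) each region is represented simply as a flat list of its voxel values, which reflects the point that sampling is not tied to 2D patches; within one draw, voxels are taken without replacement.

```python
import random

SIDES = list(range(3, 11)) + [15, 20, 30]     # patch side lengths in voxels
SAMPLE_SIZES = [s * s for s in SIDES]          # 9, 16, ..., 100, 225, 400, 900


def draw_pair(atelectasis, tumor, n, rng):
    """Draw n random voxel values (without replacement) from each region."""
    return rng.sample(atelectasis, n), rng.sample(tumor, n)
```

The grid has eleven entries, which is the factor of 11 in the total number of classification tests.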

The atelectasis and tumor classes were compared in several ways to eliminate the possible bias of any single method. First, the significance of the intensity differences between the two classes was assessed statistically using a two-tailed unpaired t-test with the significance level set to p < 0.05. The resulting t-values, which depend on the degrees of freedom (i.e., on the sample size), were converted into z-scores to allow direct comparison of the statistical significance obtained in different experiments and to compute mean significance scores over all 40 patients correctly. For each patient and each sample size, the procedure of random voxel sub-sampling followed by a t-test was replicated 100 times to obtain reliable results.
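The t-to-z conversion maps a t-value with ν degrees of freedom to the standard-normal quantile with the same tail probability, \( z = \Phi^{-1}(F_{\nu}(t)) \). The sketch below assumes a pooled-variance t-test with equal sample sizes n (so ν = 2n − 2) and, to stay dependency-free, integrates the t density numerically with Simpson's rule; in practice `scipy.stats.t.cdf` and `scipy.stats.norm.ppf` do the same job. The function names are hypothetical, and z saturates near 8 because the CDF is clipped away from 0 and 1.

```python
import math
import random
from statistics import NormalDist, mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t, positive when mean(b) > mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def t_cdf(t, df, steps=2000):
    """Student's t CDF via Simpson's rule on the density over [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson
    total = pdf(0.0) + pdf(upper)
    total += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_to_z(t, df):
    """z-score with the same one-sided tail probability as t on df."""
    p = min(max(t_cdf(t, df), 1e-15), 1 - 1e-15)   # keep inv_cdf finite
    return NormalDist().inv_cdf(p)


def replicated_significance(atelectasis, tumor, n, reps=100, seed=0):
    """Mean signed z and fraction of replicates with |z| > 1.96 (p < 0.05)."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        a, b = rng.sample(atelectasis, n), rng.sample(tumor, n)
        zs.append(t_to_z(t_statistic(a, b), 2 * n - 2))
    return mean(zs), sum(abs(z) > 1.96 for z in zs) / reps
```

For example, the two-tailed 5% critical value t ≈ 2.12 with 16 degrees of freedom (two samples of 9 voxels) maps to z ≈ 1.96, which is what makes scores from different sample sizes comparable.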

At the second step, the atelectasis and tumor voxel samples (i.e., vectors of voxel values sorted in descending order) were classified using the well-known Hierarchical Clustering, Support Vector Machines and Random Forests methods. For each sample size and each patient, the classifiers were trained on training sets consisting of 10 atelectasis and 10 tumor samples and tested on test sets of the same size. Training and test sets were sampled independently, and no voxel was included in both the training and the test set. The three classifiers were run on exactly the same data. Each test was replicated 100 times to obtain statistically representative estimates of the classification accuracy.
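The sorted-vector representation and a hierarchical clustering step can be sketched without external libraries. This is only an illustration under stated assumptions: average linkage with Euclidean distance (the linkage is not given in the text), clustering of pooled samples into two groups rather than the authors' train/test protocol in R, and hypothetical function names. Sorting turns each sample into a permutation-invariant summary of its value distribution: for equal-size samples, the Euclidean distance between sorted vectors equals √n times the 2-Wasserstein distance between the two empirical distributions.

```python
import math
import random


def to_feature(sample):
    """Represent a voxel sample by its values sorted in descending order."""
    return sorted(sample, reverse=True)


def average_linkage_two_clusters(points):
    """Agglomerative clustering (average linkage, Euclidean), stopped at 2 clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [math.dist(points[i], points[j])
                              for i in clusters[a] for j in clusters[b]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))   # b > a, so index a stays valid
    labels = [0] * len(points)
    for i in clusters[1]:
        labels[i] = 1
    return labels


def cluster_accuracy(true_labels, cluster_labels):
    """Raw accuracy under the better of the two cluster-to-class assignments."""
    hits = sum(t == c for t, c in zip(true_labels, cluster_labels))
    return max(hits, len(true_labels) - hits) / len(true_labels)
```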

The classification accuracy was corrected for chance agreement using the classAgreement function of the R package e1071. For two classes this means, in particular, that chance-level classification yields an accuracy of 0 rather than 50%. The corrected classification accuracy was used as a measure of the dissimilarity of the two lung regions and as the basic value for estimating the achievable image segmentation accuracy. The total number of classification tests performed was 40 patients × 11 sample sizes × 3 methods × 100 replications = 132 000.
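If, as we assume, the chance-corrected value is the `kappa` component returned by `classAgreement`, it is Cohen's kappa, \( \kappa = (p_o - p_e)/(1 - p_e) \), where \( p_o \) is the observed agreement and \( p_e \) the agreement expected from the marginal class frequencies. A minimal stand-alone version follows (the function name is hypothetical). For balanced classes \( p_e = 0.5 \), so a raw accuracy of 75% corresponds to a corrected value of 50%, and 90% raw accuracy to 80% corrected.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (p_o - p_e) / (1 - p_e)


TOTAL_TESTS = 40 * 11 * 3 * 100   # patients x sample sizes x methods x replications
```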

3.2 Detecting Tumor Borders Using Generalized Gradient

The informal definition of the generalized gradient given above conveys the essence of the method used in the present study. The exact computational procedure is somewhat more complicated. The key details that need to be considered for a better understanding and a correct implementation of the method are given below.

Although the method may be used to compute generalized gradient maps of 2D images, it is better suited to 3D, because it relies on relatively large samples of voxels taken from the sliding window halves, and a 3D window of a given radius contains many more voxels than a 2D one.

Regardless of the nature and underlying mechanism of the procedure used to compare the two voxel sets taken from adjacent window halves, it is highly desirable for the resulting dissimilarity estimate to be as precise as possible. To achieve this, a bootstrap multi-step meta-procedure can be employed (see, for example, the good tutorial for non-statisticians [6]). In practice this means that at each computational step not all the voxels, but only a randomly sub-sampled fraction of them, are taken from the window halves and passed to the chosen comparison procedure, such as the t-test. This step is repeated about 100 times.

The final dissimilarity measure is computed as the mean of the individual dissimilarity values, e.g., as the mean t-value over all 100 trials when the t-test is employed. The same holds when the final classification accuracy is calculated from individual classification steps, and so on. The natural price for assessing the difference more accurately by bootstrapping is an increase in computational cost of about two orders of magnitude. For instance, for 3D images the total number of elementary t-tests per voxel position is around 300, with about 100 tests performed for each of the gradient components \( G_X \), \( G_Y \) and \( G_Z \) along the three orthogonal image axes X, Y and Z.
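The repeated sub-sampling scheme can be sketched as a wrapper around any two-sample dissimilarity. Following the description above, each replicate draws a random fraction of the voxels from each half without replacement (strictly a sub-sampling variant; a classical bootstrap would resample with replacement, e.g. via `random.Random.choices`). The fraction of 0.5, the function names and the pooled-variance t-test are our assumptions. Each call performs `reps` elementary comparisons, which is where the roughly hundredfold cost comes from: 3 axes × 100 replicates = 300 t-tests per voxel position.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t, positive when mean(b) > mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def resampled_dissimilarity(half_a, half_b, dissim=t_statistic,
                            fraction=0.5, reps=100, rng=None):
    """Mean of `dissim` over `reps` random sub-samples of the two window halves."""
    rng = rng or random.Random(0)
    ka = max(2, int(fraction * len(half_a)))   # t-test needs >= 2 values per sample
    kb = max(2, int(fraction * len(half_b)))
    return mean(dissim(rng.sample(half_a, ka), rng.sample(half_b, kb))
                for _ in range(reps))
```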

Once the generalized gradient components \( G_X \), \( G_Y \) and \( G_Z \) have been computed by the voxel set comparison procedure, the gradient magnitude \( G^{x,y,z} \) at a particular 3D voxel position (x, y, z) is calculated as the Euclidean norm of the vector \( (G_X, G_Y, G_Z) \), i.e. \( G^{x,y,z} = (G_X^{2} + G_Y^{2} + G_Z^{2})^{1/2} \). In general, the sliding window need not sample voxels along the three traditional orthogonal axes X, Y and Z; alternative configurations are possible as well. In this study we also used a somewhat more sophisticated sliding window configuration, depicted in Fig. 3. It uses six directions equally spaced in 3D. Sampling in each direction is performed with spherical sub-windows of radius R. Moreover, the sub-windows are moved away from the central voxel by a distance d, which addresses the problem of smooth and wide object borders. Finally, the resulting generalized gradient value at a particular 3D voxel position (x, y, z) is calculated from the individual values in each direction \( G_{i} ,i \in \{ 1, \ldots ,6\} \) as \( G^{x,y,z} = (\sum\nolimits_{i = 1}^{6} {G_{i}^{2} } )^{1/2} \).
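The gap-window geometry can be sketched as follows. The text says only that six directions equally spaced in 3D are used; we assume these are the six axes through opposite vertices of a regular icosahedron, a configuration in which every pair of directions makes the same angle (\( |\cos\theta| = 1/\sqrt{5} \)). We also assume that each direction contributes two opposite spherical sub-windows of radius R whose centres lie at distance d + R from the central voxel, leaving a gap of width d around it. All names are hypothetical; `combine` is the Euclidean norm used for both the three-axis and the six-direction variants.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio


def six_directions():
    """Unit vectors along the six icosahedral axes (one per antipodal vertex pair)."""
    raw = [(0, 1, PHI), (0, -1, PHI), (1, PHI, 0),
           (-1, PHI, 0), (PHI, 0, 1), (PHI, 0, -1)]
    norm = math.sqrt(1 + PHI ** 2)
    return [tuple(c / norm for c in v) for v in raw]


def sphere_offsets(center, radius):
    """Integer voxel offsets whose distance to a real-valued centre is <= radius."""
    ranges = [range(math.floor(c - radius), math.ceil(c + radius) + 1) for c in center]
    return [(x, y, z) for x in ranges[0] for y in ranges[1] for z in ranges[2]
            if (x - center[0]) ** 2 + (y - center[1]) ** 2 + (z - center[2]) ** 2
            <= radius ** 2]


def gap_window(R, d):
    """For each direction, the two opposite spherical sub-windows (as offsets)."""
    pairs = []
    for u in six_directions():
        centre = tuple((d + R) * c for c in u)
        opposite = tuple(-c for c in centre)
        pairs.append((sphere_offsets(centre, R), sphere_offsets(opposite, R)))
    return pairs


def combine(components):
    """Euclidean norm of the directional gradient values G_1..G_k."""
    return math.sqrt(sum(g * g for g in components))
```

Each pair of offset lists would be added to the current voxel position to collect the two voxel samples passed to the dissimilarity measure for direction i.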

Fig. 3.

Configuration of the gap sliding window.

4 Results

4.1 The Intensity Differences Discovered

The results of the statistical assessment of the significance of intensity differences between the atelectasis and tumor regions in the lung CT scans of 40 patients are reported in Fig. 4. As can be seen from the figure, the fraction of significantly different voxel samples and the mean significance scores varied considerably between patients. For instance, for one patient the percentage of significantly different samples exceeds a notable 60% already at 9 voxels and reaches 100% at a sample size of only 36 voxels (see the left panel of Fig. 4), while for another it starts close to zero at 9 voxels and ends at only about 10%. Similarly, for some patients the mean z-score reaches the significance threshold z > 1.96, which is equivalent to p < 0.05, at sample sizes of 9–25 voxels (see the right panel of Fig. 4), while for others it remains insignificantly low even for reasonably large samples of 400–900 voxels.

Fig. 4.

Significance of the intensity differences between lung atelectasis and tumor voxel samples for 40 patients (curves) as a function of voxel sample size. Left panel: percentage of voxel samples for which the intensity difference is statistically significant at p < 0.05. Right panel: mean significance score z. In both panels, image voxels were sampled at random from the atelectasis and tumor regions, and each measurement was replicated 100 times.

In contrast, the voxel sample classification results demonstrate much more consistent behavior (see Fig. 5). As the figure shows, a very useful property of the classification approach for separating atelectasis and tumor regions is that, for relatively large samples, the results converge to 90–100% classification accuracy in each patient.

Fig. 5.

Dependence of the classification accuracy on the sample size of lung atelectasis and tumor voxels for 40 patients (curves) when using Hierarchical Clustering (top left plot), Support Vector Machines (top right plot) and Random Forests (bottom left plot). Each test was replicated 100 times for reliability. The mean and standard deviation of the accuracy over 40 patients are given in the bottom right panel.

As for the comparative efficiency of the three classification methods, Fig. 5 shows that the Hierarchical Clustering algorithm outperforms both SVM and Random Forests at every voxel sample size. Moreover, with Hierarchical Clustering the classification accuracy corrected for chance agreement starts above 50% for almost every patient and reaches 90% at a sample size of 225 voxels for all 40 patients except 2 outliers. The mean and standard deviation of the classification accuracy over 40 patients (see the bottom right plot of Fig. 5) make the superiority of the Hierarchical Clustering method evident and show the other two methods to be almost identical in the voxel sample classification task. If a segmentation technique were based on direct voxel sample classification with a sliding window of suitable size, the mean accuracy threshold should be set to a reasonably high value, say 95%. In that case the minimal sample size should be approximately 100–200 voxels. This corresponds to a window of about 12 × 12 voxels (i.e., a half window size of 4.1 mm) in 2D and less than 6 × 6 × 6 voxels (2.0 mm) in 3D.
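The window-size figures follow from simple arithmetic with the 0.68 mm in-plane voxel size of the thick-slice scans. The sketch treats the 3D window as if it were isotropic at that spacing, which is our simplifying assumption; with 1.5 mm or 5.0 mm slices the extent along z would be larger. The helper name is hypothetical.

```python
VOXEL_MM = 0.68   # in-plane voxel size of the 31 thick-slice scans


def half_window_mm(side_voxels, voxel_mm=VOXEL_MM):
    """Half of a window's side length, in millimetres."""
    return side_voxels / 2 * voxel_mm


voxels_2d = 12 * 12          # 144, inside the 100-200 voxel target range
voxels_3d = 6 * 6 * 6        # 216, just above it, hence "less than" 6 x 6 x 6
side_3d_for_200 = 200 ** (1 / 3)   # about 5.85 voxels per side
```

`half_window_mm(12)` gives 4.08 mm and `half_window_mm(6)` gives 2.04 mm, matching the rounded 4.1 mm and 2.0 mm quoted above.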

4.2 Detected Tumor Borders

The results of applying the generalized gradient to synthetic images are depicted in Fig. 6. This experiment demonstrates the ability of generalized gradient (GG) maps calculated with different presets to detect weak borders, and the results are as expected. Figure 6(c) and (d) show a clear border between the inner and outer regions. Here we used the SVM classification accuracy as the dissimilarity measure, improved by the bootstrap procedure and the gap sliding window. No a priori information about border orientation, width, smoothness or value distribution was used.

Fig. 6.

(a) Original synthetic image; (b) GG map using t-test, R = 4, d = 2; (c) GG map using SVM, gap window’s R = 3, d = 1; (d) GG map using SVM, R = 4, d = 2.

Figure 6(b) depicts the GG map calculated over the synthetic image using the conventional t-test to estimate the dissimilarity between values sampled from the gap window halves. Although this map is much faster to compute than the previous ones, in this particular case it yields no useful result, because the t-test does not respond to differences in skewness and other higher-order moments. However, we show below that it does provide useful results on real images while retaining the same relative speed advantage.
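The insensitivity of the t-test to skewness is easy to reproduce. The sketch below draws two samples with equal mean (200) and standard deviation (20) but skewness 1 and 2, using an assumed Pearson type III generator, and compares the pooled-variance t-statistic with the two-sample Kolmogorov-Smirnov statistic. The latter is a distribution-sensitive measure swapped in here for illustration only; the maps in Fig. 6(c) and (d) used SVM classification accuracy. With 2000 values per sample, the 5% critical value of the KS statistic is roughly 1.36·√(2/2000) ≈ 0.043. Function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def pearson3_sample(n, mu, sigma, skew, rng):
    """n draws from a Pearson type III law with given mean, SD and skewness > 0."""
    k = (2.0 / skew) ** 2
    return [mu + sigma / math.sqrt(k) * (rng.gammavariate(k, 1.0) - k)
            for _ in range(n)]


def t_statistic(a, b):
    """Pooled-variance two-sample t, positive when mean(b) > mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # advance both ECDFs past x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


rng = random.Random(11)
inner = pearson3_sample(2000, 200.0, 20.0, 1.0, rng)
outer = pearson3_sample(2000, 200.0, 20.0, 2.0, rng)
t_value = t_statistic(inner, outer)     # typically small, since the means are equal
ks_value = ks_statistic(inner, outer)   # well above 0.043, since the shapes differ
```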

The resulting GG maps of the image in Fig. 1(d) are depicted in Fig. 7. The left column contains maps calculated with the gap sliding window and the t-test as the dissimilarity measure; the middle column, maps calculated with a spherical sliding window and the t-test; and the right column, maps calculated with a spherical sliding window and the difference of the mean values sampled from the window halves as the dissimilarity measure. The sliding windows within each row were sized to contain almost the same number of voxels. Unlike the synthetic images, the gelatin phantom layers have definite differences in mean HU values, which is why detecting their weak borders is a fairly easy problem for the different presets of the method. Nevertheless, this figure may help in choosing the preferred method parameters depending on the desired result. The middle gelatin layers appear to have interdiffused, and no borders between them were detectable.

Fig. 7.

Gelatin phantom: (a), (d) – GG maps calculated using gap window with R = 4, d = 2 and R = 5, d = 3 respectively, t-test of voxel samples used for dissimilarity measure estimation; (b), (e) – GG maps calculated using spherical window with r = 5 and r = 8 respectively, t-test of voxel samples used for dissimilarity measure estimation; (c), (f) – GG maps calculated using spherical window, dissimilarity measure is the difference of mean values sampled from window halves.

A quantitative assessment of the utility of generalized gradient maps in highlighting lung tumor borders was performed separately for the first subgroup of 31 native CT images with a slice thickness of 5.0 mm and for the remaining 9 images of the second subgroup with a slice thickness of about 1.5 mm. Typical examples of original CT image ROIs and the corresponding gradient map regions are presented in Fig. 8.

Fig. 8.

Example ROIs of the original lung CT images (left column) and the corresponding generalized gradient maps (right column). The first row represents a case where the gradient map is definitely useful for detecting the tumor border, whereas the second and third rows illustrate cases where the map's utility is unclear and where the map is useless, respectively.

In the first subgroup of patients, the generalized gradient maps were found to be definitely useful for detecting the tumor border in 17 patients (54.8%), whereas in 9 other cases (29.0%) they provided no help in separating the malignant tumor from the adjacent atelectasis. The efficacy of the maps in the remaining 5 cases (16.1%) was unclear. A similar examination of CT scans with reasonably thin slices of about 1.5 mm suggests that slice thickness is unlikely to be an important parameter for the method. In particular, the distribution of cases among the “yes”, “no” and “unclear” categories was 5 (55.6%), 3 (33.3%) and 1 (11.1%) respectively, which is well comparable with the corresponding results for the first subgroup.
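The reported percentages can be re-derived from the counts as a quick consistency check; the dictionary keys mirror the three rating categories, and the helper name is hypothetical. With only 9 thin-slice cases, each case moves the percentages by about 11 points, which is worth keeping in mind when comparing the two subgroups.

```python
def category_shares(counts):
    """Percentage of cases in each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * k / total, 1) for name, k in counts.items()}


thick_slices = {"yes": 17, "no": 9, "unclear": 5}   # 31 scans, 5.0 mm slices
thin_slices = {"yes": 5, "no": 3, "unclear": 1}     # 9 scans, about 1.5 mm slices
```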

5 Conclusions

In this work we have documented the results of a statistical assessment of CT image intensity differences between lung atelectasis and malignant tumors. The significance scores and classification accuracy results reported here are based on advanced statistical and pattern recognition methods. Our results suggest that statistical significance scores are unlikely to discriminate lung atelectasis and tumor regions with good quality for all patients. However, the clustering and classification algorithms we tested demonstrate encouraging accuracy on CT intensity samples of a few hundred voxels. The Hierarchical Clustering method was found to be better suited to the CT voxel classification task than the SVM and Random Forests classifiers. This agrees with other studies in which the classes overlap substantially in feature space. The voxel sample classification accuracy potentially allows atelectasis and tumor regions to be discriminated reliably with a relatively small sliding window of 12 × 12 voxels (i.e., a half window size of 4.1 mm) in 2D and no more than 6 × 6 × 6 voxels (2.0 mm) in 3D.

We have also introduced the basic concept of the so-called generalized gradient and demonstrated its capabilities and key details on synthetic images, 3D CT images of a physical phantom, and lung CT scans of 40 patients with a clinically confirmed diagnosis of lung cancer.