Introduction

Tissue texture reflects the spatial distribution of contrast of image voxel gray levels across the field of view [1, 2], which is an effective descriptor of tissue heterogeneity. It has been recognized as an important biomarker in various clinical tasks, such as tumor type classification, tumor treatment response evaluation [3, 4]. Tissue texture can be enhanced by spectral computed tomography (CT) because spectral CT can provide a set of CT images with different voxel contrast using different X-ray energy. For example, by recent photon counting spectral CT (PCCT) technology, typically five energy channel images can be obtained [5]. To make full use of energy enriched texture by spectral CT for clinical tasks, this paper aims to address two related issues: (1) texture enhancement in spectral CT image reconstruction; and (2) spectral energy enriched tissue texture for lesion classification.

Reconstructing high quality spectral CT images is essential, where the tissue texture should be preserved because it contains high clinical importance. PCCT image reconstruction is subjected to photon starving problems [6, 7]. Photon counting technology enables detector to discriminate photons in certain energy range. If the total photon counts are the same with the traditional CT, the low photon counts will be suffered in each individual energy channel. Model based iterative reconstruction method under Bayesian law has been widely used to deal with such a low photon counts problem [8, 9], where the prior term enables us to incorporate our human prior knowledge or constraints for the purposes of noise reduction, edge sharpness preservation, and so on. To enhance the tissue texture information for PCCT, we recently proposed a tissue-specific texture prior (TP) [10,11,12,13] in addition to low rank (LR) prior, denoted as low rank texture prior (LRTP) algorithm to reconstruct images of multiple energy channels as a tensor [14]. Promising results were reconstructed for lung imaging.

Using the spectral enriched tissue texture for lesion classification will be another important issue once we would have obtained high quality spectral CT images. To our best knowledge, only a few studies [15,16,17,18,19] are reported using spectral CT enriched data for lesion classification. In ref. [16], the authors use the iodine concentrations in the lesions and lymph nodes measured from the material-decomposition images for the gastric cancer staging diagnosis. Similar studies are also performed in refs. [17,18,19]. In ref. [15], the authors use the virtual monochromatic images (VMI) obtained from dual energy CT for benign parotid tumors classification, where six texture features such as mean, standard deviation, entropy, etc., are used. It is realized that extracting enriched texture information from spectral CT for the classification task is challenging. As the artificial intelligence (AI) being developed, the machine learning, especially deep learning-based method has been applied successfully in the medical imaging field [20,21,22,23]. To benefit from the AI, the simplest way is to throw the spectral CT images directly to the deep learning model and ask machine to extract high level features. However, this way may require huge data for training and sometimes impossible in medical field. Another way is to incorporate our human knowledge, e.g., the tissue texture in the classification model. Hence, this paper investigates three models in terms of three type inputs: one is the spectral images, another is the co-occurrence matrices (CMs) extracted from spectral images, and the third one is the Haralick features (HF) extracted from CMs. The second and third models basically use the texture descriptor as inputs. We will describe three models detailed in the method section. This investigation will not only validate whether the enriched texture by spectral CT will benefit the classification task but also provide us guidelines for how to use the enriched texture.

By addressing the two important issues above, this paper shows the insight of incorporating the clinically important prior information, e.g., tissue texture in this paper, to increase the PCCT potential by the upstream image reconstruction, improve the outcome of the downstream computer-aided diagnosis, and so on.

Methods

LRTP for spectral CT image reconstruction

As mentioned in introduction, the image reconstruction for each individual energy channel is a low photon counts problem, which can be relieved by the Bayesian type image reconstruction method. We recently proposed one tissue-specific TP in addition to LR prior under the Bayesian law to enhance the tissue texture in the image reconstruction. Firstly, to make use of the synergy among the individual energy channel, the inter-channel correlation is modeled by low-rank representation technique. Secondly, the inner-channel spatial texture of the target corrupted images is characterized by TP, which is convex and can well preserve the edges and texture features. The proposed LRTP method integrates the inter-channel correlation and the inner-channel spatial texture into a unified Bayesian reconstruction framework. The overall cost function can be expressed as:

$$ \arg {\min}_{\upchi}\parallel \mathcal{A}\left(\upchi \right)-\mathcal{Y}{\parallel}_2^2+\lambda {\parallel}_{tr}+\beta R\left(\upchi \right) $$
(1)

The TP prior is inspired by our previous studies in traditional CT image reconstruction [16,17,18,19,20], which can be formulated as:

$$ R\left(\upchi \right)=\sum \limits_k\sum \limits_{r=1}^R\sum \limits_{j\in \mathrm{Region}(r)}\sum \limits_{m\in {\Omega}_j}{w}_{jmk}^{FD}{\left({x}_{jk}-{x}_{mk}\right)}^2 $$
(2)

where index k denotes the energy channel, index j runs over all the pixels in each channel image, Ωj represents a small fixed neighborhood window of the j-th pixel in the channel image, \( {w}_{jmk}^{FD} \) is the weighting coefficient which represent the correlation between pixel m and pixel j in k-th channel predicting from full-dose images, r specifies the tissue type, R is the total number tissue regions.

In this study, the tissue type number is setting as R = 4 to represent lung, fat, muscle and bone for chest CT imaging. Reconstructed full dose CT (FDCT) images were segmented using our previously reported vector quantization algorithm [24]. Figure 1 shows one example of reconstructed FDCT images of first and last (the 5th) energy channel. The segmented tissue masks are also shown in Fig. 1. Given the segmented masks of the four tissues, we apply Eq. (2) to calculate the corresponding markov random field (MRF) coefficients, of which the window size is setting to 7 × 7. Figure 2 demonstrates the predicted MRF coefficients set of the four tissue regions in the first (top panel) and last channel (bottom panel). We can observe that MRF coefficients in the same tissue region among different channels have strong similarity. It agrees with our hypothesis that strong correlation exists among energy channels, which motivates us to use the LR prior in the image reconstruction. However, there still have some differences. For example, in the muscle plot of Fig. 2, the red region of the first channel is larger than that of the last channel. This is also expected because the attenuation coefficient differs under different X-ray energy.

Fig. 1
figure 1

Full dose computed tomography images and their segmented tissue masks. Images of first and last (5th) energy channel. Tissue masks of lung, bone, fat and muscle

Fig. 2
figure 2

The predicted markov random field coefficients for the four tissue regions of lung, bone, fat and muscle. The top panel is for the first channel. The bottom panel is for the last channel. From left to right, markov random field weights of lung, bone fat and muscle are shown respectively

Since this paper focus on texture themed discussion, details of the proposed LPTR is not included and can be found in ref. [14]. Implementation of LRTP is summarized in Table 1.

Table 1 Workflow for the proposed low rank texture prior algorithm

Enriched texture classification models

As introduced above, spectral CT can enrich tissue texture by providing set of CT images with different contrast. The first row in Fig. 3 showed an example set of virtual monochromatic CT images with X-ray energy from 35, 45, 55, 65 and 75 keV respectively. It is clearly observed that the pixel contrast varies significantly under different energy. The five-energy spectral CT much enriched the texture and is expected to provide more information. However, it remains challenging and pioneering using the enriched texture for clinical tasks like predict the tumor type, cancer stage, etc.

Fig. 3
figure 3

An example of one polyp slice at different energy channel and the corresponding extracted texture features. The first row is the raw computed tomography image; the second row is the co-occurrence matrices (CMs) extracted from the spectral images; the third row is the Haralick features extracted from the CMs

In this paper, we investigated three classification models in terms of three type inputs: one is the spectral images, one is the CMs extracted from the spectral images, and the third one is the HF extracted from the CMs. As we discussed earlier, the spectral CT images have already enriched the tissue textures. The simplest way is to throw the set of images directly into a machine learning model and ask the machine to extract the effective features from each energy image and then fuse these features to generate higher level features for the final prediction. However, this way may require huge training data samples, which is impossible in medical field at present or near future. Thus, we can incorporate our prior knowledge of the texture into the model.

Since the texture reflects the voxel grey level distribution [2], we can quantify this distribution through the mathematic process. Tremendous efforts have been devoted developing texture descriptors [26,27,28]. Among them the Haralick model made a great success [29]. Some researcher developed the traditional Haralick model from 2 dimensional (2D) to 3 dimensional (3D) and adds more features [30]. Some research used the CMs instead of HF in the convolutional neural network (CNN) based model [31]. GLCM counts the frequency of voxel-pairs of certain gray levels in the 2D/3D space and is defined by two important parameters, i.e., direction and displacement as shown in Fig. 4. In 2D case, there are 8 sampling directions considering first neighboring pixels (Fig. 4a), while there are 26 different directions in volumetric space (Fig. 4b). Every direction would create one GLCM. Since GLCM is symmetry invariant, we only need to calculate in one direction without in its inverse direction. Therefore, we calculated 13 GLCMs in 3D space, where the calculating procedure was illustrated in Fig. 4c.

Fig. 4
figure 4

Illustration of grey level co-occurrence matrices of 2 dimensional /3 dimensional images. Left pannel: 2 dimensional grey level co-occurrence matrices (GLCM) calculation; Middle pannel: 3 dimensional GLCM calculation; Right pannel: A GLCM example when angle(θ) = 0°, displacement(d) = 1

This paper investigated these three type inputs, i.e., CT images, CMs and HF to validate the effectiveness of the spectral enhanced texture and explore possible ways for the classification tasks. An example of the CMs and HF texture features is also presented in Fig. 3 (2nd and 3rd rows). It is noted the CMs and HF has 13 sampling directions. We only demonstrate one direction as an example. We could see both the texture features in HF and CM also vary prominently among different energies. For the three type inputs, three models were implemented: the CT raw image-based CNN model (IM-CNN), the CM based CNN model (CM-CNN) and the HF based random forest model (HF-RF). The three models are illustrated in the Fig. 5. Details of each model are presented in the following.

Fig. 5
figure 5

Classification models for three different inputs: (a) the computed tomography raw image-based convolutional neural network (CNN) model, (b) the co-occurrence matrice based CNN model and (c) the Haralick feature based random forest model

IM-CNN: The slice with largest cross-section was chosen as the input. A multi-channel network structure was designed, which takes each energy 2D image as one input channel and multiple channels are used to take all energy images from spectral CT. A five-channel design was shown in Fig. 5a. The model consists of seven layers including three convolution layers, two max-pooling layers and two fully connected layers. In each convolution layer, batch normalization and activation function are performed. It uses the rectified linear unit (ReLU) as the activation function, the cross-entropy loss as the training loss and softmax function at the last fully connected layer.

CM-CNN: Texture-based Gray-level Co-occurrence Matrix (GLCM) was used as inputs for this model. A multi-channel network structure was designed, which takes each energy 2D GLCM as one input channel and multiple channels are used to take all energy GLCM from spectral CT. The multiple-channels design (n channels as an illustration) was shown in Fig. 5b. The model consists of seven major layers including two convolution layers, two max-pooling layers and three fully connected layers. In each convolution layer, batch normalization and activation function are performed. It uses the ReLU as the activation function, the cross-entropy loss as the training loss and softmax function at the last fully connected layer.

HF-RF: HF as a typical texture descriptor are extracted from GLCM, which is also handcrafted texture features defined in ref. [29]. According to ref. [29], every GLCM could deduce 14 Haralick measures. Twenty-eight HF would be generated by calculating means and ranges of Haralick measures over 13 directions. Then the texture descriptor consisting of 28 HFs is fed to random forest (RF) classifier to perform classification [26]. For each process of RF, the descriptors of all polyps are divided into training groups and testing groups. Before classification, we first calculate the priority of each variable in texture descriptor. Gini coefficient is introduced to be the priority measurement. Next, some variable sets are generated using the forward step feature selection method on the ranked variables. This architecture was shown in Fig. 5c. Classifications are performed under the parameters of 3000 trees and \( \sqrt{28} \) randomly selected variables for each node.

Evaluation strategies

Currently, the photon counting spectral CT is under research and development and has not been utilized in clinical. Therefore, the evaluation was performed on simulated spectral CT datasets.

Image reconstruction evaluation

VMI obtained from the dual energy CT can be used to mimic the output of the PCCT. The procedure can be described as follows. Due to different materials have different spectral attenuation response, the Dual energy CT will generate VMI based on two measurements of different energy. Then the photon counts for each energy channel can be calculated using the forward projection. Poisson noise is superimposed to the signal to consider the statistical property. Then the low counts projection was reconstructed by the proposed LRTP method.

Classification evaluation

The classification models should be evaluated on the datasets with pathology reports as the ground truth. Therefore, the evaluation was performed on our pathology proven colon polyp datasets. However, we do not have VMI images for this dataset. As a way around, we simulated the spectral CT data based on attenuation physical model. Colon polyps consist of soft tissue, which can be represented by three tissue types, i.e. fat, cellular tissue and water. Each tissue type has different attenuation coefficient or CT value at different X-ray energy. The CT-energy response curve of the three tissue types [32] to the X-ray energy in Hounsfield units is shown in Fig. 6. Water is used as the reference material of which the CT value is zero. The cellular tissue has positive values, while the CT values for fat tissue are always negative. Once we have the CT data at one energy, we can use a linear scaling method to simulate the CT data at other energies according to the response curve for each tissue type.

Fig. 6
figure 6

Computed tomography values of three tissue types under different energy

A two-fold cross-validation method was used to evaluate the performance via the mean and the standard deviation of area under the receiver operating characteristic curve (AUC). We compared the performance of all three models at each individual X-ray energy and the combination of five energy channels. We also compared the performance of all three models based on the difference images. The difference images are obtained by subtracting the CT data of the neighboring energy channels. Then we extracted the CMs and HF features based on the difference images. More details will be described in experiments and results section.

Experiments and results

Dataset

Dataset 1 A patient who revealed suspected pulmonary tumor appearing as nodules was scanned using a dual-energy high-definition CT scanner (GE Discovery CT750 HD) of the chest in gemstone spectral imaging mode. The VMI at 80 and 140 kVp with the thickness of 2.5 mm and the spacing of 0.75 mm were transferred to an AW 4.4 workstation for analysis. The VMI were generated at 10 keV monochromatic energy level increments from 60 to 100 keV, resulting in one set of 5 images, which represents the VMI for each channel. A representative slice was specified from the image volume to evaluate our methodology LRTP for PCCT image reconstruction

Dataset 2 Fifty-nine patients with 63 polyps, including 31 benign and 32 malignant, were scanned in conventional CT facility with effective 75 keV X-ray energy. Each abdominal CT image volume consists of more than 400 image slices. All polyps were resected with following pathology reports to verify whether each polyp was malignant as an adenocarcinoma or benign as an adenomatous. Summary of the polyp mass with its pathology results are presented in Table 2. Based on the CT-energy response curve in Fig. 6, those 63 volumetric polyp CT images were used to simulate the CT data in Hounsfield units at other effective energies from 35 keV to 65 keV

Table 2 Summary of ployp mass for dataset 2

Results of image reconstruction

The dataset 1 was used to evaluate the texture enhanced image reconstruction algorithm, LRTP. For comparison study, some well-established methods were employed, including the simultaneous algebraic reconstruction technique (SART), total variation minimization (TV) [33], low-rank representation and total variation regularization (LRTV) [34] as well as tensor dictionary learning (TDL) [35].

The reconstructed images are zoomed in the lung region and shown in Fig. 7. The gray images are the magnified lung ROI CT images. The corresponding absolute difference images are shown in color. From top to bottom are the reference images reconstructed from full-dose projections by SART and the low-dose images reconstructed by SART, TV, LRTV, TDL and LRTP methods. The SART reconstructed images contain strong noise artifacts and some texture features are drowned out by noise. The strong random noise can be clearly seen in the absolute difference images. TV and LRTV methods significantly reduced the noise but most of the small texture features, such as the blood vessels were smoothened. The corrupted texture can be clearly observed in their absolute difference images. TDL method could identify these texture features, and the details were retained without blurring. The proposed LRTP method yielded very clear images and the smallest difference with the reference image comparing with other methods.

Fig. 7
figure 7

Close-up views for comparison of lung region of interests reconstructed by different algorithms. The gray level images are the magnified lung ROI CT images in the red box in Fig. 1. The corresponding absolute difference images are shown in color. From top to bottom are the reference images reconstructed from full-dose projections by SART and the low-dose images reconstructed by SART, TV, LRTV, TDL and LRTP methods, respectively. ROI Region of interest; CT Computed tomography; TV Total variation; LRTV Low rand total variation; TDLTensor dictionary learning; SART Simultaneous algebraic reconstruction technique; LRTP Low rank texture prior

Quantitative measurements are shown in Table 3. The proposed LRTP had the lowest root mean square error (RMSE) and the highest peak signal to noise ratio for all energy channels. The structure similarity index (SSIM) and feature similarity index (FSIM) also employed to measure the structure similarity and feature similarity between the references and reconstructed images. In Table 3, the proposed LRTP method obtain the greatest SSIM and FSIM values than other competing methods in all channels which further demonstrates that the LRTP method has the best image quality in terms of quantitative assessment.

Table 3 RMSE (10e-4), PSNR, SSIM, FSIM Index for region of interest images in Fig. 7

Results of classification

We explored three machine learning models in terms of three type inputs, i.e., raw CT images, texture images (GLCMs images), hand crafted features from texture images. The three type inputs described above were fed to the three models presented in Fig. 5. For each model, each individual energy channel data was used then the combination data of five-energy channels was used. For each experiment, we randomly split the dataset into two folds for 100 runs. At each run, we used one-fold data for training and the other for testing. Then we swapped the training and testing data. In the end, we obtained the averaged AUC for the 100 runs and its standard deviation. The results are shown in Table 4.

Table 4 Classification performance of raw image scheme

For all three inputs, the five-energy spectral CT data gives the best performance. An improvement of the AUC score is obtained by 7.3%, 0.42% and 3.0% for the spectral images, CMs and HFs respectively on the five-energy spectral data over the original 75 kev data only. Comparing results among three inputs, we observed that GLCM feature image could provide more effective information than the CT image for the CNN learning. This agrees with our expectation that for the limited datasets with pathological ground truth, CMs based learning is more efficient than CT image-based learning since the GLCM is extracted from the raw image as an effective texture descriptor to reflect the lesion heterogeneity.

One alternative way to utilize the energy information is to use difference images. We obtain difference images by subtracting the CT data of the neighboring energy channels. Such we obtained four sets of difference CT images from the five energy channels. Similarly, we extract the CMs and HF from four difference CT images and fed the three type inputs into the corresponding models. The results of the difference image scheme are summarized in Table 5. Similar with raw image scheme, the five-energy spectral CT data gives the best performance. For individual difference images, the performance from all energy pairs are almost the same and better than the performance of corresponding raw image scheme. The CM-CNN model HF-RF and model achieves the best AUC of 0.934 and 0.927. Similarly, CMs and HF performed better than CT image, which have been discussed above.

Table 5 Results of the difference image scheme

Discussion

In this paper, the proposed LRTP reconstruction method combines low-rank representation and TP under Bayesian framework. This combination considers the correlation among different channels while preserve the texture information in the specific tissue regions. Comparing the LRTP with LRTV (Fig. 7), we can clearly observe the tissue texture was much better enhanced by the proposed method. While, the similarity of TP among different energy channels demonstrates the LR property of the spectral CT data, how to future enhance the tissue texture using the TP similarity is one of our future research interests.

In the computer aided diagnosis (CADx), tissue texture plays an important role. The tissue texture can be enhanced by the spectral CT because each tissue always has different attenuation coefficient under different X-ray energy. In spectral CT, multiple datasets are obtained at multiple monotonic energy X-ray energies. By compiling all energy channel CT data, the tissue texture can be better represented and benefit the CADx. Using PCCT enhanced texture for clinical task has not been studied thoroughly as mentioned in introduction section. This paper is serving as pioneering work to explore how to utilize PCCT effectively especially given a limited dataset, which is always true in medical imaging field. We investigated three type inputs for the classification task. The overall tendency remains the same across three type inputs that the spectral CT model outperforms single energy data or traditional CT data. This validates the effectiveness of enriched texture by spectral CT. Moreover, the CMs and HF performs much better than the raw CT images, which shows that incorporating our knowledge into the model can help to improve the performance, especially in the limited datasets.

For reconstruction evaluation, we used the VMI from dual energy to mimic the signal of photon counting. For classification evaluation, we used the basic physics model to simulate the spectral CT data. This simulation assumes that each voxel is pure material, which means no partial volume effect was considered. Studies on real spectral CT data is another of our future research interests.

Conclusions

Spectral CT is believed to be able to enhance the tissue contrast and enrich tissue texture information. In this study, we addressed two texture related issues for the spectral CT practical usage. One is to enhance the tissue texture of spectral CT images by the proposed LRTP algorithm. The other is to make use of the enriched texture for clinical tasks by investigating three type input models. All three models showed improved AUC results on the five-energy spectral CT data over the original single CT data only. The outcome also demonstrated the potential of PCCT in polyp classification tasks in the future clinical practice. Our innovation lies in two manifolds: (1) LRTP algorithm for PCCT reconstruction, which integrates the inter-channel correlation and the inner-channel spatial texture into a unified Bayesian reconstruction framework; (2) pioneering exploration of effective learning model using PCCT data for lesion classification task. This paper shows our effort to advance the potential of PCCT in practical usage by considering the upstream reconstruction and downstream classification model. The comprehensive texture themed study shows the insight of maximizing the application of clinical important prior information, e.g. tissue texture in this paper, into practical usage for performance improvement.