Transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images
Nonalcoholic fatty liver disease is the most common liver abnormality. To date, liver biopsy is the reference standard for direct liver steatosis quantification in hepatic tissue samples. In this paper we propose a neural network-based approach for nonalcoholic fatty liver disease assessment in ultrasound.
We used the Inception-ResNet-v2 deep convolutional neural network pre-trained on the ImageNet dataset to extract high-level features in liver B-mode ultrasound image sequences. The steatosis level of each liver was graded by wedge biopsy. The proposed approach was compared with the hepatorenal index technique and the gray-level co-occurrence matrix algorithm. After the feature extraction, we applied the support vector machine algorithm to classify images containing fatty liver. Based on liver biopsy, the fatty liver was defined to have more than 5% of hepatocytes with steatosis. Next, we used the features and the Lasso regression method to assess the steatosis level.
The area under the receiver operating characteristics curve obtained using the proposed approach was equal to 0.977, being higher than the one obtained with the hepatorenal index method, 0.959, and much higher than in the case of the gray-level co-occurrence matrix algorithm, 0.893. For regression the Spearman correlation coefficients between the steatosis level and the proposed approach, the hepatorenal index and the gray-level co-occurrence matrix algorithm were equal to 0.78, 0.80 and 0.39, respectively.
The proposed approach may help the sonographers automatically diagnose the amount of fat in the liver. The presented approach is efficient and in comparison with other methods does not require the sonographers to select the region of interest.
Keywords: Nonalcoholic fatty liver disease, Ultrasound imaging, Deep learning, Convolutional neural networks, Hepatorenal index, Transfer learning
Nonalcoholic fatty liver disease, diagnosed in a large number of obese patients, is the most common liver abnormality. It is defined as the accumulation of fat in more than 5% of liver cells. This disease is associated with an increased risk of hepatic cirrhosis and hepatocellular carcinoma, and also with higher cardiovascular morbidity and mortality in affected patients [2, 3]. Liver biopsy is the reference standard for direct liver steatosis quantification in hepatic tissue samples. However, biopsy is a costly and invasive procedure that carries a high risk of serious complications, commonly including pain and bleeding and, in rare cases, death. Therefore, liver biopsy is not considered to be an easy, optimal way to assess and follow up the progress of common liver diseases. Noninvasive liver imaging methods such as computed tomography, magnetic resonance imaging and ultrasound (US) have been intensively investigated. US may be the preferred modality for screening liver steatosis because of its non-invasiveness, low cost and wide availability.
To date, various approaches have been proposed to assess the level of steatosis in the liver using US. Among them, the hepatorenal sonographic index (HI) is considered to be highly efficient and simple [7, 8]. The HI method is based on a comparison of the liver echogenicity with that of the right kidney cortex. Normal liver and renal tissues show similar echogenicity. However, in the presence of steatosis, the liver tissue brightness is higher than the kidney brightness. The ultrasound-based diagnostic results may depend on the skills and experience of the physicians performing the examination, the type of ultrasound machine and even the US image settings [9, 10]. This operator dependence makes the comparison of results difficult and limits wider practical application of this important imaging technique. Another approach to liver steatosis assessment employs texture analysis. According to the review paper on liver image analysis, the gray-level co-occurrence matrix (GLCM) algorithm is the most frequently used method for liver disease characterization. GLCMs provide useful information about spatial gray-level dependencies in an image. Texture patterns of US images arise from the interference of US waves backscattered by tissue microstructures. GLCM-based approaches to liver steatosis classification using US images have been proposed in several papers [12, 13, 14, 15].
Nowadays, new algorithms for image analysis, including deep learning, are intensively studied. These machine learning methods let computers automatically develop useful features for classification. The usefulness of convolutional neural networks (CNNs) has been reported in solving various medical image analysis problems [16, 17]. CNNs transform input images with convolutional filters into a single decision variable as an output that usually indicates the input image label. However, to successfully train a CNN, a large amount of input data is usually required. This issue limits the practical applications of deep models in medical image analysis, since the available medical image datasets are usually small. Therefore, as a solution, various transfer learning techniques have been proposed. Instead of building a completely new model from scratch, it is possible to use a model developed for another problem. The usefulness of a pre-trained model depends on its ability to adjust to images outside the original training dataset. In the case of medical image analysis, the implementation of transfer learning techniques has been reported in several papers [19, 20, 21, 22].
The aim of this paper is to develop a deep learning model for steatosis level assessment based on US liver B-mode images and to compare it with the HI and the GLCM techniques. The US data analyzed in this study were collected from severely obese patients evaluated before bariatric surgery. We used a pre-trained CNN to extract features from B-mode images. Next, using the neural features, we employed the support vector machine (SVM) algorithm to classify images containing fatty liver. Aside from fatty liver classification, it is clinically relevant to quantify the grade of liver steatosis. For this task, we used the extracted features and the Lasso regression method. In both cases, liver biopsy results served as a reference. The performance of the proposed approach was compared with the HI and the GLCM methods.
This paper is organized in the following way. First, we describe the patient group and the data acquisition routines. We then present how to calculate the HI- and the GLCM-related features using liver US images. Next, our deep learning solution to fatty liver assessment is described. We show how to apply transfer learning to extract CNN-based features from B-mode liver images. Next, we employ the CNN- and the GLCM-based features to perform fatty liver disease classification and to assess the level of steatosis. Results are presented and evaluated. Finally, we discuss the advantages and disadvantages of the applied methods.
Materials and methods
The dataset described above can be downloaded via the Zenodo repository (https://doi.org/10.5281/zenodo.1009146). The dataset repository includes sequences of B-mode images and the biopsy results. The provided dataset could be useful for researchers interested in fatty liver imaging. It should be noted that during the acquisition of the data with the cardiac probe, we recorded the images with the kidney on the left side of the screen. For the convenience of researchers who are used to the kidney on the right side of the image, the example images in Fig. 2 follow the standard convention. In the dataset itself, the images are provided with the left-sided kidney arrangement, as recorded during the image acquisition.
The HI is defined as the ratio of the average brightness level of the liver to that of the kidney cortex. Generally, the HI is expected to increase with the steatosis level. In our study, the HI was determined by a physician with experience in ultrasonography and echocardiography research acquisition. The physician was blinded to the biopsy results. In the first step, a single scan frame from the B-mode sequence was selected by the physician. Next, two regions of interest (ROIs) corresponding to the liver and the kidney cortex were specified. The ROI selection is illustrated in Fig. 2. Care was taken to select the liver and kidney ROIs in the middle part of the image sector, side by side at the same depth. If this was infeasible due to suboptimal image quality, the liver ROI was selected above the kidney ROI at the shortest distance possible. Each ROI was circular, with a radius of 5 mm. In each case, the ROI was as uniform as possible. Regions of non-uniform speckle pattern, vessels or ducts were omitted during the ROI selection procedure. The ratio between the average brightness levels in the ROIs was determined with Matlab software (MathWorks Inc., USA) using histogram analysis, see Fig. 2.
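The HI calculation described above can be illustrated with a minimal Python sketch. This is not the Matlab histogram analysis used in the study; it assumes that the HI reduces to the ratio of mean pixel intensities in two circular ROIs. The function names, the ROI centers, the pixel spacing (mm_per_pixel) and the synthetic frame are all hypothetical and serve only as an illustration.

```python
import numpy as np


def circular_roi_mask(shape, center_rc, radius_px):
    """Boolean mask of a filled circle; center is (row, col) in pixels."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    dist_sq = (rows - center_rc[0]) ** 2 + (cols - center_rc[1]) ** 2
    return dist_sq <= radius_px ** 2


def hepatorenal_index(image, liver_center, kidney_center,
                      radius_mm=5.0, mm_per_pixel=0.5):
    """Ratio of mean liver brightness to mean kidney-cortex brightness.

    mm_per_pixel is an assumed pixel spacing, not a value from the paper.
    """
    radius_px = radius_mm / mm_per_pixel
    liver = image[circular_roi_mask(image.shape, liver_center, radius_px)]
    kidney = image[circular_roi_mask(image.shape, kidney_center, radius_px)]
    return float(liver.mean() / kidney.mean())


# Synthetic frame: the left half ("liver") is brighter than the right half ("kidney").
frame = np.full((200, 200), 40.0)
frame[:, :100] = 60.0

# Both ROIs are placed at the same depth (row), side by side.
hi_value = hepatorenal_index(frame, liver_center=(100, 50), kidney_center=(100, 150))
print(hi_value)
```

In practice the ROI centers would come from the physician's manual selection, which is the operator-dependent step of the method.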
GLCM-based features were extracted following an approach similar to that proposed in [12, 13, 14]. The same liver ROIs were employed for analysis as in the case of the HI method. However, instead of the circular regions, we used square regions with a side length of 10 mm. For each ROI, nine different GLCMs were calculated, considering angles of 0°, 45° and 135° and path distances of 1, 2 and 3. Next, for each GLCM the following texture features were extracted: maximum probability, uniformity, entropy, dissimilarity, contrast, inverse difference, inverse difference moment and correlation.
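The GLCM feature extraction can be sketched in plain NumPy as follows. This is an illustrative sketch rather than the implementation used in the study. It assumes that the ROI has already been quantized to a small number of integer gray levels, that the matrices are symmetric and normalized, and that the eight features follow their common textbook definitions. The number of gray levels, the function names and the random ROI are hypothetical.

```python
import numpy as np

# (row, col) unit steps for the three angles named in the text.
ANGLE_STEPS = {0: (0, 1), 45: (-1, 1), 135: (-1, -1)}


def glcm(roi, distance, angle_deg, levels):
    """Symmetric, normalized gray-level co-occurrence matrix of an integer ROI."""
    dr, dc = ANGLE_STEPS[angle_deg]
    dr, dc = dr * distance, dc * distance
    n_rows, n_cols = roi.shape
    # Slices selecting every pixel that has a neighbor at offset (dr, dc).
    r0, r1 = max(0, -dr), n_rows - max(0, dr)
    c0, c1 = max(0, -dc), n_cols - max(0, dc)
    ref = roi[r0:r1, c0:c1]
    nbr = roi[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()


def glcm_features(p):
    """Eight texture features of a normalized GLCM (textbook definitions).

    The correlation is undefined for a constant ROI (zero variance).
    """
    i, j = np.indices(p.shape)
    diff = np.abs(i - j)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]
    return {
        "maximum_probability": p.max(),
        "uniformity": (p ** 2).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
        "dissimilarity": (diff * p).sum(),
        "contrast": (diff ** 2 * p).sum(),
        "inverse_difference": (p / (1 + diff)).sum(),
        "inverse_difference_moment": (p / (1 + diff ** 2)).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
    }


# Hypothetical square ROI quantized to 8 gray levels.
rng = np.random.default_rng(0)
roi = rng.integers(0, 8, size=(32, 32))

# Nine GLCMs (3 angles x 3 distances), eight features each.
features = {}
for angle in (0, 45, 135):
    for distance in (1, 2, 3):
        p = glcm(roi, distance, angle, levels=8)
        for name, value in glcm_features(p).items():
            features[f"{name}_a{angle}_d{distance}"] = float(value)
```

Under these assumptions, each ROI yields a vector of 72 texture features (nine matrices with eight features each).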
Classification and evaluation
To assess the level of steatosis, we employed the Lasso regression method. The same validation scheme was applied as in the case of the classification, but the steatosis level was estimated instead of the a posteriori class probability. Spearman correlation coefficients (SCCs) were calculated to assess the relation between the steatosis level, the models’ outputs and the HI parameter. Moreover, the SCCs between the models’ outputs and the HI parameter were determined. Next, the linear regression algorithm was used to relate the steatosis level and the HI parameter. All regression models were compared using the Meng test implemented in the cocor package in R [33, 34].
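The Spearman correlation coefficient used above is the Pearson correlation computed on rank-transformed data. The following NumPy sketch shows this calculation. The function names and the example values for the steatosis level and the HI parameter are hypothetical and are not taken from the study data.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_cc(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical biopsy steatosis levels (%) and HI values for six patients.
steatosis = [3, 10, 25, 40, 70, 90]
hi_values = [1.0, 1.1, 1.4, 1.3, 2.0, 2.4]
scc = spearman_cc(steatosis, hi_values)
print(round(scc, 3))
```

Because the coefficient depends only on ranks, it measures a monotonic rather than a strictly linear relation, which suits a comparison between a biopsy-based percentage and a brightness ratio.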
Table 1 Classification performance summary
Method                     AUC
HI                         0.959 ± 0.044
GLCM                       0.893 ± 0.059
CNN (proposed approach)    0.977 ± 0.021
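The AUC values summarized above can be understood as the probability that a randomly chosen fatty liver image receives a higher classifier score than a randomly chosen normal one. The NumPy sketch below computes the AUC in this pairwise way. It is an illustration only: the function name, the labels and the scores are hypothetical, and the cross-validation used in the study is not reproduced.

```python
import numpy as np


def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise score comparisons.

    labels: 1 for fatty liver, 0 for normal liver.
    scores: classifier outputs, higher meaning more likely fatty.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score with every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical labels and classifier scores for seven images.
labels = [0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90, 0.70]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a dataset of a few hundred images.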
Ultrasound imaging is the most commonly applied imaging modality. Our study confirms that the HI parameter is a good predictor of the steatosis level in the liver. It is simple to calculate and efficient. Our results are in good agreement with other studies reporting the usefulness of the HI parameter. We obtained high values of the AUC and the SCC parameters, which were equal to 0.959 and 0.80, respectively. The AUC values reported for the HI method range from 0.76 to 0.99. However, the papers commonly report different ranges of the HI parameter and different optimal cutoffs for the fatty liver classification [7, 8, 36, 37, 38, 39]. This issue illustrates the ambiguity related to the HI-based fat assessment. The performance of the GLCM-based approach was worse, with the AUC and the SCC equal to 0.893 and 0.39, respectively. The low value of the SCC parameter suggests that the GLCM-based features are not efficient for the steatosis level assessment. The obtained AUC value is in agreement with the results reported in the previous studies that employed GLCM-based features [12, 14, 15]. In [12, 15] the authors reported AUC values of around 0.8. Another study reported an accuracy of around 0.8. A further study achieved a high AUC value of 0.96; however, cross-validation was not applied in that study, and the authors used the same dataset to develop and evaluate the classifiers, which could result in overfitting.
Our study shows the feasibility of using deep learning for liver steatosis assessment. Although we used a small dataset containing only 550 images from 55 patients, these data were sufficient to develop a well-performing classifier with transfer learning. The AUC value in the case of the fatty liver classification was equal to 0.977. According to Table 1, the obtained performance was higher than in the case of the HI method. Moreover, the CNN-based approach achieved significantly better results than the GLCM-based approach. The CNN features were useful and enabled efficient training of the classification and regression models. Good performance of the CNN-based approach was expected. In our study, we did not train the network from scratch; instead, the pre-trained CNN was used for feature extraction. This model was developed using the ImageNet dataset containing 1.2 million labeled images of various objects. The HI calculation includes two convolutional operations (spatial averaging), which the CNN has presumably learned in order to perform well on the ImageNet dataset. These two operations have to be conducted in the liver and the kidney, so the network has to detect these tissues first. The appearance of the liver with respect to surrounding tissues is important for efficient steatosis assessment.
In the case of the liver steatosis assessment, the obtained SCC, equal to 0.78, was slightly lower than the SCC calculated for the HI parameter, which was equal to 0.80. However, this difference was not statistically significant. Both regression models performed well, except for the patients with severe steatosis. In this case, the estimated values of steatosis were slightly too small. This may be due to the dataset, which was too small to build an accurate regression model. Moreover, the transfer learning in this case may not be efficient enough to capture the dependence between the input images and the liver steatosis level. Nevertheless, the proposed approach should be considered to be good, especially since the results were obtained in an automated process. Figure 7a illustrates the relation between the Lasso regression method and the HI parameter. In this case, the SCC was equal to 0.78, indicating a high degree of correlation. According to the Bland–Altman plot in Fig. 7b, the average bias in estimates is low.
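The quantities shown in a Bland–Altman plot, namely the average bias and the limits of agreement between two sets of estimates, can be computed as in the sketch below. It assumes the conventional limits of bias ± 1.96 standard deviations of the differences. The function name and the two arrays of steatosis estimates are hypothetical and do not reproduce Fig. 7b.

```python
import numpy as np


def bland_altman(a, b):
    """Bias and 95% limits of agreement between two sets of estimates."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return {
        "mean": (a + b) / 2,  # x-axis of the plot
        "diff": diff,         # y-axis of the plot
        "bias": float(bias),
        "lower": float(bias - 1.96 * sd),
        "upper": float(bias + 1.96 * sd),
    }


# Hypothetical steatosis estimates (%) from two methods for four patients.
method_a = [10, 20, 30, 40]
method_b = [12, 18, 33, 37]
result = bland_altman(method_a, method_b)
print(result["bias"], result["lower"], result["upper"])
```

A bias close to zero with narrow limits indicates that the two methods give interchangeable estimates on average.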
Although the performance of the proposed method was only slightly better than the performance obtained using the HI parameter, the proposed approach has several advantages that illustrate its clinical value. First, our method can be considered as an integrated computer-aided diagnosis system. It is operator independent and, unlike the HI method and the GLCM-based approach, does not require ROI selection. Next, the proposed method efficiently utilizes sequences of US images to assess the level of steatosis, while the approaches proposed in the literature commonly employ only one US image to conduct classification. However, there are several issues related to our work. First of all, the ROI selection is operator dependent and has an impact on the calculation of the HI parameter and the GLCM-based features. For proper estimation of the HI parameter, the physician has to select ROIs in the liver and the kidney. These ROIs have to be as uniform as possible to omit the regions of blood vessels, ducts or other structures in the organs. In our study we focused on machine learning and did not examine observer variability; the ROIs were determined by a single physician. The obtained results may differ between observers [9, 10]. Second, all employed methods are to some extent scanner dependent. B-mode image intensities can be modified by using different image reconstruction and processing algorithms, which may affect the feature extraction and consequently the classification. This is a general issue encountered in studies that aim to develop US-based computer-aided diagnosis systems. Image quality (speckle patterns and boundary visibility) depends on scanner settings. The Inception-ResNet-v2 network utilized in our study was trained using the ImageNet dataset, which contains images recorded under slightly different lighting conditions.
Therefore, we believe that the impact of image reconstruction algorithms implemented in the US scanners should be lower for the proposed approach than in the case of the HI- and the GLCM-based methods. We would like to investigate this problem in the future in two ways. First, it would be interesting to acquire raw ultrasound data and investigate how the image reconstruction algorithms impact the feature extraction from the CNN. Second, we are going to acquire B-mode images of the same liver using different scanner settings and investigate whether the model can learn features for classification that are independent of scanner settings. To make the assessment scanner independent, it would be interesting to employ quantitative US techniques. These methods are used to estimate various physical properties of the tissue, such as the attenuation or scattering characteristics. Quantitative US techniques can be used to create parametric maps that serve as an additional source of information on the investigated tissue in comparison with standard B-mode images [42, 43]. Those maps may serve as a more suitable input to the CNN than regular B-mode images. The usefulness of quantitative US techniques in liver steatosis assessment has been reported in several studies [13, 44, 45]. In the future, we also plan to acquire more data and investigate various approaches to model development.
In this paper we proposed a CNN-based approach to steatosis level assessment utilizing B-mode ultrasound images. The model was developed using data acquired in obese patients undergoing wedge liver biopsy during bariatric surgery. Our approach is efficient and operator independent. Moreover, it outperforms the HI- and the GLCM-based classification.
Compliance with ethical standards
Conflict of interest
The authors do not have any conflict of interest.
All procedures performed in studies involving human participants were in accordance with the Ethical Standards of the Medical University of Warsaw and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
- 1. Beeman SC, Garbow JR (2018) Imaging and metabolism. Springer, New York
- 5. Lăpădat AM, Jianu IR, Ungureanu BS, Florescu LM, Gheonea DI, Sovaila S, Gheonea IA (2017) Non-invasive imaging techniques in assessing non-alcoholic fatty liver disease: a current status of available methods. J Med Life 10:19–26
- 15. Rivas EC, Moreno F, Benitez A, Morocho V, Vanegas P, Medina R (2015) Hepatic Steatosis detection using the co-occurrence matrix in tomography and ultrasound images. In: 20th symposium on signal processing, Images and Computer Vision (STSIVA), Bogota, pp 1–7. https://doi.org/10.1109/STSIVA.2015.7330417
- 20. Cheng JZ, Ni D, Chou YH, Qin J, Tiu CM, Chang YC, Huang CS, Shen D, Chen CM (2016) Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep 6:244–254
- 23. Kalinowski P, Paluszkiewicz R, Ziarkiewicz-Wroblewska B, Wroblewski T, Remiszewski P, Grodzicki M, Krawczyk M (2017) Liver function in patients with nonalcoholic fatty liver disease randomized to roux-en-y gastric bypass versus sleeve gastrectomy: a secondary analysis of a randomized clinical trial. Ann Surg 266:738–745
- 24. Kleiner DE, Brunt EM, Van Natta M, Behling C, Contos MJ, Cummings OW, Ferrell LD, Liu YC, Torbenson MS, Unalp-Arida A, Yeh M, McCullough AJ, Sanyal AJ (2005) Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology 41:1313–1321
- 27. Chollet F (2015) Keras. https://github.com/fchollet/keras
- 28. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, vol 4
- 29. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 248–255
- 39. Mancini M, Prinster A, Annuzzi G, Liuzzi R, Giacco R, Medagli C, Cremone M, Clemente G, Maurea S, Riccardi G, Rivellese AA, Salvatore M (2009) Sonographic hepatic-renal ratio as indicator of hepatic steatosis: comparison with 1H magnetic resonance spectroscopy. Metab Clin Exp 58:1724–1730
- 40. Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 427–436
- 44. Lin SC, Heba E, Wolfson T, Ang B, Gamst A, Han A, Erdman JW, O’Brien WD, Andre MP, Sirlin CB, Loomba R (2015) Noninvasive diagnosis of nonalcoholic fatty liver disease and quantification of liver fat using a new quantitative ultrasound technique. Clin Gastroenterol Hepatol 13:1337–1345
- 45. Paige JS, Bernstein GS, Heba E, Costa EAC, Fereirra M, Wolfson T, Gamst AC, Valasek MA, Lin GY, Han A, Erdman JW, O’Brien WD, Andre MP, Loomba R, Sirlin CB (2017) A pilot comparative study of quantitative ultrasound, conventional ultrasound, and MRI for predicting histology determined steatosis grade in adult nonalcoholic fatty liver disease. Am J Roentgenol 208:168–177
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.