Two-stage CNNs for computerized BI-RADS categorization in breast ultrasound images
Quantizing the Breast Imaging Reporting and Data System (BI-RADS) criteria into different categories with the single ultrasound modality has always been a challenge. To address this, we propose a two-stage grading system, based on convolutional neural networks (CNNs), that automatically grades breast tumors in ultrasound images into five categories.
The newly developed automatic grading system consisted of two stages: tumor identification and tumor grading. The network constructed for tumor identification, denoted as ROI-CNN, identifies the region containing the tumor in the original breast ultrasound image. The subsequent tumor categorization network, denoted as G-CNN, generates effective features for differentiating the identified regions of interest (ROIs) into five categories: Category “3”, Category “4A”, Category “4B”, Category “4C”, and Category “5”. In addition, to make the regions predicted by the ROI-CNN fit the tumor more closely, a level-set-based refinement procedure was used as a bridge between the identification stage and the grading stage.
We tested the proposed two-stage grading system on 2238 cases with breast tumors in ultrasound images. With accuracy as the indicator, our automatic computerized evaluation for grading breast tumors exhibited a performance comparable to that of the subjective categories determined by physicians. Experimental results show that our two-stage framework achieves an accuracy of 0.998 on Category “3”, 0.940 on Category “4A”, 0.734 on Category “4B”, 0.922 on Category “4C”, and 0.876 on Category “5”.
The proposed scheme can extract effective features from breast ultrasound images for the final classification of breast tumors by decoupling the identification features and classification features with different CNNs. In addition, the proposed scheme extends the diagnosis of breast tumors in ultrasound images to five sub-categories according to BI-RADS, rather than merely distinguishing malignant from benign tumors.
Keywords: Breast tumor in ultrasound image; Breast Imaging Reporting and Data System (BI-RADS); Automatic categorization; Deep convolutional neural network
Table 1 BI-RADS assessment categories for breast ultrasound images
Category 0: Incomplete and needs additional imaging evaluation
Category 3: Probably benign (< 2% probability of malignancy)
Category 4A: Low suspicion for malignancy (2% to 8% probability of malignancy)
Category 4B: Moderate suspicion for malignancy (9% to 49% probability of malignancy)
Category 4C: High suspicion for malignancy (50% to 95% probability of malignancy)
Category 5: Highly suggestive of malignancy (> 95% probability)
Category 6: Known biopsy-proven malignancy
Many semi-automated breast tumor classification methods employ hand-engineered features designed to correspond to the probability of malignancy [11, 12]. However, semi-automated methods cannot fully relieve the diagnostic burden on physicians. In addition, the majority of previous studies focused primarily on classifying breast tumors as benign or malignant [14, 15, 16, 17]. When the relationship between the features extracted from breast ultrasound (BUS) images and the corresponding probability of malignancy is extended to more complex categories, physicians need to spend extra time and effort to provide specific and appropriate handcrafted features. Recently, by exploiting hierarchical feature representations learned automatically from large-scale datasets, deep learning techniques have successfully addressed numerous medical image analysis problems [18, 19, 20, 21, 22, 23, 24]. Owing to the superiority of deep learning in automatic feature extraction, several related works on detecting breast tumors in US images used deep learning methods instead of traditional feature engineering [20, 21, 23, 24, 25, 26]. For example, Yap et al. attempted to detect breast ultrasound lesions with different convolutional neural network (CNN) models, including a patch-based LeNet model, a U-Net model, and a transfer learning approach with a pre-trained FCN-AlexNet. Bian et al. performed detection on automated whole breast ultrasound (AWBU) with a deep convolutional encoder-decoder network. Generally, the detection of breast tumors can provide preliminary regions of interest (ROIs) for the subsequent tumor classification task. A more accurate tumor/lesion region can guide CNNs to learn more discriminative features for the classification task.
A few studies have validated the feasibility of classifying breast tumors into different categories with CNNs [23, 27, 28, 29]. Huynh et al. highlighted that deep learning can be a promising new direction for obtaining “good” features for automatic breast tumor classification by comparing the results with those of typical methodologies (quantized features + typical classifier). However, detailed information about the breast tumor classification network was not provided. Zhang et al. demonstrated the feasibility of using a CNN to classify breast tumors with shear wave elastography (SWE). Although features from SWE images are helpful in localizing breast tumors, equipping every ultrasound device with SWE is not practical. Moreover, the image features may contain considerable interference because the contour determined by SWE is rather coarse, and the attempt only focused on classifying the BUS images as benign or malignant. At present, few studies have investigated automatic multi-category classification based on BI-RADS. He et al. implemented multi-category classification based on electronic medical records from the perspective of natural language description. Clinically, a direct analysis of the tumor’s category based on the collected BUS images can better assist physicians and relieve the diagnostic burden. Because of the abundant noise and interference from other tissues in BUS images, accurate multi-category classification corresponding to BI-RADS from BUS images alone is a rather challenging task.
This is the first comprehensive quantized grading system based on BUS images. It achieves a five-grade categorization based on BI-RADS, covering Category 3, Category 4A, Category 4B, Category 4C, and Category 5, thus potentially relieving the burden of a tedious image review process and reducing the subjective influence of physicians’ experience in clinical practice.
With our two-stage CNNs, features are decoupled between the detection phase and the classification phase, because the weights for the identification task and the classification task are not well compatible in a one-stage CNN architecture for BUS images. Our two-stage system achieves better accuracy than the state-of-the-art one-stage methods.
Materials and methods
In this study, all the collected 2-D breast ultrasound (BUS) images were from female patients and contained a breast tumor. For each volunteer participant, only one case corresponding to the maximum cut surface of breast tumor was used to generate the datasets. This study included 531 cases of Category 3, 443 cases of Category 4A, 376 cases of Category 4B, 565 cases of Category 4C, and 323 cases of Category 5. Human subject ethical approval was obtained from a relevant committee at West China Hospital of Sichuan University before collecting ultrasound images. Each subject provided written consent prior to the research. Philips IU22 ultrasound scanner (Philips Medical System, Bothell, WA) with a 5- to 12-MHz linear probe was utilized while collecting the data.
The CNN architecture is an extensively used deep learning technique for analyzing medical images [18, 30]. Typically, a CNN is constructed from several convolution layers [31, 32], max pooling layers, and fully connected layers. The activation functions commonly used in CNNs include the rectified linear unit (ReLU), sigmoid, and tanh.
CNN-based localization and grading models
Inherent speckle noise and low image contrast in US images may introduce unnecessary distraction during feature extraction, making the automatic classification of breast US images difficult. To extract effective features for classification, the tumor identification network (ROI-CNN) and the refinement procedure were first applied to the whole BUS image to determine the effective ROI. The subsequent tumor grading network (G-CNN) can then focus on extracting discriminative features for classifying tumors.
The identification model—ROI-CNN
To effectively reduce the influence of other tissues, such as Cooper’s ligaments, identifying the tumor in the whole BUS image is the first and most important step in the automatic grading system. Our ROI identification network (ROI-CNN) was developed based on fully convolutional networks (FCN).
However, feature maps that are too small cannot adequately reflect the detailed boundary information of the breast tumor. In this study, an atrous convolution layer (refer to the yellow block in Fig. 2) was therefore incorporated into our ROI-CNN; it effectively enlarges the receptive field of the filters and captures a larger context without increasing the number of parameters or the computational cost [38, 39, 40]. In the atrous convolution layer, the kernel size was set to 3 × 3 and the dilation rate was set to 2. In addition, feature maps from different depths were concatenated (refer to Fig. 2) so that features with two different receptive fields are merged. Following the atrous convolution layer, an additional convolution layer was used as a transitional layer between the atrous convolution layer and the top convolution layer to provide a balanced number of features from the deep and shallow layers for the concatenation operation. In the transitional convolution layer, the kernel size was set to 1 × 1 and the number of filters was set to 512. In the output of the ROI-CNN, the predicted identification probability in the breast tumor region should be higher than that in the non-tumor region.
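The effect of the dilation rate can be illustrated with a small, framework-free sketch. It is not the ROI-CNN implementation: the function names, the 7 × 7 toy image, and the all-ones kernel are assumptions chosen for illustration. It shows that a 3 × 3 kernel with dilation 2 spans a 5 × 5 window while still using only nine weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """'Valid' 2-D cross-correlation with a dilated kernel (no padding, stride 1)."""
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # Kernel taps are spaced `dilation` pixels apart in the input.
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[r + i * dilation][c + j * dilation]
            row.append(acc)
        output.append(row)
    return output


# Toy 7 x 7 image and an all-ones 3 x 3 kernel (illustrative values only).
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
kernel = [[1.0] * 3 for _ in range(3)]

dense = dilated_conv2d(image, kernel, dilation=1)   # covers a 3 x 3 window
atrous = dilated_conv2d(image, kernel, dilation=2)  # covers a 5 x 5 window
```

With the settings reported above (kernel 3 × 3, dilation 2), `effective_kernel_size(3, 2)` evaluates to 5, so the context seen by each output position grows without any additional parameters.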
The grading model—G-CNN
An effective classifier can enhance the separability of tumor features from different categories, thus promoting accurate classification. Clinically, apart from the inner texture of breast tumors, the texture and the boundary information are also significant for classifying breast tumors into different grades [10, 14]. Therefore, the grading model needs to take the texture and boundary features into consideration to enhance the expressiveness of the grading features.
In total, the G-CNN contained 18 convolution layers. Batch normalization was applied at the top convolution layer in each block and at the first two FC layers to regularize the model. L2 regularization was applied to reduce overfitting and thereby improve generalization and test performance. The kernel size of all convolution layers was 3 × 3, and each layer was followed by a ReLU. All max pooling layers were set to 2 × 2 with a stride of 2. At the end of the G-CNN were three fully connected (FC) layers consisting of 4096, 1024, and 5 neurons. A softmax layer followed the topmost five-neuron FC layer to produce the grading output.
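As a minimal sketch of the final grading step, the five outputs of the last FC layer can be turned into category probabilities with a softmax and mapped to a BI-RADS label. The function names, the category ordering in `CATEGORIES`, and the example logits are assumptions for illustration and do not come from the trained G-CNN.

```python
import math

# Assumed ordering of the five output neurons (illustrative only).
CATEGORIES = ("3", "4A", "4B", "4C", "5")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits):
    """Return the most probable category label and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs


# Hypothetical logits from the five-neuron FC layer.
label, probabilities = predict_category([0.1, 2.0, 0.3, -1.0, 0.0])
```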
Affected by the abundant speckle noise and other tissues in the BUS image, the prediction of the ROI-CNN may include non-tumor regions in addition to the tumor region. Moreover, the contour of the predicted region may deviate from the real tumor contour. Therefore, additional refinement is imperative to ensure the effectiveness of the predicted ROI.
To ensure that only the lesion was exported to the subsequent grading system and to improve the accuracy of the final categorization, the rough ROI from the ROI-CNN, which enclosed the breast tumor region, was further refined by the following steps: (1) remove the connected domains with a small area (smaller than 40% of the maximum area) and choose the connected region closest to the image center; and (2) refine the boundary with a typical C–V level-set method.
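Step (1) of the refinement can be sketched as follows. This is an illustrative reading of the description rather than the authors' code: it assumes 4-connectivity, measures "closest to the image center" by the distance from each region's centroid to the center, and uses hypothetical function names and a toy 7 × 7 mask. The level-set refinement of step (2) is not shown.

```python
from collections import deque


def connected_components(mask):
    """Return a list of regions, each a list of (row, col) pixels (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def select_tumor_region(mask, min_area_ratio=0.4):
    """Drop regions below 40% of the largest area, keep the one nearest the center."""
    h, w = len(mask), len(mask[0])
    components = connected_components(mask)
    refined = [[0] * w for _ in range(h)]
    if not components:
        return refined
    max_area = max(len(pixels) for pixels in components)
    kept = [p for p in components if len(p) >= min_area_ratio * max_area]
    center_y, center_x = (h - 1) / 2.0, (w - 1) / 2.0

    def squared_distance_to_center(pixels):
        mean_y = sum(y for y, _ in pixels) / len(pixels)
        mean_x = sum(x for _, x in pixels) / len(pixels)
        return (mean_y - center_y) ** 2 + (mean_x - center_x) ** 2

    for y, x in min(kept, key=squared_distance_to_center):
        refined[y][x] = 1
    return refined


# Toy prediction: a central 3 x 3 blob, a 2 x 2 blob in a corner, one stray pixel.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1
for r in range(0, 2):
    for c in range(5, 7):
        mask[r][c] = 1
mask[6][0] = 1

refined = select_tumor_region(mask)
```

In this toy case the stray pixel is removed by the area threshold, the corner blob survives the threshold but loses to the central blob on distance, and only the central region is passed on.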
Our proposed framework was implemented in TensorFlow, and all experiments were conducted on a workstation equipped with a 2.40 GHz Intel Xeon E5-2630 CPU and an NVIDIA GF100GL Quadro 4000 GPU.
During the training phase of the ROI-CNN, the layers in the blue dotted box (refer to Fig. 2) were initialized with a VGG model pre-trained on the image classification dataset of the ImageNet Large-Scale Visual Recognition Challenge 2012 (ILSVRC-2012 CLS). The other layers in the ROI-CNN were initialized with a Gaussian randomizer. The minibatch size was 16 images, and the SGD optimizer [44, 45] was used with a learning rate of 0.0001 and a momentum of 0.9 until convergence was attained.
In the training phase of the G-CNN, random initialization was employed to yield better performance and faster convergence. The minibatch size was 16 images, and the SGD optimizer was used with an initial learning rate of 0.001, which was gradually decreased by a factor of 0.9 until convergence was attained.
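The G-CNN learning-rate decay can be written as a simple geometric schedule. The sketch below is an assumption-laden illustration: the text does not state how often the rate is decayed, so `decay_step` simply counts how many times the factor of 0.9 has been applied, and the function name is hypothetical.

```python
def decayed_learning_rate(decay_step, base_lr=0.001, decay_factor=0.9):
    """Learning rate after `decay_step` multiplicative decays of the base rate."""
    return base_lr * decay_factor ** decay_step


# First few values of the schedule: 0.001, 0.0009, 0.00081, ...
schedule = [decayed_learning_rate(step) for step in range(4)]
```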
To validate the effectiveness of the grading scheme for breast tumors in US images, the localization and grading results were evaluated by comparing them with the corresponding manual annotations and labels from the three physicians. The experiments assessed two aspects of our grading system: the effect of different options in the tumor identification stage on the final grading results, and the discriminative capability for different breast tumor categories.
The accuracy of the identified tumor
Three metrics were used to quantitatively evaluate the similarity between the predicted contour and the ground truth contour: the Dice similarity coefficient (DSC) [46, 47], the Hausdorff distance between two boundaries (HDist) [47, 48], and the average distance between two boundaries (AvgDist). DSC was employed to measure the overlap between the two regions being compared. HDist and AvgDist were used to measure the Euclidean distance between a computer-identified tumor boundary and the boundary determined by physicians. A higher DSC, lower HDist, and lower AvgDist correspond to greater similarity between the two boundaries. Furthermore, AUC values and ROC curves were used to evaluate the performance of different experiments with a variable scope of ROIs.
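The three contour metrics can be sketched in plain Python as follows. The definitions are common textbook forms and are assumptions here, since the exact formulas are not given in the text: DSC is computed on pixel sets, HDist is the symmetric Hausdorff distance between boundary point lists, and AvgDist is taken as the mean of the two directed average nearest-point distances. All function names and the toy inputs are illustrative.

```python
import math


def dice_coefficient(pred_pixels, truth_pixels):
    """DSC = 2 |A ∩ B| / (|A| + |B|) for two collections of pixel coordinates."""
    pred, truth = set(pred_pixels), set(truth_pixels)
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def _nearest_distances(source, target):
    """Distance from every source point to its nearest target point."""
    return [min(math.dist(p, q) for q in target) for p in source]


def hausdorff_distance(boundary_a, boundary_b):
    """Symmetric Hausdorff distance between two boundary point lists."""
    return max(max(_nearest_distances(boundary_a, boundary_b)),
               max(_nearest_distances(boundary_b, boundary_a)))


def average_distance(boundary_a, boundary_b):
    """Mean of the two directed average nearest-point distances."""
    d_ab = _nearest_distances(boundary_a, boundary_b)
    d_ba = _nearest_distances(boundary_b, boundary_a)
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))


# Toy regions: two 4 x 4 squares overlapping in half of their pixels.
pred_region = [(r, c) for r in range(4) for c in range(0, 4)]
truth_region = [(r, c) for r in range(4) for c in range(2, 6)]
dsc = dice_coefficient(pred_region, truth_region)

# Toy boundaries given as short point lists.
pred_boundary = [(0, 0), (0, 3)]
truth_boundary = [(0, 0), (0, 5)]
hdist = hausdorff_distance(pred_boundary, truth_boundary)
avgdist = average_distance(pred_boundary, truth_boundary)
```

Under these definitions AvgDist can never exceed HDist for the same pair of boundaries, which is a useful sanity check when reading reported values.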
Image data involved in each stage of the categorization system
Each image was scored according to the BI-RADS criteria by three physicians, each with more than 3 years of experience performing BUS examinations. If the physicians differed in their annotations of the category, they discussed the case and reached a consensus on the final category of the breast tumor.
Data preprocessing and augmentation
Because the sample size of volunteer patients is limited, effective data preprocessing and augmentation are imperative for medical image datasets. The premise of augmentation is that the ROI must be retained in all augmented data regardless of the type of transformation applied to the dataset.
Data augmentation in the ROI-CNN model
In the ROI-CNN training stage, the number of augmentations of each input was set equal to the number of training epochs. This type of augmentation enhances the randomization of the input data and reduces the possibility of overfitting, thus improving the robustness of the ROI-CNN model. In each epoch, each input image underwent the following procedures: random brightness, random contrast, random movement, random flip, and standardization. Each input thus yields N augmented outputs over N epochs. In the testing phase, by contrast, only standardization was applied to the input samples.
Data augmentation in the G-CNN model
In the G-CNN training stage, to maintain the shape textures of breast tumors for the final classification, only geometric translation and flipping were involved. The original datasets were augmented four times with random movement, in which two augmentations were followed by horizontal flipping.
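A minimal sketch of this geometric augmentation is given below. It is illustrative only: the function names, the zero-fill for uncovered pixels, the choice to flip the first two of the four shifted copies, and the bounding-box argument `roi_box` are all assumptions. The shift range is bounded by the ROI bounding box so that the ROI always stays inside the frame, in line with the premise stated earlier.

```python
import random


def shift_image(image, dy, dx, fill=0):
    """Translate a 2-D image by (dy, dx), filling uncovered pixels with `fill`."""
    h, w = len(image), len(image[0])
    shifted = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                shifted[nr][nc] = image[r][c]
    return shifted


def horizontal_flip(image):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in image]


def augment_for_grading(image, roi_box, rng, copies=4, flipped=2):
    """Create `copies` randomly shifted samples; the first `flipped` are also mirrored.

    roi_box = (top, left, bottom, right) with inclusive pixel indices.
    """
    h, w = len(image), len(image[0])
    top, left, bottom, right = roi_box
    augmented = []
    for index in range(copies):
        # Shifts are limited so that the ROI never leaves the image.
        dy = rng.randint(-top, h - 1 - bottom)
        dx = rng.randint(-left, w - 1 - right)
        sample = shift_image(image, dy, dx)
        if index < flipped:
            sample = horizontal_flip(sample)
        augmented.append(sample)
    return augmented


# Toy 6 x 6 image whose ROI is a 2 x 2 block of ones.
image = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1

samples = augment_for_grading(image, roi_box=(2, 2, 3, 3), rng=random.Random(0))
```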
Effect of identification accuracy on the final grading
The coverage of localization, which denotes the area of the ROI, theoretically affects the feature mapping and may influence the final grading. To investigate the effect of the accuracy of the identified breast tumor on the final categorization of the BUS images, three types of input to the G-CNN, with the corresponding experiments, were considered and denoted as “No ROI-CNN”, “No Refined ROI-CNN”, and “Refined ROI-CNN”. “No ROI-CNN” corresponded to the experiment in which the C–V level-sets method was applied directly to the input US images to produce the input to the G-CNN, without the rough localization predicted by the ROI-CNN. In the “No Refined ROI-CNN” experiment, the output of the ROI-CNN was not refined and was directly exported to the G-CNN. In the “Refined ROI-CNN” experiment, the original US images underwent the complete processing procedure of our designed scheme.
The parameters of our designed method, experiment “Refined ROI-CNN”, were set as follows: (1) μ1, μ2, and α in equation (4) were all set to 1; (2) the maximum number of contour evolution iterations was set to 50. The parameters μ1, μ2, and α in the C–V level-sets experiment were the same as those in our refined ROI experiment, but the maximum number of contour evolution iterations in the “No ROI-CNN” experiment was set to 1000.
One-stage vs. two-stage categorization of BUS images
Making full use of the effective features is likely to yield better categorization of breast tumors. To investigate the superiority of our two-stage system in grading BUS images, the accuracy of the predicted tumor categorization was employed as an indicator. We compared the categorization of the two-stage grading system with that of the one-stage grading architecture, which directly classified input breast US images into six classes, including the background and five breast tumor categories.
There are two types of two-stage methods: one with the refinement procedure and one without it. For each type, we compared our G-CNN model with two typical classification networks, the VGG network and the ResNet50 network. In total, there are six two-stage experiments. For the one-stage classification methods, three experiments are included: (1) experiment “One-stage G-CNN”, which directly classified the input into 5 categories with our proposed G-CNN architecture (refer to Fig. 3); (2) experiment “One-stage VGG”, which directly classified the input BUS image into 5 categories with the VGG architecture; (3) experiment “One-stage ResNet”, which directly classified the input BUS image into 5 categories with the ResNet50 architecture.
Effect of the identification on final grading accuracy
Table 2 Comparisons of different identification implementations
Method | DSC | AvgDist | HDist
No ROI-CNN | 0.6007 ± 0.0302 | 51.2115 ± 2688.8 | 82.7174 ± 3423.1
No Refined ROI-CNN | 0.8665 ± 0.0133 | 7.6764 ± 157.8465 | 23.5292 ± 531.9632
Refined ROI-CNN | 0.9125 ± 0.0015 | 3.9668 ± 7.0654 | 11.0110 ± 40.6948
One-stage vs. two-stage framework
Table 3 Comparisons of one-stage models and two-stage models with the grading accuracy of each category
Method | Category 3 | Category 4A | Category 4B | Category 4C | Category 5
Refined ROI-CNN + G-CNN | 0.998 ± 0.0040 | 0.940 ± 0.0110 | 0.734 ± 0.0662 | 0.922 ± 0.0376 | 0.876 ± 0.1234
Refined ROI-CNN + VGG | 0.990 ± 0.0047 | 0.920 ± 0.0194 | 0.673 ± 0.0692 | 0.908 ± 0.0493 | 0.841 ± 0.1319
Refined ROI-CNN + ResNet | 0.991 ± 0.0052 | 0.927 ± 0.0155 | 0.688 ± 0.0665 | 0.920 ± 0.0388 | 0.858 ± 0.1423
ROI-CNN + G-CNN | 0.955 ± 0.0076 | 0.897 ± 0.0112 | 0.679 ± 0.0678 | 0.906 ± 0.0434 | 0.837 ± 0.1232
ROI-CNN + VGG | 0.947 ± 0.0145 | 0.864 ± 0.0132 | 0.667 ± 0.0726 | 0.865 ± 0.0463 | 0.818 ± 0.1281
ROI-CNN + ResNet | 0.954 ± 0.0136 | 0.878 ± 0.0124 | 0.669 ± 0.0664 | 0.899 ± 0.0425 | 0.835 ± 0.1338
One-stage G-CNN | 0.797 ± 0.0190 | 0.552 ± 0.0312 | 0.496 ± 0.0876 | 0.715 ± 0.0674 | 0.559 ± 0.1257
One-stage VGG | 0.723 ± 0.0244 | 0.533 ± 0.0471 | 0.460 ± 0.0888 | 0.644 ± 0.0535 | 0.436 ± 0.1328
One-stage ResNet | 0.755 ± 0.0268 | 0.550 ± 0.0302 | 0.472 ± 0.0799 | 0.692 ± 0.0585 | 0.508 ± 0.1402
Referring to Table 3, the one-stage image-level classification methods (“One-stage G-CNN”, “One-stage VGG”, and “One-stage ResNet”) perform the worst in terms of average grading accuracy. In particular, the accuracy of these methods reaches only about 0.5 on “Category 4A”, “Category 4B”, and “Category 5”. By introducing an extra identification network, the two-stage methods without refinement (“ROI-CNN + G-CNN”, “ROI-CNN + VGG”, and “ROI-CNN + ResNet”) improve the accuracy on every category compared with the one-stage methods. Except for “Category 4B”, the average accuracy on the other four categories is over 0.8 for the two-stage methods without refinement. Among these three experiments, “ROI-CNN + G-CNN” achieves the highest accuracy on all categories and “ROI-CNN + VGG” the lowest. After an extra refinement procedure is added, the grading accuracy in each category of the two-stage methods with refinement (“Refined ROI-CNN + G-CNN”, “Refined ROI-CNN + VGG”, “Refined ROI-CNN + ResNet”) becomes higher than that of the corresponding method without refinement and of the one-stage methods listed in Table 3. Moreover, compared with the typical classification models (VGG and ResNet50), the methods with our G-CNN network still achieve better grading accuracy.
For the experiment “Refined ROI-CNN + G-CNN”, the accuracy in Category “3” reaches an average of 0.998, which is close to one. Both Category “4A” and Category “4C” achieve an average accuracy greater than 0.9, and the accuracy for Category “5” averages 0.876, which is close to 0.9. Thus, we suggest that our predictions for these four categories are effective and highly accurate. Note that the average accuracy for Category “5” is less than 0.9 and that the incorrect predictions for Category “5” mostly fall into Category “4C”, owing to the following two factors: (1) the number of samples in Category “5” is smaller than the numbers in the other categories, and (2) the ratio of the benign samples in Category “5” is higher than the ratio of the malignant samples in Category “5”. Although the accuracy for Category “4B” does not reach 0.8, it is higher than the statistical probability. According to the criteria in Table 1, for a tumor belonging to Category “4B”, the probabilities of being benign and malignant are rather close, so even experienced physicians can hardly determine the tumor’s category merely from the US image. Clinically, extra diagnostic tests, such as biopsy, are required to determine the final accurate result. With only US images as the single source of information, our two-stage grading system in Fig. 1 achieves the best grading accuracy among the methods listed in Table 3.
The grading accuracy predicted by our two-stage categorization system
Automatic quantification of the category of a breast tumor from US scanning can assist physicians in the tedious diagnostic task. This is the first comprehensive quantized grading evaluation of breast ultrasound images based on the BI-RADS criteria. With the CNN architecture, which can automatically learn and extract goal-oriented features from images, our two-stage grading system can accurately identify the tumor region and discriminate the category of the breast tumor in ultrasound images. Our grading system achieves a five-grade categorization of BI-RADS, covering Category 3, Category 4A, Category 4B, Category 4C, and Category 5, thus potentially relieving the burden of a time-consuming image review process and reducing the influence of physicians’ experience in clinical practice. Additionally, decoupling the identification features and classification features helps ensure the robustness and effectiveness of the fully automated categorization system.
The proposed two-stage architecture makes full use of the effective features from breast US images by decoupling the information for identification and categorization, thus improving the final grading accuracy. The identification task focuses on distinguishing the tumor from the background, while the grading task concentrates on classifying the breast tumors into different classes, so the features used to identify the tumor from the background differ from those used to classify the tumor into five categories. Referring to Table 3, the accuracy of the final diagnosis illustrates that our two-stage CNNs achieve better performance than the one-stage methods. The results in Table 3 indicate that the two-stage architecture is more suitable for grading BUS images, because the features of the identification task and the categorization task are not well compatible. Therefore, a two-stage grading system can provide higher accuracy, which is a vital indicator in medical image analysis, in classifying breast tumor categories.
In the two-stage grading system, the designed identification model and refinement procedure contribute to obtaining an effective ROI for the following classification model, thus reaching a desirable grading result. Generally, additional irrelevant information imported into the G-CNN model may act as interference and produce an unsatisfactory grading result. The results in Table 2 and Fig. 4 also suggest that a more accurate and precise ROI input to the G-CNN makes better grading of breast tumors possible. Affected by the abundant speckle noise in ultrasound images, the contours produced by the level-set methods may develop a large bias during the evolution process. Therefore, the refinement procedure alone cannot generate effective ROIs for the following G-CNN model (refer to Table 2). Meanwhile, the predictions from the ROI-CNN usually have smooth boundaries, and some detail information is lost, particularly for malignant cases, so the ROI-CNN model alone is inadequate for providing a desirable ROI for the following classification network. Therefore, by combining the ROI-CNN model and the refinement procedure, the predicted ROI can be closer to the real breast tumor so as to provide more elaborate ROIs for the following symmetric architecture G-CNN.
Table 3 illustrates that our G-CNN model achieves better accuracy than the typical VGG and ResNet50. Generally, enhancing the effective features improves the final grading accuracy of the classification model. In our proposed G-CNN, the layers embedded in the concatenate path and the skip connections (refer to Fig. 3) combine the lower dimensional feature maps with the higher dimensional features, thus promoting discriminability for classifying different breast tumor categories. In contrast, the typical classification models, such as VGG and ResNet50, only involve the encode path in Fig. 3 and export the high-level information to implement classification. However, the high-level features may lose the texture or boundary information contained in the lower convolution layers. The texture and boundary information usually provide important hints for the classification task according to BI-RADS. Owing to the lack of a compensation strategy, ResNet50 and VGG cannot achieve desirable accuracy in classifying the breast ultrasound images. Therefore, we conclude that better grading results can be achieved with the enhanced feature maps from the G-CNN.
Our grading system performs well on “Category 3”, “Category 4A”, and “Category 4C”, with accuracy greater than 0.9. For “Category 4B” and “Category 5”, however, the grading accuracy is lower than that of the other three categories (refer to Table 3). This may be because fewer data are available for “Category 4B” and “Category 5” than for the other three categories. Although the accuracy for Category “4B” does not reach 0.8, it is significantly higher than manual decision. According to the criteria in Table 1, each tumor falling into Category “4B” may have a close probability of being benign or malignant. Even experienced physicians may be biased when determining the category from the US image alone. Clinically, further diagnostic tests, such as biopsy, are needed to reach the final accurate result. With only US images as the single source of information, our grading scheme can adequately predict the category of breast tumors. In the future, we will continue to collect more data, particularly for “Category 4B” and “Category 5”, to further increase the prediction accuracy of our grading system. Moreover, we plan to integrate images in Categories 0, 1, 2, and 6 into the current grading system, thus developing a comprehensive and complete computerized BI-RADS grading system.
In this study, we proposed a two-stage automatic categorization system to quantize the criteria of BI-RADS and offer an objective assessment. Based on deep learning techniques, a series of comprehensive explorations were conducted using a combination of the procedures of CNN-based methods, typical image processing schemes, and the CNN architecture applicable to breast US images. The proposed scheme can also serve as an assistant computerized toolkit for the education of radiology residents and medical students to improve their discriminative skills in breast tumor examination with US scanning. Meanwhile, the proposed grading scheme based on CNN can be easily extended to analyses of other breast ultrasound images generated from other equipment without extra feature engineering.
YH conceived the article, analyzed the ultrasound data in this study, and drafted the article. LH released the annotation tool to facilitate the physicians’ annotation task in this study. YH and LH designed the experiments together. HD implemented the experiments for the classification module. ZY helped check the grammar. HL collected the data and organized the annotation task. JZ, QL and GY revised the article. All authors read and approved the final manuscript.
The authors would like to thank the physicians in the Department of Ultrasound, West China Hospital of Sichuan University, for their helpful contribution to validating the categories of each collected breast US image.
The authors declare that they have no competing interests.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Consent for publication
Ethics approval and consent to participate
This study was approved by the Medical Ethics Committee of the West China Hospital, Sichuan University, and written informed consent was obtained from each participant.
National Natural Science Foundation of China (Grant No. 61273361).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.