1 Introduction

Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death among women worldwide [1]. Hence, early diagnosis is a crucial factor in breast cancer treatment, where medical images are important sources of diagnostic information. Currently, breast ultrasound (BUS) is an important coadjuvant technique to mammography (x-ray) in patients with palpable masses and normal or inconclusive mammogram findings [2]. Also, BUS images are particularly effective in distinguishing cystic from solid lesions and are useful for differentiating between benign and malignant tumors [3].

To assist radiologists in interpreting BUS images, computer-aided diagnosis (CAD) systems have emerged as a ‘second reader’ that analyzes the images by using computational approaches. Generally, the pipeline of a CAD system involves four basic stages: image preprocessing, lesion segmentation, feature extraction, and lesion classification [4]. Radiologists can then take the CAD outcome as a second opinion to reach a more conclusive diagnosis and reduce unnecessary biopsies in benign cases [5].

Image preprocessing commonly increases the contrast between the lesion region and its background, and also applies low-pass filtering to reduce the speckle artifact. Next, the segmentation procedure separates the lesion region from its background and other tissue structures. Thereafter, morphological and texture features are usually computed from the segmented lesions, and the relevant features are selected to improve the between-class discrimination. These features are the classifier inputs for distinguishing the lesions into benign and malignant classes [4].

In the literature, a plethora of approaches have been proposed to address each stage of CAD systems for BUS images. In this sense, Cheng et al. [4] and Huang et al. [6] presented comprehensive surveys related to BUS image analysis. Despite the large number of proposed approaches, it is usually difficult to obtain useful computational implementations for research purposes, because authors do not commonly share their source codes or programs.

Hence, we introduce a MATLAB (The MathWorks, Natick, Massachusetts, USA) [7] toolbox for BUS image analysis, aiming to share with the research community our efforts to implement several methods for developing CAD systems for breast ultrasound. The toolbox is composed of 62 functions divided into four sections: image preprocessing, lesion segmentation, feature extraction, and classification. The toolbox can be downloaded from our permanent link http://www.tamps.cinvestav.mx/~wgomez/downloads.html.

2 Toolbox Organization

The Breast Ultrasound Analysis Toolbox (BUSAT) has 62 functions oriented to image preprocessing (contrast enhancement, despeckling, and domain transformation), lesion segmentation (semi-automatic and fully-automatic methods), feature extraction (morphological, texture, and BI-RADS lexicon), and classification (linear and non-linear classifiers). Figure 1 illustrates the general organization of BUSAT and the list of available functions.

It is worth mentioning that all the functions were coded by our research group based on several articles from the literature; hence, all the implemented methods have a theoretical basis. In addition, several functions rely on methods developed by other research groups to guarantee the quality of the results, for instance, LIBSVM to train Support Vector Machines [8] and minimum redundancy maximum relevance (mRMR) for feature selection [9].

Fig. 1. Organization of the BUS analysis toolbox and list of functions.

The main BUSAT directory contains the following six subfolders:

  • Data: contains data files and test images to run the examples of the toolbox.

  • Preprocessing: 13 functions for contrast enhancement, speckle filtering, and domain transformation.

  • Segmentation: four functions for lesion segmentation.

  • Features: 29 functions for computing morphological, texture, and BI-RADS features.

  • Classification: 16 functions for lesion classification into benign and malignant classes.

  • C functions: 21 compiled C code functions that are used by several functions of the toolbox.

3 Toolbox Usage

3.1 Installation

To start using BUSAT, first run the script RUN_ME_FIRST, which adds all the toolbox directories to the MATLAB search path.
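A setup script of this kind usually does little more than add the toolbox root and its subfolders to the search path. The following is a minimal sketch of that pattern, not the contents of RUN_ME_FIRST itself; it assumes that the current folder is the BUSAT root directory.

```matlab
% Minimal sketch (assumption: the current folder is the toolbox root).
% This is NOT the actual RUN_ME_FIRST script, only the usual pattern.
rootDir = pwd;                              % toolbox root, assumed to be the current folder
addpath(genpath(rootDir));                  % add the root and every subfolder to the search path
onPath = ~isempty(strfind(path, rootDir));  % true if the root now appears on the path
```

The change lasts for the current MATLAB session only, so the script has to be run again after restarting MATLAB unless the path is saved.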

3.2 Help Topics

To display the organization of BUSAT, type the statement help Contents in the MATLAB Command Window. Note that every listed function has a hyperlink to its own help topics. The user can also consult the help topics of a specific function by typing the statement help followed by the name of the function, as illustrated in Fig. 2. Observe that help topics are displayed in three parts: the syntax explanation of the function, an illustrative example, and the reference or bibliography for theoretical details. Hyperlinks to similar functions are also shown.

Fig. 2. Example of help topics for a specific function.

3.3 Running Examples

Every function in BUSAT can be tested by running the example provided in its help topics, that is, by copying and pasting the example text into the Command Window. In the case of image preprocessing and lesion segmentation functions, both the original and the processed images are shown. For instance, the images shown in Fig. 3 are displayed after running the example code in Fig. 2.

Fig. 3. Example of a BUS image despeckled by the isfad function.

3.4 Special Considerations

Two special considerations should be taken into account:

  1. C code functions: although BUSAT provides compiled C code functions (called mex functions) for Linux, Mac OS, and Windows on 64-bit processors, on some operating systems they must be recompiled from the source codes by using the MATLAB mex function. These source codes are provided within the directory Source_C_codes.

  2. Parallel Computing Toolbox: to speed up the execution of the functions autosegment, trainLSVM, trainSVM, trainRBF, and featselect, the parallel pool is automatically opened if the MATLAB Parallel Computing Toolbox is available; otherwise, the functions are executed sequentially.
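Both considerations can be checked from the Command Window before running the toolbox. The sketch below uses only built-in MATLAB functions. The source file name in the commented mex call is hypothetical, and the availability test is one possible check, not necessarily the one BUSAT performs internally.

```matlab
% (1) MEX files: query the binary extension that this platform expects
%     (for example mexw64 on Windows or mexa64 on Linux).
ext = mexext;
% To rebuild a file, call mex on its C source. The file name below is
% hypothetical; use a real file from the Source_C_codes directory.
% mex(fullfile('Source_C_codes', 'some_function.c'));

% (2) Parallel Computing Toolbox: is it licensed and is parpool on the path?
hasPCT = license('test', 'Distrib_Computing_Toolbox') && ...
         ~isempty(which('parpool'));

startPool = false;   % set to true to actually open a pool (this can take a while)
if hasPCT && startPool && isempty(gcp('nocreate'))
    parpool;         % open the default parallel pool
end
```

If hasPCT is false, no action is needed, since the toolbox functions then run sequentially.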

4 Practical Examples

4.1 Building a CAD System

BUSAT makes it possible to quickly build a CAD system by following the pipeline in Fig. 4. Note that distinct functions for contrast enhancement, speckle filtering, lesion segmentation, feature extraction, and lesion classification can be combined to create a specific CAD system.

Fig. 4. Conventional pipeline of a CAD system for BUS images.

Herein, BUSAT is used to exemplify the implementation of a CAD system that uses five morphological features and linear classification [10]. The implemented CAD system uses the wtsdsegment function to segment the breast lesion. This function already includes the image preprocessing, where contrast enhancement is performed by the sace function and speckle filtering is performed by the chmf function. Thereafter, the segmentation algorithm based on the watershed transformation is applied to get the lesion contour [11]. Next, five morphological features are computed: elliptic-normalized skeleton, lesion orientation, number of substantial protuberances and depressions, depth-to-width ratio, and overlap ratio. Finally, the classifyLDA function classifies the lesion into benign and malignant classes by using linear discriminant analysis (LDA). The LDA classifier must be trained beforehand with the trainLDA function to create the prediction model. Then, the MATLAB program that implements the CAD system is written as follows:

figure a
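To make the feature extraction step more concrete, the following sketch computes one of the five features, the depth-to-width ratio, from a binary lesion mask using base MATLAB only. It is not the BUSAT implementation. It assumes that depth and width are the vertical and horizontal extents of the lesion bounding box, and it uses a synthetic rectangular mask in place of a real segmentation.

```matlab
% Sketch: depth-to-width ratio of a binary lesion mask (base MATLAB only).
% The mask is synthetic; in practice it would come from a segmentation step.
mask = false(100, 100);
mask(40:59, 20:79) = true;        % rectangular "lesion": 20 rows by 60 columns

rows = find(any(mask, 2));        % image rows that contain lesion pixels
cols = find(any(mask, 1));        % image columns that contain lesion pixels
depth = rows(end) - rows(1) + 1;  % vertical extent of the lesion, in pixels
width = cols(end) - cols(1) + 1;  % horizontal extent of the lesion, in pixels
dwr = depth / width;              % depth-to-width ratio
```

For this mask the lesion is 20 pixels deep and 60 pixels wide, so the ratio is 1/3.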

4.2 Evaluating a CAD System

When a CAD system is developed, it is necessary to evaluate its classification performance in terms of some indices such as accuracy, sensitivity, specificity, area under the ROC curve, etc.

Let \(\mathcal{X}=\{\mathbf {x}_1,\dots ,\mathbf {x}_n\}\) be a feature space with n observations, where the ith observation is a d-dimensional feature vector denoted by \(\mathbf {x}_i=[x_{i,1},\dots ,x_{i,d}]\). Also, the observation \(\mathbf {x}_i\) is associated with a class label \(y_i \in \{1,2\}\), where 1 and 2 denote benign and malignant lesions, respectively. Note that this kind of labeling is required by the training functions, although the labels are adjusted internally depending on the classifier. For instance, for the SVM classifier, the label \(y = 1\) becomes \(y = -1\) and the label \(y = 2\) becomes \(y = +1\).
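The label adjustment described for the SVM case amounts to a single affine map. The sketch below illustrates it on a toy label vector; how BUSAT performs the conversion internally is not stated here, so the expression is only one way to obtain the same result.

```matlab
y = [1; 2; 2; 1];   % toolbox-style labels: 1 = benign, 2 = malignant
ySvm = 2*y - 3;     % maps 1 -> -1 and 2 -> +1
```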

Then, to perform CAD assessment, from the \(\mathcal{X}\) set, training and test sets should be created, where the former is used to generate the prediction model and the latter is used to evaluate the classifier generalization. In addition, if the classifier requires hyperparameters, a grid-search scheme and k-fold cross validation method are automatically performed by the training functions to tune such parameters. For instance, the function trainSVM adjusts both the soft margin parameter C and the Gaussian kernel parameter \(\gamma \), if they are not introduced in the input arguments of the function.
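The structure of such a tuning procedure can be sketched in base MATLAB as follows. The fold count, the sample size, and the candidate ranges for C and \(\gamma \) are assumptions chosen for illustration; they are not the values used by trainSVM, and the model fitting itself is left as a comment.

```matlab
rng(0);                            % fixed seed for a reproducible partition
n = 20;                            % number of training observations (toy value)
k = 5;                             % number of folds (toy value)
foldId = mod(randperm(n), k) + 1;  % random fold label in 1..k for each observation

% Candidate hyperparameter values; these ranges are assumptions.
[Cgrid, Ggrid] = meshgrid(2.^(-5:2:15), 2.^(-15:2:3));
nPairs = numel(Cgrid);             % number of (C, gamma) pairs to evaluate

for f = 1:k
    valIdx   = find(foldId == f);  % held-out observations for this fold
    trainIdx = find(foldId ~= f);  % observations used to fit the model
    % For each (C, gamma) pair: train on trainIdx and score on valIdx.
    % After the loop, average the scores over the k folds and keep the best pair.
end
foldSizes = accumarray(foldId(:), 1)';  % number of observations per fold
```

Because n is a multiple of k here, every fold receives the same number of observations.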

BUSAT contains the classperf function to evaluate the classification performance of a CAD system. Suppose that a user generates a feature matrix X of size \(n \times d\) and a target vector Y of size \(n \times 1\). Also, suppose that the CAD’s classifier is based on SVM with Gaussian kernel. Then, the following MATLAB program implements the evaluation of a CAD system:

figure b
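For reference, the threshold-based indices mentioned above can be computed from the confusion matrix counts with a few lines of base MATLAB. This is a sketch of the standard definitions on toy labels, with the malignant class (label 2) taken as the positive class; it is not the code of classperf, whose interface is not reproduced here.

```matlab
yTrue = [1 1 1 2 2 2 2 1]';   % ground truth: 1 = benign, 2 = malignant
yPred = [1 1 2 2 2 1 2 1]';   % classifier output on the same cases

TP = sum(yTrue == 2 & yPred == 2);   % malignant cases classified as malignant
TN = sum(yTrue == 1 & yPred == 1);   % benign cases classified as benign
FP = sum(yTrue == 1 & yPred == 2);   % benign cases classified as malignant
FN = sum(yTrue == 2 & yPred == 1);   % malignant cases classified as benign

ACC = (TP + TN) / numel(yTrue);      % accuracy
SEN = TP / (TP + FN);                % sensitivity
SPE = TN / (TN + FP);                % specificity
MCC = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));  % Matthews correlation coefficient
```

With these toy labels, ACC, SEN, and SPE are all 0.75 and MCC is 0.5.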

5 Experimental Results

BUSAT contains three classifiers for distinguishing between benign and malignant lesions: linear discriminant analysis (LDA), support vector machine (SVM) with Gaussian kernel, and radial basis function network (RBFN). These classifiers are evaluated within a CAD system to determine which method performs better in terms of the indices Matthews correlation coefficient (MCC), area under the ROC curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) [12].
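Unlike the other indices, the AUC is computed from continuous classifier scores rather than from hard labels. One standard way to obtain it, sketched below on toy scores, is to count the fraction of malignant-benign pairs in which the malignant case receives the higher score, with ties counted as one half. This is an illustration of the definition and not necessarily how BUSAT computes the index.

```matlab
scores = [0.9 0.8 0.4 0.7 0.3 0.2]';  % toy classifier scores (higher = more suspicious)
labels = [2 2 2 1 1 1]';              % 1 = benign, 2 = malignant

pos = scores(labels == 2);            % scores of malignant cases
neg = scores(labels == 1);            % scores of benign cases
diffs = bsxfun(@minus, pos, neg');    % all pairwise differences (malignant minus benign)
AUC = (sum(diffs(:) > 0) + 0.5*sum(diffs(:) == 0)) / numel(diffs);
```

Here eight of the nine pairs are ordered correctly, so the AUC is 8/9.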

The BUS dataset comprised 1,128 cases from 659 female patients, acquired during routine breast diagnostic procedures at the National Cancer Institute (INCa) of Rio de Janeiro, Brazil. All the cases were histopathologically proven by biopsy; 781 images presented benign lesions and 347 images presented malignant tumors. The images were collected from three ultrasound scanners with linear transducer arrays with frequencies between 7.5 and 12 MHz: Logiq 7 (GE Medical System Inc.), Logiq 5 (GE Medical System Inc.), and Sonoline Sienna (Siemens).

The entire dataset was segmented by the wtsdsegment function. Next, 25 morphological and texture features were computed, which are summarized in Table 1. The feature space was randomly split into training (90%) and test (10%) sets, which were normalized by the softmaxnorm function. Thereafter, LDA, SVM, and RBFN classifiers were trained by the functions trainLDA, trainSVM, and trainRBFN, respectively. It is worth mentioning that the trainSVM and trainRBFN functions perform grid search and k-fold cross validation (with \(k=10\)) to tune their parameters. In the case of the SVM, the C and \(\gamma \) parameters are adjusted, whereas for the RBFN, the number of hidden units is determined. Finally, the test set was classified by the functions classifyLDA, classifySVM, and classifyRBFN, and the classification performance of each classifier was evaluated by the classperf function. For statistical analysis, 50 independent runs of the training-testing procedure were performed.
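The split and normalization steps can be sketched in base MATLAB as follows. The data are random, and the softmax scaling is written in one common form, a logistic function of the z-score with unit scaling factor. Both the formula and the use of training statistics for the test set are assumptions; the softmaxnorm function in BUSAT may differ.

```matlab
rng(1);                                  % fixed seed for reproducibility
n = 100; d = 3;                          % toy sizes, not those of the dataset
X = randn(n, d);                         % synthetic feature matrix

idx = randperm(n);                       % random ordering of the observations
nTest = round(0.10 * n);                 % 10% of the observations for testing
testIdx  = idx(1:nTest);
trainIdx = idx(nTest+1:end);             % remaining 90% for training

mu    = mean(X(trainIdx, :), 1);         % per-feature mean of the training set
sigma = std(X(trainIdx, :), 0, 1);       % per-feature standard deviation of the training set

% Assumed softmax scaling: logistic function of the z-score, output in (0, 1).
softmaxScale = @(A) 1 ./ (1 + exp(-bsxfun(@rdivide, bsxfun(@minus, A, mu), sigma)));
Xtrain = softmaxScale(X(trainIdx, :));
Xtest  = softmaxScale(X(testIdx, :));    % test set scaled with the training statistics
```

Estimating the mean and standard deviation on the training set alone keeps information from the test cases out of the prediction model.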

Table 1. Computed features for lesion classification. \(\mathcal{M}\) and \(\mathcal{T}\) denote morphological and texture features, respectively. Symbol # denotes number of features.

Table 2 summarizes the classification performance results obtained by the three evaluated classifiers. In addition, Table 3 shows the results of the one-way analysis of variance (ANOVA), which tests whether the mean values of the compared classifiers differ at \(\alpha =0.05\). Scheffé’s method then determines whether the difference between each pair of classifiers is statistically significant.
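As a reminder of what the one-way ANOVA computes, the sketch below derives the F statistic and its p-value for three small groups using base MATLAB only. The numbers are toy values and not results from this study; the p-value is obtained from the regularized incomplete beta function so that no additional toolbox is needed.

```matlab
% Toy data: one index measured for three groups (NOT results from this study).
groups = {[1 2 3], [2 3 4], [5 6 7]};

allVals   = [groups{:}];
grandMean = mean(allVals);
nGroups   = numel(groups);
nTotal    = numel(allVals);

SSB = 0;                                   % between-group sum of squares
SSW = 0;                                   % within-group sum of squares
for i = 1:nGroups
    g = groups{i};
    SSB = SSB + numel(g) * (mean(g) - grandMean)^2;
    SSW = SSW + sum((g - mean(g)).^2);
end

df1 = nGroups - 1;                         % between-group degrees of freedom
df2 = nTotal - nGroups;                    % within-group degrees of freedom
F = (SSB / df1) / (SSW / df2);             % F statistic
p = betainc(df2 / (df2 + df1*F), df2/2, df1/2);  % upper-tail probability of the F distribution
significant = p < 0.05;                    % reject equal means at alpha = 0.05
```

For these values F is 13 and p is about 0.0066, so the hypothesis of equal group means would be rejected; a post hoc test such as Scheffé’s method would then indicate which pairs differ.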

Table 2. Classification performance results (mean ± standard deviation).
Table 3. p-values of the statistical comparison between classifiers. Symbol (–) denotes that groups are not significantly different (i.e., \(p>0.05\)), whereas symbol (+) indicates that groups are significantly different (i.e., \(p<0.05\)).

Notably, the three classifiers did not present statistically significant differences in terms of the MCC and AUC indices; that is, all of them are capable of distinguishing adequately between benign and malignant cases. However, the SVM classifier outperformed its counterparts in terms of sensitivity (SEN = 0.90) and accuracy (ACC = 0.89), whereas the RBFN classifier obtained the best results in terms of specificity (SPE = 0.94). These results indicate that the SVM classifier is adequate to be implemented within a CAD system for BUS images.

6 Conclusions

This paper presented the Breast Ultrasound Analysis Toolbox (BUSAT) for MATLAB, which contains several approaches proposed in literature to perform image preprocessing (contrast enhancement and speckle filtering), lesion segmentation (semi-automatic and fully-automatic methods), feature extraction (morphological, texture, and BI-RADS lexicon), and classification (linear and non-linear classifiers).

We presented the experimental results of the evaluation of three classifiers (LDA, SVM, and RBFN) to distinguish between benign and malignant cases, where SVM presented an adequate classification performance. The configuration of the CAD system can lead to different classification results; that is, the image preprocessing techniques, the segmentation method, and the computed features all affect the lesion classification. Thus, the strength of BUSAT is its versatility for building and evaluating different configurations of CAD systems in reduced time.

To the best of our knowledge, BUSAT is the first toolbox intended to provide the research community with an easy and quick way to code programs for computer-aided diagnosis for breast ultrasound. In addition, because the source codes are available to the users, it is possible to modify the functions in order to enhance the implemented methods or to reuse code in new functions. Future work will increase the number of implemented methods, for instance, with new multiclass classifiers for BI-RADS categorization.