1 Introduction

Breast cancer, the most common cancer among women, is the second leading cause of cancer-related death in North America. Annually, 1.3 million new cases of breast cancer are diagnosed worldwide [1]. Prescreening is typically carried out using clinical breast examination or breast self-examination, both of which suffer from high false-positive rates. Ultrasound, X-ray mammography, and magnetic resonance imaging (MRI) are the most commonly used imaging modalities for breast cancer detection. While X-ray mammography is the primary screening technique, it is often a painful exam and is mainly recommended for women over the age of 50, because of its low sensitivity (67.8%) in younger women and women with dense breasts and the potential health risk of its ionizing radiation. Ultrasound and MRI are well suited to differentiating benign from malignant masses in dense breast tissue. However, ultrasound suffers from higher false-positive rates than mammography and its effectiveness depends on the skill of the technician, whereas MRI is more costly and associated with long wait times [2].

New research [3, 4] focuses on a novel imaging modality for breast cancer based on near-infrared (NIR) diffuse optical tomography (DOT), a non-invasive and non-ionizing imaging modality that has demonstrated clinical potential in probing tumors. DOT is particularly beneficial as a diagnostic method for women with dense breast tissue. It measures and visualizes the distribution of tissue absorption and scattering properties, and these optical parameters are related to physiological markers, e.g., blood oxygenation and tissue metabolism. When multiple wavelengths are used, DOT can map deoxyhemoglobin and oxyhemoglobin concentrations, which in turn can be used to quantitatively assess tissue malignancy from the total hemoglobin concentration.
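
To illustrate the multi-wavelength statement above, a commonly used two-chromophore model can be written down. This is an assumption made here for illustration: the text does not specify the model used, the symbols \(\varepsilon\) and \(C\) are introduced only for this sketch, and contributions from water and lipid are ignored. The absorption coefficient at each wavelength is taken to be a linear combination of the two hemoglobin concentrations:

\[
\mu_a(\lambda_k) = \varepsilon_{\mathrm{HbO_2}}(\lambda_k)\, C_{\mathrm{HbO_2}} + \varepsilon_{\mathrm{Hb}}(\lambda_k)\, C_{\mathrm{Hb}}, \qquad k = 1, 2 .
\]

With the absorption recovered at two wavelengths, this is a \(2 \times 2\) linear system in the two concentrations, and the total hemoglobin concentration is \(C_{\mathrm{HbT}} = C_{\mathrm{HbO_2}} + C_{\mathrm{Hb}}\).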

Recently, we developed a new functional hand-held diffuse optical breast scanner probe (DOB-Scan) [5] that has been applied to breast cancer detection as a screening tool and aims to improve assessment parameters such as positive predictive value and accuracy. The probe is currently in clinical trials for in vivo breast cancer imaging studies. It combines multi-frequency and continuous-wave near-infrared light to quantify tissue optical properties in the 690 to 850 nm spectral range and produces a cross-sectional image of the underlying tissue. The probe uses encapsulated light-emitting diodes instead of laser-coupled optical fibers, which decreases the complexity, size, and cost of the probe while providing accurate and reliable measurements of the tissue optical properties. In this work, we focus on improving image reconstruction from DOB-Scan probe measurements using machine learning techniques.

Image reconstruction methods are mostly analytic and often suffer from well-known reconstruction problems, e.g., noise, motion artifacts, image degradation due to short acquisition time, and computational complexity [6]. Iterative reconstruction algorithms have become the dominant approach for solving inverse problems over the past few decades [7]. While iterative reconstruction with regularization, e.g., total variation, mitigates some of the shortcomings of analytic reconstruction, it remains difficult to obtain a method that is fast, provides high-resolution images, and requires only a simple calibration process [8].

A more recent trend is machine learning based image reconstruction, which is motivated by the outstanding performance of deep learning on computer vision tasks, e.g., object classification and segmentation. Convolutional neural networks (CNNs) have previously been applied to medical image reconstruction problems in computed tomography and MRI [9,10,11]. Many approaches [6, 12, 13] obtain an initial estimate of the reconstruction using a direct inverse operator or an iterative approach, then use machine learning to refine the estimate and produce the final reconstructed image. Although this is a straightforward solution, the number of iterations required to obtain a reasonable initial image estimate can be hard to define and in general increases the total reconstruction run-time.

A more elegant solution is to reconstruct an image directly from its projection data by learning all the parameters of a deep neural network in an end-to-end fashion, so that the network approximates the underlying physics of the inverse problem. In [14], a unified framework for image reconstruction that allows a mapping between the sensor and image domains is proposed. A pre-trained CNN model is used to learn a bidirectional mapping between the sensor and image domains, where image reconstruction is formulated in a manifold learning framework. The trained model is tested on a variety of MRI acquisition strategies.

While deep learning based image reconstruction has been applied to a variety of medical imaging modalities, it has not yet been used for DOT. In this paper, we propose a deep DOT reconstruction method that learns a mapping between raw acquired measurements and reconstructed images. The raw collected data can be considered as image features that approximate nonlinear combinations of image pixel values, which form the desired tissue optical coefficients. The raw measured data is therefore a nonlinear function of the desired image pixel values, and performing image reconstruction amounts to learning to invert this nonlinear function. We propose to use deep neural networks to learn this nonlinear inverse mapping from training data.
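
The formulation above can be sketched symbolically. The notation is introduced here for illustration and does not appear in the original text: let \(x\) denote the image of optical coefficients, \(y\) the measurement vector, and \(\mathcal{F}\) the nonlinear forward operator. Reconstruction then amounts to learning a network \(G_\theta\) that approximates the inverse of \(\mathcal{F}\) from \(N\) training pairs \((y_i, x_i)\):

\[
y = \mathcal{F}(x), \qquad \theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \left\| G_\theta(y_i) - x_i \right\|_2^2 .
\]

The squared-error objective corresponds, up to a constant factor, to the mean squared error used for training; measurement noise and regularization are left out of this sketch.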

To train our model, we rely on synthetic datasets of images paired with corresponding measurements that simulate real-world DOT signals. We leverage a physics-based optical diffusion simulator to generate these synthetic datasets. We evaluate our system on real measurements of phantoms collected with the NIR DOB-Scan probe, and show both the utility of our synthetic data generation technique in mimicking real measurements and the ability of our model to generalize to unseen phantom datasets. The results show that our framework improves reconstruction accuracy compared with a baseline analytic reconstruction approach.

2 Methodology

Our main goal is to reconstruct tomographic images from the corresponding sensor-domain measurements. To this end, we collect measurements from (a) synthesized tissue geometries with known optical properties, using a physics-based simulation of the forward projection operation, and (b) physical phantoms, using the probe. We describe the generation of the synthetic training datasets as well as the design of the neural network architecture below.

2.1 Generating Training Data for DOT Reconstruction

Synthetic Datasets: Our aim here is to create training data pairs in silico, each consisting of an image of tissue optical properties and its corresponding measurement. The deep learning model is then trained to generate the image from the measurement. We synthesize different tissue geometries, i.e., different breast shapes and sizes and different lesion shapes, sizes, and locations, and model them as 2D triangular meshes. We then assign to these geometries optical transport parameters (absorption and scattering coefficients) similar to the values found in real human breast tissue and lesions [15].
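
The geometry synthesis step can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not in the original: a regular pixel grid stands in for the 2D triangular meshes, each sample has a single circular lesion, and the coefficient values and the function name `make_synthetic_absorption_map` are placeholders rather than the values taken from [15].

```python
import numpy as np

def make_synthetic_absorption_map(rng, size=128, background_mua=0.01,
                                  lesion_mua=0.05):
    """Return a (size, size) absorption map with one random circular lesion.

    All numeric values are illustrative placeholders, not the coefficients
    used in the paper.
    """
    image = np.full((size, size), background_mua, dtype=np.float32)
    radius = rng.integers(3, 10)              # lesion radius in pixels (3..9)
    cy = rng.integers(radius, size - radius)  # keep the lesion inside the image
    cx = rng.integers(radius, size - radius)
    yy, xx = np.mgrid[0:size, 0:size]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    image[mask] = lesion_mua
    return image, mask

rng = np.random.default_rng(0)
image, mask = make_synthetic_absorption_map(rng)
```

Each generated map would then be passed to the forward simulator to obtain its paired measurement; the lesion mask is returned as well because it is convenient as ground truth for overlap-based evaluation.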

To collect synthetic DOT measurements, we used the Toast++ software suite [16], which simulates the forward projection operation to generate projection measurements for each training mesh. Modelling the probe sources and detectors accurately in Toast++ was a critical step in obtaining realistic measurements that mimic the real values obtained by the DOT probe. The source model we created consists of two light sources that deliver near-infrared light to the body surface at different points. The detector model is defined as a row of detectors that measure the light back-scattered by the tissue and emitted from the boundary. The spatial distribution of the simulated light sources and detectors was defined to mimic the probe geometry detailed in [5], which comprises 2 LED light sources that illuminate the tissue symmetrically and surround 128 detectors. Both LEDs and all detectors are collinear, as depicted in Fig. 1. The forward projection simulation captures a 1D raw intensity diffraction resulting from the scattering of the illuminating light exiting the test object.
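
The probe layout described above can be sketched as follows. The counts (2 sources, 128 detectors) come from the text; the detector pitch, the source offset, and the function names are hypothetical. The sketch also assumes that the network input is formed by concatenating one 128-sample reading per source into a 256-element vector, which is consistent with the 256-dimensional input reported for the dataset but is not stated explicitly.

```python
import numpy as np

N_DETECTORS = 128          # from the text
N_SOURCES = 2              # from the text
DETECTOR_PITCH_MM = 0.5    # hypothetical detector spacing
SOURCE_OFFSET_MM = 5.0     # hypothetical gap between each LED and the nearest detector

def probe_layout():
    """Return x-coordinates (mm) of sources and detectors on the line y = 0."""
    detectors = (np.arange(N_DETECTORS) - (N_DETECTORS - 1) / 2) * DETECTOR_PITCH_MM
    edge = detectors[-1] + SOURCE_OFFSET_MM
    sources = np.array([-edge, edge])  # symmetric about the array center
    return sources, detectors

def stack_measurements(per_source_readings):
    """Concatenate one 128-sample reading per source into a single vector."""
    readings = np.asarray(per_source_readings, dtype=np.float32)
    assert readings.shape == (N_SOURCES, N_DETECTORS)
    return readings.reshape(-1)

sources, detectors = probe_layout()
measurement = stack_measurements(np.zeros((N_SOURCES, N_DETECTORS)))
```

Keeping the geometry in one place makes it easier to check that the simulated layout and the physical probe agree before generating training data.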

Phantom Dataset: To create physical phantom datasets, we rely on a tissue-equivalent solution in which an intralipid solution mimics background breast tissue, owing to the similarity of its optical properties to those of breast fat [3, 4]. Measurements are collected with the DOB-Scan probe. To mimic cancerous lesions, a tube with a 4 mm cross-sectional diameter was filled with a tumor-like liquid phantom (Indian black ink solution) and placed at different locations inside the intralipid solution container. The flowchart of the synthetic and phantom data acquisition procedures is shown in Fig. 2 (left side).

Fig. 1. The spatial distribution of the simulated sources and detectors matching the layout of the physical probe (left). A sample synthetic mesh is also shown (right).

Fig. 2. In silico generation of training pairs using Toast++ and collection of phantom test pairs using the DOB-Scan probe are depicted on the left. The overall architecture of the proposed model is shown on the right, where the arrow after the first fully connected layer represents the reshaping procedure before the convolution layers.

2.2 Reconstructing Images from DOT Measurements

By passing an input measurement through a set of nonlinear transformations one can reconstruct the equivalent image. The proposed architecture consists of a dense layer followed by a set of convolution layers which are designed to efficiently combine features from the first layer with those of deeper layers. The architecture of our proposed model is shown in Fig. 2 (right side).

Initial Image Estimate: A fully connected layer with a ReLU activation is used as the first layer of the network to map the measurement vector to a two-dimensional array that serves as an initial image estimate. This layer is first pre-trained and then included in the deeper architecture containing the convolutional layers. The goal of the fully connected layer is to generalize the filtered back-projection (FBP) operation by learning a weighted combination of the readings of the different sensors, based on the signal collected from scattered light emitted at different locations in the reconstructed tissue. Empirically, we did not observe any improvement in the reconstruction results when using more than one fully connected layer. This may be related to the size of the input measurement, which is only 256-dimensional in our dataset. Higher-dimensional inputs may benefit from additional layers.
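
The mapping performed by this first layer can be sketched in NumPy. The 256-dimensional input is from the text; the \(128 \times 128\) output size is an assumption taken from the feature-map size reported for the convolutional layers, the weights are random placeholders rather than trained values, and the function names are hypothetical.

```python
import numpy as np

MEASUREMENT_DIM = 256   # from the text
IMAGE_SIDE = 128        # assumed to match the 128 x 128 feature maps

def init_dense(rng, in_dim=MEASUREMENT_DIM, out_dim=IMAGE_SIDE * IMAGE_SIDE):
    """Random placeholder weights; in practice these would be learned."""
    weights = rng.normal(0.0, 0.01, size=(in_dim, out_dim)).astype(np.float32)
    bias = np.zeros(out_dim, dtype=np.float32)
    return weights, bias

def initial_image_estimate(measurement, weights, bias):
    """Dense layer + ReLU, then reshape the flat output to a 2D image."""
    pre_activation = measurement @ weights + bias
    activated = np.maximum(pre_activation, 0.0)  # ReLU
    return activated.reshape(IMAGE_SIDE, IMAGE_SIDE)

rng = np.random.default_rng(0)
weights, bias = init_dense(rng)
measurement = rng.random(MEASUREMENT_DIM, dtype=np.float32)
estimate = initial_image_estimate(measurement, weights, bias)
```

Each output pixel is a learned weighted sum of all sensor readings, which is the sense in which a single dense layer can stand in for a back-projection operator.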

Convolutional Layers: A set of convolutional layers with 64 channels is used to refine the initial image estimate and produce the final reconstructed image. The nonlinear ReLU activation and zero-padding are employed at each convolution layer. All feature maps produced by the convolutional layers have size \(128 \times 128\). The size of the convolution filters is increased gradually to cover a larger receptive field at deeper layers and capture local spatial correlations. Details of the architecture are shown in Fig. 2.
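
The effect of gradually increasing the filter size can be made concrete with a small calculation. The kernel sizes below are hypothetical (the actual sizes are given in Fig. 2), and the formulas assume stride-1, undilated convolutions with odd kernel sizes.

```python
def receptive_field(kernel_sizes):
    """Receptive field (in pixels) of stacked stride-1, undilated convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field

def same_padding(kernel_size):
    """Zero-padding per side that preserves the spatial size (odd kernel, stride 1)."""
    return (kernel_size - 1) // 2

hypothetical_kernels = [3, 5, 7]  # illustrative only
growth = [receptive_field(hypothetical_kernels[:i + 1])
          for i in range(len(hypothetical_kernels))]
padding = [same_padding(k) for k in hypothetical_kernels]
```

For these example kernels the receptive field grows as 3, 7, 13 pixels, while paddings of 1, 2, and 3 pixels per side keep every feature map at \(128 \times 128\).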

Integration Layer: The integration layer is a convolutional layer with \(7 \times 7\) kernel size and a single output channel. It is used to reduce features across the channels from the penultimate layer of the CNN model into a single channel. The output of this layer is the reconstructed image.

Training: We trained the model by minimizing the mean squared error between the reconstructed image and the ground truth synthetic image. We used an \(L^2\) norm penalty on the output of the last convolutional layer, as it facilitates training (i.e., we observed faster convergence with this regularization). The model was implemented in Keras and trained for a total of 2,000 epochs on an Nvidia Titan X GPU using batch gradient descent with momentum. The learning rate was set to 0.001, the learning rate decay to \(1\mathrm{e}{-6}\), and the momentum to 0.9. All training hyper-parameters were optimized via grid search on a validation set. We trained the model sequentially: we first trained the fully connected layer alone to reconstruct an image, then fine-tuned the entire architecture after adding the convolutional layers (Fig. 2).
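
The objective and update rule described above can be sketched in NumPy. The learning rate, decay, and momentum are the values from the text. The penalty weight `L2_WEIGHT`, the use of a sum of squares for the penalty, the time-based form of the decay schedule, and the function names are assumptions made for illustration; the original does not specify them.

```python
import numpy as np

LEARNING_RATE = 0.001  # from the text
DECAY = 1e-6           # from the text
MOMENTUM = 0.9         # from the text
L2_WEIGHT = 1e-4       # hypothetical; the penalty strength is not stated

def training_loss(reconstruction, ground_truth, last_conv_output,
                  l2_weight=L2_WEIGHT):
    """Mean squared error plus an L2 penalty on the last conv layer's output."""
    mse = np.mean((reconstruction - ground_truth) ** 2)
    penalty = l2_weight * np.sum(last_conv_output ** 2)
    return float(mse + penalty)

def decayed_learning_rate(step, base_lr=LEARNING_RATE, decay=DECAY):
    """Time-based decay, lr_t = lr_0 / (1 + decay * t) (assumed schedule)."""
    return base_lr / (1.0 + decay * step)

def momentum_step(param, grad, velocity, step):
    """One gradient-descent-with-momentum update; returns (param, velocity)."""
    lr = decayed_learning_rate(step)
    velocity = MOMENTUM * velocity - lr * grad
    return param + velocity, velocity

# Example: a single update on a scalar parameter with gradient 2.0.
new_param, new_velocity = momentum_step(1.0, 2.0, 0.0, step=0)
```

A penalty on a layer's output, as opposed to its weights, is what Keras refers to as an activity regularizer.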

Note that the model was trained only on synthetic data, and we kept the phantom data for evaluation only, as depicted in Fig. 2. In total, we generated 4,500 synthetic training images and their corresponding simulated DOT measurements. We tested our model on 200 synthetic DOT measurements and then on 32 real probe measurements of phantoms with corresponding ground truth images.

3 Experiments and Results

We compared our results with those obtained by the analytic reconstruction approach described in [5]. Briefly, the analytic method compares the collected measurement to the measurement of a homogeneous tissue-equivalent solution. The resulting difference is then used to perform filtered back-projection and to estimate the spatial location of the lesion.

Qualitative Results: Once trained using the generated synthetic data, our model was tested on the phantom dataset. In Fig. 3, we visually compare the results of our proposed reconstruction method with those of the analytic approach for phantom cases. When tested on data with a known ground truth, the images reconstructed by our method are visibly more accurate than those reconstructed by the more conventional analytic approach. Fig. 3 also shows the image reconstructed using only the first fully connected layer, which acts as a generalized filtered back-projection operation. Our qualitative results show that reconstructions obtained with one fully connected layer (third column in Fig. 3) are on par with reconstructions obtained with the analytic approach (second column in Fig. 3).

Fig. 3. Qualitative reconstruction performance of our model compared to conventional techniques. (a)–(d): Ground truth; analytic approach results; generalized FBP with one fully connected layer only; and proposed model results.

Quantitative Results: To measure the quality of the results, we consider the mean squared error as well as the distance between the centers of the lesions in the ground truth image and in the reconstructed image. The peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the Jaccard index (intersection over union) are also calculated. The Jaccard index, used for comparing the similarity and diversity of sample sets, is the ratio of the area of overlap between the detected and ground truth lesions to the area of their union. This metric is computed after thresholding the reconstructed image to obtain a binary mask in which foreground pixels correspond to the pixels with the highest optical coefficients.
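
The metrics above, except SSIM, can be sketched in NumPy as follows. The thresholding rule (a fixed fraction of the image's dynamic range), the use of mask centroids as lesion centers, the pixel size, and all function names are assumptions; the original does not specify how the threshold or the lesion center is computed.

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((reference - estimate) ** 2))

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the reference range."""
    if data_range is None:
        data_range = float(reference.max() - reference.min())
    error = mse(reference, estimate)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / error))

def threshold_mask(image, fraction=0.5):
    """Foreground = pixels above min + fraction * (max - min) (assumed rule)."""
    low, high = float(image.min()), float(image.max())
    return image > low + fraction * (high - low)

def jaccard(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)

def centroid_distance(mask_a, mask_b, pixel_size_mm=1.0):
    """Euclidean distance between mask centroids (both masks must be non-empty)."""
    center_a = np.argwhere(mask_a).mean(axis=0)
    center_b = np.argwhere(mask_b).mean(axis=0)
    return float(np.linalg.norm(center_a - center_b) * pixel_size_mm)

# Toy example: a 4 x 4 lesion reconstructed 2 pixels to the right.
truth = np.zeros((16, 16))
truth[4:8, 4:8] = 1.0
recon = np.zeros((16, 16))
recon[4:8, 6:10] = 1.0
truth_mask, recon_mask = threshold_mask(truth), threshold_mask(recon)
scores = {
    "mse": mse(truth, recon),
    "psnr": psnr(truth, recon),
    "jaccard": jaccard(truth_mask, recon_mask),
    "distance": centroid_distance(truth_mask, recon_mask),
}
```

In the toy example the two masks overlap on 8 of 24 pixels in their union, giving a Jaccard index of 1/3 and a centroid distance of 2 pixels.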

Table 1. Quantitative results on 32 phantom test measurements

Table 1 shows the results for the phantom dataset. This experiment also allows us to evaluate the quality of the synthetic dataset we generated by testing how well a model trained only on synthetic data generalizes to unseen physical phantom images. The results reported in Table 1 show that the proposed approach generalizes well to the phantom dataset and achieves better performance than the baseline analytic approach in terms of distance (+50%), Jaccard index (+35%), similarity score (+14%), and PSNR (+5 dB). The high standard deviation of the distance metric is mainly due to samples with deep lesions (lesion location \({\ge }\)30 mm), since as the lesion depth increases it becomes harder to differentiate its signal from the tumor-free tissue signal. On average, our model achieves an order of magnitude faster reconstruction than the baseline analytic approach.

4 Conclusion

This work represents a step forward for both image reconstruction in DOT and the use of machine learning in bio-imaging. We present the first model that leverages physics-based forward projection simulators to generate realistic synthetic datasets, and we model the inverse problem with a deep learning model whose architecture is tailored to accurately reconstruct images from DOT measurements. We test the method on real acquired projection measurements subject to sensor non-idealities and noise. The results show that our method improves the quality of the reconstructed images and is a promising step towards real-time image reconstruction. In future work, we will focus on exploring even more realistic DOT simulation scenarios and extend the study to clinical cases.