1 Introduction

Image fusion is the process of registering and combining two or more source images into a single image by means of image processing techniques. Its main goals are to provide suitable information for human visual perception and to reduce redundancy [1] by storing a single fused image instead of multiple source images. As one of the major research fields in image processing, image fusion has been applied in a wide range of applications such as remote sensing, computer vision and medical diagnosis. In medical diagnosis, complementary information is required from different imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET) and ultrasound (US), whose selection depends on clinical requirements such as the organ under study. Accordingly, multimodal medical image fusion techniques have shown notable success in improving the accuracy of decisions in medical diagnosis and treatment planning.

Image decomposition is an important tool that affects the fusion quality. Recently, multi-scale decomposition based image fusion methods have been widely used in medical image fusion and have achieved great success. Wavelet theory emerged at the beginning of the last century as a signal processing tool and was later directed towards image processing [2]. It has been applied to multimodal medical image fusion [3] with favorable outcomes, since it preserves different frequency information and allows localization in both the time and spatial-frequency domains. Owing to their limited ability to capture directional information, however, wavelets are not optimally efficient in representing images containing sharp transitions such as edges. In the past few years, multi-scale geometric analysis (MGA) methods have been reported in the literature as a way to overcome this deficiency. Many MGA tools have been introduced into medical image fusion for this purpose and have shown directional sensitivity and efficiency, including the contourlet transform [4], the non-subsampled contourlet transform [5], the ridgelet transform [6], the bandelet transform [7] and the curvelet transform [8]. The literature reports that the curvelet transform represents images with smooth edges efficiently, as does the contourlet transform, which is a purely discrete filter-bank variant of the curvelet [10]. However, the curvelet transform cannot be built directly in the discrete domain and does not provide a multi-resolution representation of the geometry. Moreover, the contourlet transform lacks shift invariance [16]; this was settled by the non-subsampled contourlet transform, which nevertheless still suffers from a limited number of directions and a high computational cost.

More recently, shearlet theory was introduced by Labate et al. [9, 11] as an extension of the wavelet framework. It shares the advantageous properties of the above approaches and, in addition, has a rich mathematical structure suitable for multi-resolution analysis, which is very useful for developing fast algorithmic implementations. The absence of any limit on the number of directions obtained by applying the shear matrix gives the shearlet an advantage over the contourlet. Thus, shearlets form a tight frame at different scales and directions, which suits the optimal sparse representation of images with edges [10]. In addition, Easley et al. [11] introduced the non-subsampled shearlet transform (NSST) to provide the shift invariance property.

Although the shearlet transform provides an efficient tool for image decomposition, one open problem that remains under investigation is how to select appropriate fusion rules for the low-frequency and high-frequency coefficients. Computational intelligence systems play a crucial role in medicine. In [27], a method based on fuzzy classification and region segmentation is proposed to detect tumor zones in brain MRI images. Neuro-fuzzy logic is another approach that is finding applications in image processing, including medical image fusion [26]. As a fusion rule, it combines an artificial neural network (ANN) with fuzzy logic, so that neurons can be trained and membership functions can be applied for decision making. A neuro-fuzzy inference system (NFIS) has been adopted in [17] to fuse multimodal medical images. The recent literature [18, 19] has also reported the combination of multi-scale geometric analysis with neuro-fuzzy logic to fuse medical images. In [18], Das et al. employed the non-subsampled contourlet transform to decompose the input images and used a reduced pulse coupled neural network with fuzzy logic as a fusion rule. In [19], images are decomposed using the wavelet transform and then fused based on neuro-fuzzy. In this work, MRI-CT image fusion is performed to provide an accurate tool for planning the correct surgical procedure or therapy. To this end, we first decompose the input CT and MRI images using the shearlet transform. Then, the low-frequency sub-band is fused by the maximum-absolute-value rule and the high-frequency sub-bands are fused by neuro-fuzzy inference, as detailed in Sect. 3.

The remainder of this paper is organized as follows. Recent literature on the shearlet transform and neuro-fuzzy in the realm of medical image fusion is described in Sect. 2. In Sect. 3, we present the proposed fusion method, in which the shearlet decomposition of the input images is fused based on NFIS. Experimental results and comparisons are discussed in Sect. 4. Finally, conclusions and future research directions are outlined in Sect. 5.

2 Related Works

The shearlet transform (ST) has a rich mathematical structure, and its shearing filters have a smaller support size than directional filters, so it can be implemented more efficiently. Shearlet theory has been studied and applied gradually. Its applications in image processing were extended to image denoising [11] and edge detection [12], where it has been shown that ST allows one to identify exactly the location and orientation of edges. However, its use in medical image fusion is still being explored. ST was introduced into the field of image fusion by Miao et al. [13] and achieved satisfying performance. Deng et al. [14] also applied ST to fuse remote sensing images but were not able to overcome the problem of shift invariance. Another extension was provided by Wang et al. [15]: the shift-invariant shearlet transform (SIST), combined with a hidden Markov tree (HMT) to model the dependencies among the SIST sub-bands. Exploiting its shift invariance, Kong and Liu [23] combined the non-subsampled shearlet transform with neural networks. In [16], a fusion method for CT and MRI images was presented using a pulse coupled neural network in the non-subsampled shearlet transform (NSST) domain, which incorporates several combinations of shearing filters with the non-subsampled Laplacian pyramid transform. It was concluded that NSST suppresses the pseudo-Gibbs phenomenon better than the standard shearlet transform. Furthermore, MGA tools have been proposed in conjunction with neuro-fuzzy [18, 19]. In [25], the non-subsampled contourlet transform is applied to decompose the input images into low- and high-frequency sub-bands, and neuro-fuzzy is then used as a fusion rule. In [20], Rajkumal et al. compared the lifting wavelet transform combined with neuro-fuzzy against an iterative neuro-fuzzy approach alone and concluded that the second approach is superior.

3 Proposed Fusion Methods

The task of multimodal image fusion is to preserve as many salient features as possible in the fused image, such as regions and their boundaries. Image registration is an important prerequisite for any fusion technique; in this paper, it is assumed that the source images are registered before the fusion process starts. In the following, we decompose the CT and MRI images using the shearlet transform to obtain low- and high-frequency coefficients. The low-frequency coefficients are then fused by maximization of the absolute value, while the high-frequency sub-bands are fused based on NFIS.

3.1 Non Subsampled Shearlet Transform

In contrast to other MGA tools, the shearlet provides a unique combination of mathematical rigor and computational efficiency when addressing edges. Proposed by K. Guo et al. [9, 11, 12, 21, 22], it is derived from the theory of wavelets. In dimension \( n = 2 \), the affine systems with composite dilation are defined as follows:

$$ A_{AS} \left(\Psi \right) = \left\{ {\Psi _{j,l,k} \left( x \right) = \left| {\det A} \right|^{j/2}\Psi \left( {S^{l} A^{j} x - k} \right);\,j,l \in {\mathbb{Z}},k \in {\mathbb{Z}}^{2} } \right\} $$
(1)

where \( \Psi \in L^{2} \left( {{\mathbb{R}}^{2} } \right) \), A and S are both \( 2 \times 2 \) invertible matrices, and \( \left| {\det S} \right| = 1 \). The elements of this system are called composite wavelets if \( A_{AS} \left(\Psi \right) \) forms a tight frame for \( L^{2} \left( {{\mathbb{R}}^{2} } \right) \), that is, if for every \( f \in L^{2} \left( {{\mathbb{R}}^{2} } \right) \):

$$ \sum\limits_{j,l,k} {\left| {\left\langle {f,\Psi _{j,l,k} } \right\rangle } \right|^{2} } = \left\| f \right\|^{2} $$
(2)

The shearlet transform is a function of three variables: the scale j, the shear l and the translation k. Let A denote the scaling matrix and S stand for the shear matrix. For each \( a > 0 \) and \( s \in {\mathbb{R}} \),

$$ A = \left( {\begin{array}{*{20}c} a & 0 \\ 0 & {\sqrt a } \\ \end{array} } \right),\quad S = \left( {\begin{array}{*{20}c} 1 & s \\ 0 & 1 \\ \end{array} } \right) $$
(3)

The matrices described above play an important role in the shearlet transform. The dilation matrix A controls the scale of the shearlet by applying an anisotropic dilation along the two axes, which increasingly elongates the frequency support at fine scales. The shear matrix S, which is not expansive (\( \left| {\det S} \right| = 1 \)), controls the orientation of the shearlet. The tiling of the frequency plane and the size of the frequency support are illustrated in Fig. 1 for particular values of a and s. The frequency support of the shearlets for different values of a and s is shown in Fig. 2.

Fig. 1. The structure of frequency tiling and the size of the frequency support.

Fig. 2. Frequency support of shearlets \( \psi_{j,l,k} \) for different values of a and s.

Following [11], it is commonly assumed that \( a = 4 \) and \( s = 1 \). With \( A_{0} \) and \( S_{0} \) denoting the anisotropic dilation matrix and the shear matrix respectively, Eq. (3) gives:

$$ A_{0} = \left( {\begin{array}{*{20}c} 4 & 0 \\ 0 & 2 \\ \end{array} } \right),\quad S_{0} = \left( {\begin{array}{*{20}c} 1 & 1 \\ 0 & 1 \\ \end{array} } \right) $$
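To make the roles of the two matrices concrete, the following minimal sketch builds \( A_{0} \), \( S_{0} \) and the composite matrix \( S_{0}^{l} A_{0}^{j} \) that appears in Eq. (1). It is written in plain Python for illustration only; the helper names (matmul2, matpow2, det2, composite) are hypothetical, and only non-negative powers are handled.

```python
def matmul2(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [p[0][0] * q[0][0] + p[0][1] * q[1][0], p[0][0] * q[0][1] + p[0][1] * q[1][1]],
        [p[1][0] * q[0][0] + p[1][1] * q[1][0], p[1][0] * q[0][1] + p[1][1] * q[1][1]],
    ]


def matpow2(m, n):
    """Raise a 2x2 matrix to a non-negative integer power."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, m)
    return result


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A0 = [[4, 0], [0, 2]]  # anisotropic dilation matrix (a = 4)
S0 = [[1, 1], [0, 1]]  # shear matrix (s = 1)


def composite(j, l):
    """Return S0^l A0^j for j >= 0 and l >= 0 (illustrative helper)."""
    return matmul2(matpow2(S0, l), matpow2(A0, j))


print(matpow2(A0, 2))    # [[16, 0], [0, 4]]: scaling by 2^(2j) and 2^j for j = 2
print(matpow2(S0, 3))    # [[1, 3], [0, 1]]: shearing never changes the area
print(det2(composite(2, 1)))  # 64 = |det A0|^2
```

The determinant of any power of \( S_{0} \) stays equal to 1, so the normalization factor \( \left| {\det A} \right|^{j/2} \) in Eq. (1) depends on the scale only.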

For any \( \xi = \left( {\xi_{1} ,\xi_{2} } \right) \in \widehat{{\mathbb{R}}}^{2} ,\xi_{1} \ne 0 \), let \( \hat{\psi }^{\left( 0 \right)} \left( \xi \right) \) be given by

$$ \hat{\psi }^{\left( 0 \right)} \left( \xi \right) = \hat{\psi }^{\left( 0 \right)} \left( {\xi_{1} ,\xi_{2} } \right) = \hat{\psi }_{1} \left( {\xi_{1} } \right)\hat{\psi }_{2} \left( {\xi_{2} /\xi_{1} } \right) $$

where \( \hat{\psi }_{1} ,\hat{\psi }_{2} \in {\text{C}}^{\infty } \left( {\widehat{{\mathbb{R}}}} \right) \) are both wavelets, and \( {\text{supp}}\,\hat{\psi }_{1} \subset \left[ { - 1/2, - 1/16} \right] \cup \left[ {1/16,1/2} \right] \), \( {\text{supp}}\,\hat{\psi }_{2} \subset \left[ { - 1,1} \right] \). In addition, assume that:

$$ \sum\limits_{j \ge 0} {\left| {\hat{\psi }_{1} \left( {2^{ - 2j} \omega } \right)} \right|^{2} } = 1\,{\text{for}}\,\left| \omega \right| \ge \frac{1}{8} $$
(4)

and, for each \( j \ge 0 \),

$$ \sum\limits_{{l = - 2^{j} }}^{{2^{j} - 1}} {\left| {\hat{\psi }_{2} \left( {2^{j} \omega - l} \right)} \right|^{2} } = 1,\quad \left| \omega \right| \le 1 $$
(5)

That is, each element \( \hat{\psi }_{j,l,k} \) is supported on a pair of trapezoids of approximate size \( 2^{2j} \times 2^{j} \), oriented along lines of slope \( l2^{ - j} \) (Fig. 1b). Under the assumptions of Eqs. 4 and 5, several examples of \( \hat{\psi }_{1} \) and \( \hat{\psi }_{2} \) satisfy:

$$ \sum\limits_{j \ge 0} {\sum\limits_{{l = - 2^{j} }}^{{2^{j} - 1}} {\left| {\hat{\psi }^{\left( 0 \right)} \left( {\xi A_{0}^{ - j} S_{0}^{ - l} } \right)} \right|^{2} } } = \sum\limits_{j \ge 0} {\sum\limits_{{l = - 2^{j} }}^{{2^{j} - 1}} {\left| {\hat{\psi }_{1} \left( {2^{ - 2j} \xi_{1} } \right)} \right|^{2} \left| {\hat{\psi }_{2} \left( {2^{j} \frac{{\xi_{2} }}{{\xi_{1} }} - l} \right)} \right|^{2} } } = 1 $$
(6)

Accordingly, the discrete non-subsampled shearlet transform is obtained by sampling the shearlet on a proper discrete set. Suppose that A and B are two registered CT and MRI images, respectively. Our fusion algorithm begins by applying the discrete NSST to both images to obtain their low- and high-frequency sub-band coefficients, as illustrated in Fig. 3. The decomposition is divided into two steps. First, a non-subsampled pyramid (NSP) performs the multi-scale decomposition using non-subsampled filter banks in order to achieve shift invariance. This leads to \( j + 1 \) sub-images: one low-frequency image and j high-frequency images, where j denotes the number of decomposition levels. Second, shearing filters (SF) perform the multi-directional decomposition at each scale, which yields directional detail information. A directional decomposition with l stages in a high-frequency sub-image produces \( 2^{l} \) directional sub-images. The NSST decomposition process is illustrated in Fig. 4, where the two basic steps are demarcated. In this work, the number of NSP decomposition levels is \( j = 3 \) and the sub-band filter adopted is “maxflat”, in order to be aligned with the compared NSST-based methods [16, 26, 28] and to investigate the efficiency of neuro-fuzzy.

Fig. 3. Block diagram of the proposed fusion method

Fig. 4. Schematic diagram of multi-scale and multidirectional decomposition of NSST
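The sub-band bookkeeping of the two-step NSST decomposition (\( j + 1 \) pyramid images, \( 2^{l} \) directional sub-images per high-frequency level) can be checked with a small sketch. The function name count_subbands and the per-level shearing stages used in the example are hypothetical; the text fixes only \( j = 3 \), not the number of shearing stages.

```python
def count_subbands(levels, shear_stages):
    """Count the images produced by an NSP + shearing-filter decomposition.

    levels       -- number of NSP decomposition levels (j in the text)
    shear_stages -- one entry per level: shearing stages l used at that level
    """
    if len(shear_stages) != levels:
        raise ValueError("one shearing-stage count is needed per pyramid level")
    directional = [2 ** l for l in shear_stages]  # 2^l directional sub-images per level
    return {
        "pyramid_images": levels + 1,  # one low-frequency image + j high-frequency images
        "directional_per_level": directional,
        "total_coefficient_images": 1 + sum(directional),  # low band + all directional bands
    }


# Example with j = 3 and assumed (not source-given) shearing stages 2, 2, 3.
print(count_subbands(3, [2, 2, 3]))
```

Because no sub-sampling takes place, every one of these sub-images has the same size as the input image, which is what makes the representation shift invariant but also redundant.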

3.2 Neuro-Fuzzy Inference System Based Image Fusion

Properties such as brightness and edges have fuzzy effects on images owing to non-uniform illumination and inherent image vagueness [24]. NFIS is a feed-forward neural network system in which neural networks tune the membership functions of the fuzzy sets that operate as a decision-making system [19]. The main concept of fuzzy logic is that a fuzzy set is defined by a membership function, which assigns a membership degree to each element of the set. The hybrid technique is performed in three steps. First, the membership functions and fuzzy rules are defined, and an adaptive neuro-fuzzy inference system (ANFIS) is generated from the training data using the “genfis1” function, which provides the initial conditions for training. Second, a Sugeno-type fuzzy inference system is trained using the “anfis” function in order to identify the membership parameters. Finally, the total output is calculated. The learning process of NFIS and its structure are illustrated in Fig. 5. Three layers are involved: the first calculates the input membership degrees, the second calculates the degree of pertinence (firing strength) of each rule, and the last sums up the output of NFIS.

Fig. 5. The schematic framework of NFIS
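The three layers of a Sugeno-type inference system can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the trained system used in the experiments: it assumes two inputs, generalized bell membership functions, a grid of rules (one per pair of membership functions) and first-order linear consequents. The names gbell and sugeno_infer, and all parameter values, are hypothetical.

```python
from itertools import product


def gbell(x, a, b, c):
    """Generalized bell membership function 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def sugeno_infer(x1, x2, mfs1, mfs2, consequents):
    """Forward pass of a two-input, first-order Sugeno fuzzy system.

    mfs1, mfs2  -- lists of (a, b, c) bell parameters for each input
    consequents -- one (p, q, r) triple per rule, in product(mfs1, mfs2) order
    """
    # Layer 1: membership degree of each input in each fuzzy set.
    mu1 = [gbell(x1, *params) for params in mfs1]
    mu2 = [gbell(x2, *params) for params in mfs2]
    # Layer 2: firing strength of each rule (product of its membership degrees).
    weights = [m1 * m2 for m1, m2 in product(mu1, mu2)]
    # Layer 3: weighted average of the linear rule outputs p*x1 + q*x2 + r.
    outputs = [p * x1 + q * x2 + r for (p, q, r) in consequents]
    return sum(w * f for w, f in zip(weights, outputs)) / sum(weights)


# Hypothetical, untrained parameters: two fuzzy sets ("low", "high") per input.
mfs = [(0.5, 2, 0.0), (0.5, 2, 1.0)]
rules = [(0.5, 0.5, 0.0)] * 4  # every rule averages the two inputs
print(sugeno_infer(0.2, 0.6, mfs, mfs, rules))
```

In a trained system, the bell parameters and the consequent coefficients would be the quantities identified during training; here they are fixed by hand, so the output is simply the average of the two inputs.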

Low and high frequency fusion rules.

The low-frequency coefficients of the fused image are conventionally obtained by averaging. However, this technique yields low-contrast results [4]. To overcome this deficiency, the low-frequency sub-bands of the input images are fused by taking the coefficient with the maximum absolute value, which preserves more contrast. Let \( LF_{A} \left( {i,j} \right) \) and \( LF_{B} \left( {i,j} \right) \) denote the low-frequency coefficients located at \( \left( {i,j} \right) \) of images A and B; the fused low sub-band is given as follows:

$$ LF_{F} \left( {i,j} \right) = \left\{ {\begin{array}{*{20}l} {LF_{A} \left( {i,j} \right),} & {\left| {LF_{A} \left( {i,j} \right)} \right| \ge \left| {LF_{B} \left( {i,j} \right)} \right|} \\ {LF_{B} \left( {i,j} \right),} & {\left| {LF_{A} \left( {i,j} \right)} \right| < \left| {LF_{B} \left( {i,j} \right)} \right|} \\ \end{array} } \right. $$
(6)
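The maximum-absolute-value rule for the low-frequency sub-band amounts to a coefficient-wise selection. A minimal sketch in plain Python follows; sub-bands are represented as nested lists and the function name fuse_low_max_abs is hypothetical.

```python
def fuse_low_max_abs(lf_a, lf_b):
    """Select, at each position, the coefficient with the larger absolute value.

    Ties are resolved in favour of image A, matching the >= in the rule.
    """
    if len(lf_a) != len(lf_b) or any(len(ra) != len(rb) for ra, rb in zip(lf_a, lf_b)):
        raise ValueError("sub-bands must have the same size")
    return [
        [a if abs(a) >= abs(b) else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(lf_a, lf_b)
    ]


lf_ct = [[1, -5], [3, 2]]
lf_mr = [[-2, 4], [3, -1]]
print(fuse_low_max_abs(lf_ct, lf_mr))  # [[-2, -5], [3, 2]]
```

Note that the selected coefficient keeps its sign; only the comparison uses absolute values.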

The high-frequency coefficients, on the other hand, are fused by the neuro-fuzzy approach. At each decomposition level and for each sub-image obtained by the shearing filters, NFIS is applied to fuse the trained inputs. The fusion process is described in the following.

Algorithm. The proposed medical image fusion method, illustrated in Fig. 3, proceeds through the following steps:

  • Step 1: The pre-registered CT and MR images are decomposed by NSST to obtain low- and high-frequency coefficients.

  • Step 2: The low-frequency coefficients of the source images are fused by the maximum-absolute-value rule (Eq. 6).

  • Step 3: The high-frequency coefficients of each decomposition level and each sub-image are fused based on NFIS as follows:

    • Step 3.1: Form the training data in column form and reshape the input sub-images into column form to obtain the check data.

    • Step 3.2: Generate the fuzzy inference system (FIS) structure from the training data and the number and type of membership functions using the “genfis1” command.

    • Step 3.3: Train the system by applying the “anfis” command to the generated FIS and the training data. Finally, perform the fuzzy inference calculation.

  • Step 4: Apply the inverse NSST to the fused coefficients to obtain the fused medical image.
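The control flow of Steps 1 to 4 can be sketched as a small driver that receives the decomposition, reconstruction and fusion rules as functions. This is only a structural sketch in plain Python, not the Matlab implementation used in the experiments: the toy decomposition (global mean plus residual) is a stand-in for NSST, the maximum-absolute-value rule is a stand-in for the trained NFIS in Step 3, and all function names are hypothetical.

```python
def to_check_data(sub_a, sub_b):
    """Step 3.1: stack two equally sized sub-images as two columns (column-major order)."""
    rows, cols = len(sub_a), len(sub_a[0])
    return [[sub_a[i][j], sub_b[i][j]] for j in range(cols) for i in range(rows)]


def fuse_images(ct, mr, decompose, reconstruct, fuse_low, fuse_high):
    """Steps 1-4 with the transform and the fusion rules injected as callables."""
    low_ct, highs_ct = decompose(ct)                      # Step 1
    low_mr, highs_mr = decompose(mr)
    low_f = fuse_low(low_ct, low_mr)                      # Step 2
    highs_f = [fuse_high(h_ct, h_mr)                      # Step 3, per sub-image
               for h_ct, h_mr in zip(highs_ct, highs_mr)]
    return reconstruct(low_f, highs_f)                    # Step 4


# --- Stand-ins for illustration only (NOT NSST and NOT a trained NFIS) ---
def toy_decompose(img):
    count = sum(len(row) for row in img)
    mean = sum(sum(row) for row in img) / count
    low = [[mean] * len(row) for row in img]
    high = [[v - mean for v in row] for row in img]
    return low, [high]


def toy_reconstruct(low, highs):
    out = [row[:] for row in low]
    for high in highs:
        for i, row in enumerate(high):
            for j, v in enumerate(row):
                out[i][j] += v
    return out


def max_abs(a, b):
    return [[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


ct = [[1.0, 2.0], [3.0, 6.0]]
mr = [[2.0, 2.0], [2.0, 2.0]]
print(fuse_images(ct, mr, toy_decompose, toy_reconstruct, max_abs, max_abs))
print(to_check_data(ct, mr))
```

Passing the rules as arguments keeps the driver independent of the particular transform, so a real NSST and a trained inference system could be substituted without changing the step structure.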

4 Experimental Results and Comparisons

In this section, several illustrative experiments are conducted in order to assess the effectiveness of the proposed method. The implementation is carried out in Matlab R2013a on a PC with a 2.39 GHz Core 2 Duo processor and 2 GB of memory. The proposed fusion method is evaluated on different datasets, each of which includes pre-registered CT and MRI images of the same person and the same part of the body. Furthermore, the results obtained are compared quantitatively and qualitatively with other existing methods from the literature according to several performance measures.

4.1 Evaluation Criterion

Visual comparison of fused images is largely subjective, since it depends on the observer's eyesight and mental state. Consequently, several evaluation metrics should be applied in order to provide an objective assessment. These criteria are of two types: metrics based on a single image, and metrics that integrate both the source and fused images.

Entropy (En).

Entropy measures the amount of information available in an image; it is computed separately for the source and fused images. A larger entropy of the fused image indicates more abundant information. It is defined as follows:

$$ En = - \sum\limits_{i = 0}^{l - 1} {p\left( i \right)log_{2} p\left( i \right)} $$
(7)

where \( p\left( i \right) \) is the probability of gray level i, and the gray levels range over \( \left[ {0, \ldots ,l - 1} \right] \).
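As a minimal sketch of Eq. 7, the entropy can be computed from the normalized gray-level histogram. The code is plain Python on nested lists for illustration; the function name entropy is hypothetical, and gray levels that do not occur are skipped, since they contribute nothing to the sum.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [v for row in image for v in row]
    n = len(pixels)
    counts = Counter(pixels)  # histogram restricted to the levels that occur
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


print(entropy([[0, 0], [255, 255]]))  # 1.0: two equally likely gray levels
print(entropy([[0, 1], [2, 3]]))      # 2.0: four equally likely gray levels
```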

Standard deviation (STD).

STD reflects the contrast of a single image: an image with a high standard deviation has high contrast. For an image \( I\left( {i,j} \right) \) of size \( M \times N \) with mean gray level \( \mu \), the deviation of the pixel gray levels from the mean is expressed by:

$$ STD = \sqrt {\frac{1}{M \times N}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {\left[ {I\left( {i,j} \right) - \mu } \right]^{2} } } } ,\quad \mu = \frac{1}{M \times N}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {I\left( {i,j} \right)} } $$
(8)
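Eq. 8 is the population standard deviation of the gray levels (the sum is divided by \( M \times N \), not by \( M \times N - 1 \)). A minimal sketch in plain Python follows; the function name std_dev is hypothetical.

```python
import math


def std_dev(image):
    """Population standard deviation of the gray levels of an image."""
    pixels = [v for row in image for v in row]
    n = len(pixels)
    mean = sum(pixels) / n
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)


print(std_dev([[0, 0], [2, 2]]))  # 1.0
print(std_dev([[5, 5], [5, 5]]))  # 0.0: a constant image has no contrast
```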

Spatial frequency (SF).

Spatial frequency (SF) [16] reflects the level of clarity and the overall activity of an image. Hence, the larger the SF, the higher the resolution. It is calculated through the row and column frequencies and defined as:

$$ SF = \sqrt {RF^{2} + CF^{2} } $$
(9)

where RF is the row frequency and CF is the column frequency, defined by Eqs. 10 and 11, in which \( a \times b \) is the image size and \( I\left( {i,j} \right) \) is the gray level of the fused image.

$$ RF = \sqrt {\frac{1}{{a\left( {b - 1} \right)}}\sum\limits_{i = 1}^{a} {\sum\limits_{j = 2}^{b} {\left( {I\left( {i,j - 1} \right) - I\left( {i,j} \right)} \right)^{2} } } } $$
(10)
$$ CF = \sqrt {\frac{1}{{\left( {a - 1} \right)b}}\sum\limits_{i = 2}^{a} {\sum\limits_{j = 1}^{b} {\left( {I\left( {i,j} \right) - I\left( {i - 1,j} \right)} \right)^{2} } } } $$
(11)
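Eqs. 9 to 11 translate directly into two sums of squared first differences. The following plain-Python sketch assumes an image of at least \( 2 \times 2 \) pixels stored as nested lists; the function name spatial_frequency is hypothetical.

```python
import math


def spatial_frequency(image):
    """Spatial frequency sqrt(RF^2 + CF^2) of an a x b image (a, b >= 2)."""
    a, b = len(image), len(image[0])
    # Squared row frequency: differences between horizontal neighbours.
    rf2 = sum((image[i][j - 1] - image[i][j]) ** 2
              for i in range(a) for j in range(1, b)) / (a * (b - 1))
    # Squared column frequency: differences between vertical neighbours.
    cf2 = sum((image[i][j] - image[i - 1][j]) ** 2
              for i in range(1, a) for j in range(b)) / ((a - 1) * b)
    return math.sqrt(rf2 + cf2)


print(spatial_frequency([[0, 1], [0, 1]]))  # 1.0: activity along rows only
```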

Structural similarity index (SSIM).

SSIM [29] is a perceptual metric that quantifies image quality degradation. It expresses the similarity between the reference and the fused image and takes values in \( \left[ { - 1,1} \right] \). A large value indicates high similarity between the source and fused images, and the value 1 indicates that the two images are identical. It is defined as:

$$ SSIM\left( {F,I} \right) = \frac{{\left( {\left( {2\mu_{F} \mu_{I} + C_{1} } \right) \times \left( {2\sigma_{FI} + C_{2} } \right)} \right)}}{{\left( {\left( {\mu_{F}^{2} + \mu_{I}^{2} + C_{1} } \right) \times \left( {\sigma_{F}^{2} + \sigma_{I}^{2} + C_{2} } \right)} \right)}} $$
(12)

where F is the fused image, I is the input image, \( \mu_{F} \) and \( \mu_{I} \) are the mean intensities of F and I, \( \sigma_{F}^{2} \) and \( \sigma_{I}^{2} \) are the variances of F and I, \( \sigma_{FI} \) is the covariance of F and I, and \( C_{1} \) and \( C_{2} \) are constants.
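A minimal sketch of Eq. 12 evaluated once over the whole image is given here. Two points are assumptions rather than statements of the text: the constants are set to the commonly used values \( C_{1} = (0.01 \times 255)^{2} \) and \( C_{2} = (0.03 \times 255)^{2} \) for 8-bit images, and global statistics are used, whereas practical SSIM implementations usually average the index over local windows. The function name ssim_global is hypothetical.

```python
def ssim_global(f, i, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of Eq. 12 computed from global means, variances and covariance."""
    x = [v for row in f for v in row]
    y = [v for row in i for v in row]
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


img = [[0, 255], [255, 0]]
inverted = [[255, 0], [0, 255]]
print(ssim_global(img, img))       # identical images give the maximum value
print(ssim_global(img, inverted))  # negative: the structures are anti-correlated
```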

Peak signal to noise ratio (PSNR).

PSNR is given in dB and reflects the level of noise suppression. A higher PSNR indicates better fused image quality, that is, a smaller difference between the input and fused images and less distortion. For 8-bit images it is expressed in terms of the mean squared error (MSE) between the input and fused images:

$$ PSNR = 10 \times { \log }_{10} \left( {255^{2} /MSE} \right) $$
(13)
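Eq. 13 can be sketched as follows, with the MSE taken as the mean of the squared pixel differences between the reference and fused images. The peak value of 255 corresponds to 8-bit images; returning infinity for identical images is a convention chosen for this sketch, and the function name psnr is hypothetical.

```python
import math


def psnr(reference, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [v for row in reference for v in row]
    fus = [v for row in fused for v in row]
    mse = sum((r - f) ** 2 for r, f in zip(ref, fus)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: no error to measure
    return 10 * math.log10(peak ** 2 / mse)


print(psnr([[100, 100]], [[110, 90]]))  # MSE = 100
print(psnr([[0, 0]], [[255, 255]]))     # 0.0 dB: the error equals the peak value
```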

Mutual information (MI).

MI indicates how much information the input images bring to the fused image. Its value increases with the amount of detail and texture information in the fused result. Given two input images \( X_{A} ,X_{B} \) and a fused image \( X_{F} \), it is defined as [16]:

$$ MI = I\left( {X_{A} ;X_{F} } \right) + I\left( {X_{B} ;X_{F} } \right) $$
(14)

Where,

$$ I\left( {X_{R} ;X_{F} } \right) = \sum\limits_{u = 1}^{L} {\sum\limits_{v = 1}^{L} {h_{R,F} \left( {u,v} \right)log_{2} \frac{{h_{R,F} \left( {u,v} \right)}}{{h_{R} \left( u \right)h_{F} \left( v \right)}}} } $$
(15)

Here R denotes a reference image and F the fused image, \( h_{R,F} \left( {u,v} \right) \) is the normalized joint gray-level histogram of \( X_{R} \) and \( X_{F} \), and \( h_{R} \left( u \right),h_{F} \left( v \right) \) are the normalized gray-level histograms of \( X_{R} \) and \( X_{F} \) respectively.
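Eqs. 14 and 15 can be sketched with normalized histograms built by counting gray-level pairs. The code is plain Python for illustration; the function names mutual_information and fusion_mi are hypothetical, and pairs of gray levels that never occur are skipped because they contribute nothing to the sum.

```python
import math
from collections import Counter


def mutual_information(x_r, x_f):
    """Mutual information (in bits) between two equally sized images (Eq. 15)."""
    r = [v for row in x_r for v in row]
    f = [v for row in x_f for v in row]
    n = len(r)
    joint = Counter(zip(r, f))          # joint gray-level histogram
    h_r, h_f = Counter(r), Counter(f)   # marginal gray-level histograms
    return sum(
        (c / n) * math.log2((c / n) / ((h_r[u] / n) * (h_f[v] / n)))
        for (u, v), c in joint.items()
    )


def fusion_mi(x_a, x_b, x_f):
    """Total mutual information of Eq. 14."""
    return mutual_information(x_a, x_f) + mutual_information(x_b, x_f)


a = [[0, 0], [1, 1]]
b = [[0, 1], [0, 1]]
print(mutual_information(a, a))  # 1.0: equals the entropy of the image
print(mutual_information(a, b))  # 0.0: the gray levels are independent
```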

Image quality index (IQI).

IQI reflects the quality of the fused image. Its dynamic range is \( \left[ { - 1,1} \right] \), and the closer the IQI is to unity, the better the quality of the fused result. IQI is defined as:

$$ IQI = \left( {\frac{{\sigma_{FR} }}{{\sigma_{F} \sigma_{R} }}} \right).\left( {\frac{{2\mu_{F} \mu_{R} }}{{\mu_{F}^{2} + \mu_{R}^{2} }}} \right).\left( {\frac{{2\sigma_{F} \sigma_{R} }}{{\sigma_{F}^{2} + \sigma_{R}^{2} }}} \right) $$
(16)

where \( \mu_{F} , \mu_{R} \) are the means and \( \sigma_{F} ,\sigma_{R} \) are the standard deviations of the fused and source images respectively, and \( \sigma_{FR} \) is their covariance. Since two source images A and B contribute to the fusion process, the total IQI value is given by the mean:

$$ IQI = \frac{{IQI\left( {A,F} \right) + IQI\left( {B,F} \right)}}{2} $$
(17)
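The three factors of Eq. 16 multiply out to \( 4\sigma_{FR} \mu_{F} \mu_{R} /\left[ {\left( {\mu_{F}^{2} + \mu_{R}^{2} } \right)\left( {\sigma_{F}^{2} + \sigma_{R}^{2} } \right)} \right] \), which is the form used in the sketch that follows. Global statistics over the whole image are assumed here (the index is often computed over sliding windows and averaged), the index is undefined when the denominator vanishes, and the function names iqi and fusion_iqi are hypothetical.

```python
def iqi(f, r):
    """Image quality index of Eq. 16 from global statistics of two images."""
    x = [v for row in f for v in row]
    y = [v for row in r for v in row]
    n = len(x)
    mu_f, mu_r = sum(x) / n, sum(y) / n
    var_f = sum((v - mu_f) ** 2 for v in x) / n
    var_r = sum((v - mu_r) ** 2 for v in y) / n
    cov = sum((p - mu_f) * (q - mu_r) for p, q in zip(x, y)) / n
    denominator = (mu_f ** 2 + mu_r ** 2) * (var_f + var_r)
    if denominator == 0:
        raise ValueError("IQI is undefined for constant or all-zero images")
    return 4 * cov * mu_f * mu_r / denominator


def fusion_iqi(a, b, f):
    """Mean of the indices of the fused image against both sources (Eq. 17)."""
    return (iqi(a, f) + iqi(b, f)) / 2


src = [[1, 2], [3, 4]]
flipped = [[4, 3], [2, 1]]
print(iqi(src, src))      # identical images reach the upper bound
print(iqi(src, flipped))  # anti-correlated images reach the lower bound
```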

4.2 Results and Discussion

Experiments are carried out on different datasets of CT and MR images in order to compare the proposed approach with several existing methods. CT images show bone structures clearly but provide limited soft-tissue information, whereas MR images provide soft-tissue information but lack boundary information. In the following, the results are discussed quantitatively, based on the performance metrics described above, and qualitatively, based on visual perception.

Experiment 1.

The visual and quantitative results of three methods for CT/MR image fusion are compared with those of the proposed method. The Iterative Neuro-Fuzzy Approach (INFA), the Discrete Wavelet Transform (DWT) based approach and the Lifting Wavelet Transform combined with Neuro-Fuzzy Approach (LWT-NFA) [20] are analyzed and compared subjectively and objectively based on En and SSIM. Performance results are listed in Tables 1 and 2. The comparative analysis is carried out on six different pairs of pre-registered CT (Fig. 6, A1-F1) and MR (Fig. 6, A2-F2) images of size 256 × 256. The resulting fused images are shown in Fig. 6. From the visual analysis of the fused results, it can be observed that our method successfully preserves both the soft-tissue information provided by the MR images and the bony structures given by the CT images, with better resolution than the aforementioned methods.

Table 1. Entropy performance of different approaches.
Table 2. SSIM performance of different approaches.
Fig. 6. Comparative visual results of different methods applied to input CT (A1 - F1) and MR (A2 - F2) images. The remaining rows show the fusion results of the INFA method (A3 - F3), the DWT method (A4 - F4), the LWT-NFA method (A5 - F5) and the proposed method (A6 - F6).

Besides the visual comparison, a quantitative assessment based on the evaluation criteria (En and SSIM) demonstrates the outcomes of the proposed fusion method. Table 1 shows that the proposed method gives the highest entropy among the compared methods, which means that the fused image produced by our algorithm contains more information. Table 2 shows that the proposed method also produces the highest SSIM values. This indicates that our algorithm causes less quality degradation in the resulting image, that is, a better similarity between the source and fused images.

Experiment 2.

To further evaluate the proposed method against the related literature, we process another pre-registered image set (Fig. 7) that has already been used by several methods. The fusion results are compared with the neuro-fuzzy based fusion method in the non-subsampled contourlet domain [18] (NSCT-NF), the neuro-fuzzy approach [25] (INF), the non-subsampled shearlet transform with spiking neural network [16] (NSST-NN), the pulse coupled neural network in the non-subsampled shearlet domain [26] and, finally, the shearlet transform based fusion approach [13]. The objective evaluation of the different results is tabulated in Table 3, while the visual results are shown in Fig. 7.

Fig. 7. Fusion results of different methods applied to input CT and MR images. The remaining rows show the fusion results of NSCT-NF, NFA, NSST-NN-PCNN, ST and the proposed scheme.

Table 3. Comparative performance of different methods for the dataset shown in Fig. 7.

On the basis of the visual results of the different schemes illustrated in Fig. 7, it can be seen that the proposed fusion algorithm produces fused images of competitive quality that contain both the soft-tissue and dense-tissue information derived from the source images. Furthermore, edge information is recovered in the resulting image with good contrast. In the objective evaluation listed in Table 3, the proposed scheme produces the greatest entropy, which means that more information is preserved. The standard deviation is competitive with the other methods, reflecting good contrast, in agreement with the visual assessment. The mutual information and spatial frequency values are not the best among the compared schemes, owing to the training of the input data, but they remain higher than those of neuro-fuzzy in the non-subsampled contourlet domain. The IQI provided by our algorithm is the greatest among the compared schemes, indicating a better similarity between the reference and fused images. Finally, PSNR and SSIM are comparatively better. Thus, the new method reduces noise more than the others, and the corresponding fusion results are similar to the references, with less distortion. It can also be concluded that the pseudo-Gibbs phenomenon is suppressed by the shift-invariant shearlet transform.

Time cost is also considered in this work. It has been reported that NSST is less time consuming than NSCT [16]. Moreover, PCNN-based fusion rules are time consuming compared with the averaging or maximum rules because of the learning process, although their cost remains close to that of the neuro-fuzzy fusion rule.

5 Conclusion and Perspectives

Equipped with a rich mathematical structure, the shearlet transform is an MGA tool that possesses anisotropy, directionality and shift invariance. In this work, we have presented a multimodal medical image fusion method based on the non-subsampled shearlet transform and neuro-fuzzy. The low-frequency sub-bands are fused by maximization of the absolute value, while the high-frequency fusion rule is based on a Neuro-Fuzzy Inference System. Experiments carried out on different pre-registered CT and MR datasets reveal the effectiveness of the proposed method. Based on visual perception, the fused images produced by the proposed scheme are rich in details of both soft tissues and bones, with good contrast. The objective evaluation shows that the fusion results of the proposed method contain more details and less distortion and noise. The main advantage of the shift-invariant shearlet transform over the standard shearlet transform, namely the elimination of the pseudo-Gibbs phenomenon, is thereby exploited.

Further work is planned to optimize and enhance the performance of our method. The fusion rules for low and high frequencies will be studied more closely, and hybrid intelligence will receive more consideration. Future work will also investigate deep learning for medical image fusion, with different modalities integrated in the experimental protocol.