1 Introduction

Image filtering is one of the most studied problems in image processing and probably one of the most widely used techniques in image processing, computer vision and related areas. The literature distinguishes two broad categories of image filters: linear and non linear. More recently, non local methods have attracted the attention of researchers in the area. In fact, several state of the art algorithms are both non local and non linear [6]. Among the many applications of image filtering, image denoising stands out as one of the most relevant. The main goal of denoising is to remove undesired components from the image. Here we assume an additive noise model where the observed image I is the result of adding a random noise n to the ideal noiseless image \(I_o\): \(I = I_o+n\).

The main goal of image denoising is to estimate \(I_o\) while preserving its edges and details. Usually there is a tradeoff between noise reduction and detail preservation. Non linear and non local methods were a good step forward with respect to linear and local filters. In [2] Buades, Morel and Coll introduced the Non Local Means image filter (NLM), which opened a whole area of research on non linear and non local filtering methods. The underlying idea of this method is to estimate \(I_{o_i}\) (the value of \(I_o\) at pixel i) using a weighted average of all pixels in the observed image I.

In this work we introduce a modified version of NLM that better selects the pixels to be used in the weighted average. Instead of using all pixels from the image, or restricting the average to points in a local search window, we propose to cluster points using local information and average only corresponding points. As we will see later, similarity is measured using patches of pixels around the pixel of interest. We also borrow ideas from [4], where random sampling of patches, rather than clustering, was proposed. To improve the results we add spatial coherence to the process and show that this is a key element to produce competitive results.

2 Review of Non Local Means Image Filter

The NLM filter, as presented in [2], estimates the filtered version of the image using a weighted average of pixels in a neighboring region, \(\mathcal{N}_i\), of the pixel to be filtered:

$$\begin{aligned} \hat{I}_i = \frac{1}{W_i}\sum _{j \in {\mathcal{N}_i}} w_{ij} I_j. \end{aligned}$$
(1)

The weights \(w_{ij}\) express the similarity of pixels i and j and \(W_i\) is a normalization term. One of the key factors of NLM is that this similarity is based on the distance between patches. A patch is a square window of size \((2K+1)\times (2K+1)\) centered at the pixel of interest. If \(p_i\) and \(p_j\) are patches at pixels i and j respectively, the similarity weight between them is defined as:

$$\begin{aligned} w_{ij}=\text{ exp }(-||p_i-p_j||^2/\sigma ^2). \end{aligned}$$
(2)
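As an illustration of Eqs. (1) and (2), the following Python sketch filters a single pixel. It is a minimal, unoptimized example and not the implementation used in the experiments; the function name `nlm_pixel`, the reflective padding at the image border and the default parameter values are assumptions made here for illustration.

```python
import numpy as np

def nlm_pixel(img, r, c, K=2, half_window=10, sigma=10.0):
    """Estimate pixel (r, c) with Eq. (1), using the weights of Eq. (2)."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # Assumption: reflect the image at its border so every pixel has a full patch.
    padded = np.pad(img, K, mode="reflect")
    size = 2 * K + 1
    # After padding by K, the patch centered at (r, c) starts at (r, c).
    p_i = padded[r:r + size, c:c + size]

    weighted_sum, W_i = 0.0, 0.0
    # Loop over the search window N_i, clipped to the image domain.
    for rj in range(max(0, r - half_window), min(H, r + half_window + 1)):
        for cj in range(max(0, c - half_window), min(W, c + half_window + 1)):
            p_j = padded[rj:rj + size, cj:cj + size]
            w_ij = np.exp(-np.sum((p_i - p_j) ** 2) / sigma ** 2)  # Eq. (2)
            weighted_sum += w_ij * img[rj, cj]
            W_i += w_ij                                            # normalization term
    return weighted_sum / W_i                                      # Eq. (1)
```

With `K=2` and `half_window=10` the sketch uses \(5\times 5\) patches and a \(21\times 21\) search window. Since \(w_{ii}=1\), the normalization term \(W_i\) is never zero.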

This idea can be easily extended to average all pixels in the image and make the filter truly non local. Although this extension looks very attractive, it has some problems. The first one is obviously its computational cost, which has been addressed in the literature: if the image has N pixels and the patches are of size \((2K+1)\times (2K+1)\), the computational cost is \(O(N^2(2K+1)^2)\). Assuming that the computational cost is not a critical issue, the second weakness of the extension is that it does not improve the results in terms of MSE (Mean Square Error) or PSNR (Peak Signal to Noise Ratio) [1, 2]. The intuition behind this problem is that the weights do not always discriminate non corresponding patches. If the number of non corresponding patches grows as the number of patches increases, the filtered value deviates from the expected value. In the literature this problem has been addressed by several authors with techniques that basically aim to average only patches belonging to the same class [5]. In Sect. 4.1 we will come back to this discussion when introducing clusters into NLM.

3 NLM and Random Sampling

In [4] Chan, Zickler and Lu proposed an interesting approach to reduce the computational cost of the NLM filter by randomly selecting a small number of patches from the whole set of patches of the image. Thus, in Eq. (1), instead of summing over all patches, the sum considers only the subset of sampled patches (no other modifications are needed). Together with experimental results, the authors present a theoretical analysis to support their proposal, which was further studied in [3]. The results of this method are analyzed in [8], where the authors show that, when using the same number of patches as the classical NLM filter (the number of points inside \(\mathcal{N}_i\)), the performance in terms of PSNR or MSE is worse than that of the classical approach. That is, at the same computational cost, the denoising performance is worse.
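The restriction of the sum in Eq. (1) to a random subset can be sketched as follows. This is an illustrative example under stated assumptions, not the code of [4]: the patches are assumed to be stored as rows of an array `patches` of shape (N, d), `values` holds the center pixel of each patch, and the function name and uniform sampling without replacement are choices made here.

```python
import numpy as np

def nlm_random_sampling(patches, values, i, n_samples, sigma, rng):
    """Eq. (1) with the sum restricted to a uniform random subset of patches."""
    N = patches.shape[0]
    # Assumption: uniform sampling without replacement over all N patches.
    idx = rng.choice(N, size=min(n_samples, N), replace=False)
    d2 = np.sum((patches[idx] - patches[i]) ** 2, axis=1)
    w = np.exp(-d2 / sigma ** 2)          # weights of Eq. (2)
    # Note: if every sampled patch is far from p_i the weights can underflow to
    # zero; a practical implementation would need to guard this division.
    return np.sum(w * values[idx]) / np.sum(w)
```

A call such as `nlm_random_sampling(patches, values, i, 441, sigma, np.random.default_rng(0))` uses as many patches as a \(21\times 21\) search window.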

Is random sampling the best sampling option for reducing the computational cost? To reduce the computational cost and at the same time select the best candidates we need to guide the sampling; purely random sampling is not the best option. To guide the sampling we propose to cluster all patches and to sample only within each cluster. In this way we reduce the computational cost while selecting only similar patches. For each pixel i with patch \(p_i\) belonging to cluster \(C_{k_i}\) we randomly sample patches inside this cluster. In the next section we present the details of this approach.

4 Non Local Means over Clusters of Patches

Here we present a modified version of the NLM filter that works over clusters of patches. The main idea is as follows: instead of averaging arbitrary patches across the whole image, we aggregate information only from patches that are close to each other. To that end we cluster the patches of the image into \(N_c\) clusters and apply NLM inside each cluster. In the remainder of this section we first discuss the adaptation of the NLM filter and then the clustering method.

4.1 Non Local Means Using Clusters

Assume we cluster the set of patches \(X = \{p_1,...,p_N\}\) into \(N_c\) clusters. Let \(C_k\) denote cluster k. The NLM filter can then be expanded as:

$$ \hat{I}_i = \frac{1}{W_i}\sum _j w_{ij} I_j = \frac{1}{W_i}\sum _{k=1}^{N_c} \sum _{j\in C_k} w_{ij} I_j. $$

To include only similar patches in the weighted average, the above equation is modified to use only one cluster. If the patch \(p_i\) around pixel i belongs to cluster \(C_{k_i}\), the modified NLM filter equation is:

$$ \hat{I}_i = \frac{1}{W_i}\sum _{j\in C_{k_i}} w_{ij} I_j. $$

The previous equation reduces the computational cost by decreasing the number of patches to average. Of course, we have to remember that a clustering algorithm must be run before filtering. In the next subsection we describe the clustering method used in this work.
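The cluster-restricted average can be sketched in a few lines of Python. This is a minimal illustration, assuming the patches are stored as rows of an array `patches`, `values` holds the center pixel of each patch, and `labels` holds the cluster index of each patch; these names and the function `nlm_cluster` are hypothetical.

```python
import numpy as np

def nlm_cluster(patches, values, labels, i, sigma):
    """Weighted average of Eq. (1) restricted to the cluster of patch i."""
    members = np.flatnonzero(labels == labels[i])       # indices j in C_{k_i}
    d2 = np.sum((patches[members] - patches[i]) ** 2, axis=1)
    w = np.exp(-d2 / sigma ** 2)                        # weights of Eq. (2)
    return np.sum(w * values[members]) / np.sum(w)      # W_i >= 1 because i is a member

# Toy example: two clusters with clearly different patches and pixel values.
patches = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
values = np.array([1.0, 1.0, 9.0, 9.0])
labels = np.array([0, 0, 1, 1])
estimate = nlm_cluster(patches, values, labels, 0, sigma=1.0)
```

In this toy example pixel 0 is averaged only with the other member of its cluster, so patches from the second cluster cannot bias the estimate.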

4.2 Clustering of Patches

The set of patches of the image, \(X = \{p_1,...,p_N\}\), is clustered using a standard k-means algorithm. Before applying the clustering the patches are transformed using PCA (Principal Component Analysis). In [9] the author showed that using PCA improves the results of NLM. It is also known that PCA concentrates the noise in the components with the lowest eigenvalues. In all the experiments in this work the patches are of size \(5\times 5\) and are projected onto the first 12 principal components.
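The PCA projection followed by k-means can be sketched with NumPy alone. This is an illustrative sketch under assumptions, not the code used in this work: the function names, the SVD-based PCA, the random initialization and the fixed number of iterations are choices made here.

```python
import numpy as np

def pca_project(patches, n_components=12):
    """Project flattened patches (one per row) onto the leading principal components."""
    centered = patches - patches.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns the cluster label of every row and the centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize the centers with k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    # Final assignment so that the labels agree with the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers
```

For \(5\times 5\) patches each row of `patches` has 25 entries, and `pca_project(patches, 12)` returns the 12-dimensional representation that is passed to `kmeans`. The distance computation above stores an array of size N by k by d, so for full images it would have to be done in blocks.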

The number of clusters was determined using spectral clustering [10]. The idea in this case is to construct a graph that encodes the structure of the patches in X. The first step is to construct the pairwise similarity matrix S. The element \(s_{ij}\) of the matrix is the similarity between patches \(p_i\) and \(p_j\), \(s_{ij}=\text{ exp }(-||p_i-p_j||^2/\sigma ^2)\). Then, the similarities are normalized using the total weight of the arcs incident to each patch \(p_i\): \(d_i = \sum _j s_{ij}\). The Laplacian of the graph is defined as \(L = D^{-1}S\), where \(D=\text{ diag }(d_1,...,d_N)\). One of the properties of the Laplacian is that the multiplicity of the eigenvalue 1 gives the number of natural clusters in the data [10]. In a real case the eigenvalues are not exactly one and thresholding techniques must be applied. In our case we estimate the number of clusters as the number of eigenvalues greater than 0.8.
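The eigenvalue count described above can be sketched as follows. This is a minimal example under assumptions: the function name is hypothetical, and the full N by N similarity matrix is built explicitly, which is only practical for a subsample of the patches. Since \(D^{-1}S\) is similar to the symmetric matrix \(D^{-1/2}SD^{-1/2}\), both have the same eigenvalues, and the sketch uses the symmetric form so that a symmetric eigensolver can be applied.

```python
import numpy as np

def estimate_num_clusters(patches, sigma, threshold=0.8):
    """Count the eigenvalues of D^{-1} S that exceed the threshold."""
    d2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-d2 / sigma ** 2)             # pairwise similarities s_ij
    d = S.sum(axis=1)                        # d_i = sum_j s_ij
    # D^{-1/2} S D^{-1/2} has the same eigenvalues as D^{-1} S and is symmetric.
    M = S / np.sqrt(np.outer(d, d))
    eigvals = np.linalg.eigvalsh(M)
    return int(np.sum(eigvals > threshold))

# Toy data: three tight, well separated groups of 2-D points.
rng = np.random.default_rng(0)
group_centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
points = np.vstack([c + 0.01 * rng.normal(size=(20, 2)) for c in group_centers])
n_clusters = estimate_num_clusters(points, sigma=1.0)
```

On this toy data the similarity matrix is essentially block diagonal, each block contributes one eigenvalue close to 1, and the estimate is 3.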

Once we have the clusters, instead of using all points inside the cluster we apply random sampling, following [4]. As we said before, the main goal of this step is to select the best possible candidates. In terms of computational cost this approach, given the clusters, is the same as the random sampling proposed in [4]. We fix the number of sampled patches to keep the computational cost of the filtering process constant and evaluate the different algorithms using the MSE and the SSIM (Structural Similarity Index) [11]. The following section describes all the algorithms that will be evaluated: the proposed ones and the ones from the literature included for comparison purposes.

5 Proposed Methods

This section presents the implementation details of the proposed algorithms together with a review of the methods from the literature used for comparison.

Traditional Non Local Means (NLM): The traditional NLM implements Eq. (1). The search window \(\mathcal{N}\) was set to a square window of size \(21\times 21\) and the parameter \(\sigma \) used in the weight computation in Eq. (2) was set based on an estimation of the noise variance \(\hat{\sigma }_n\): \(\sigma = \hat{\sigma }_n\). Assuming Gaussian noise, \(\hat{\sigma }_n\) can be estimated using the proposal in [7].

Non Local Means with Random Sampling (NLMRS): This algorithm implements the idea presented in [4] and discussed in Sect. 3. A subset of patches is randomly sampled and all weights are computed against the selected patches. To evaluate all the algorithms under comparable conditions we sample \(21\times 21\) patches to keep the computational cost constant. In this way, this algorithm uses the same number of patches as the NLM described above.

Non Local Means with Random Sampling inside Clusters (NLMRS-C): As explained in Sect. 3, in this case we propose to apply random sampling but restrict the samples to a single cluster. That is, given a pixel i to be processed, with patch \(p_i\) belonging to cluster \(C_{k_i}\), this pixel is filtered using only patches from cluster \(C_{k_i}\). As we explained before, to be able to compare filtering results we fix the number of patches used. Therefore, we sample \(21\times 21\) patches inside each cluster.

Non Local Means with Random Sampling inside Cluster and Spatial Neighbors (NLMRS-S): In Sect. 1 we discussed the desired characteristics of image filters. As we observed, spatial coherence is important. However, if we analyze the methods NLMRS-C (described above) and NLMC (described below), it is clear that they do not enforce spatial coherence. If two neighboring pixels belong to different clusters they will be filtered with different patches and coherence cannot be guaranteed. This is a key difference between these methods and NLM; the latter, by working in a search window centered at the pixel being processed, intrinsically includes spatial coherence.

To combine spatial coherence with patches selected via clustering and sampling we decided to combine local patches with patches sampled from the corresponding cluster. Let i be a pixel with corresponding patch \(p_i\) belonging to cluster \(C_{k_i}\) and let \(\mathcal{N}_i\) be the corresponding search window. The set of patches to be used for the filtering operation is constructed as the union of the patches corresponding to pixels in \(\mathcal{N}_i\) with patches sampled from \(C_{k_i}\). We use a local search window \(\mathcal{N}_i\) of size \(15\times 15\) and complete the remaining patches via sampling inside \(C_{k_i}\). Hence, we use \(15\times 15 = 225\) local patches and \(21\times 21 - 15\times 15 = 216\) patches sampled from the cluster, which gives a roughly 50/50 split between both sets.
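The construction of this candidate set can be sketched as follows. It is an illustrative example under assumptions, not the implementation used in the experiments: patches are identified by the flat index of their center pixel, `labels` is a flat array with the cluster index of every patch, and the sampled patches are drawn from the part of the cluster lying outside the search window so that the union contains no repeated patch. The function name is hypothetical.

```python
import numpy as np

def nlmrs_s_candidates(shape, r, c, labels, rng, half_window=7, budget=21 * 21):
    """Flat indices of the patches used to filter pixel (r, c)."""
    H, W = shape
    rows = np.arange(max(0, r - half_window), min(H, r + half_window + 1))
    cols = np.arange(max(0, c - half_window), min(W, c + half_window + 1))
    local = (rows[:, None] * W + cols[None, :]).ravel()   # search window N_i

    # Assumption: sample only from members of C_{k_i} that lie outside N_i.
    same_cluster = np.flatnonzero(labels == labels[r * W + c])
    pool = np.setdiff1d(same_cluster, local)
    n_sampled = min(max(0, budget - local.size), pool.size)
    sampled = rng.choice(pool, size=n_sampled, replace=False)
    return np.concatenate([local, sampled])

# Toy example: a 64x64 image where every patch belongs to the same cluster.
labels = np.zeros(64 * 64, dtype=int)
idx = nlmrs_s_candidates((64, 64), 32, 32, labels, np.random.default_rng(0))
```

For an interior pixel the set contains the 225 patches of the \(15\times 15\) window plus 216 sampled patches, 441 in total. Near the image border the window is clipped, and the sketch fills the difference with additional sampled patches when the cluster is large enough.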

Non Local Means with Clusters (NLMC): This algorithm is a modification of the NLM that, instead of using all pixels in the image, as explained in Sect. 1, uses only patches inside each cluster. For a given patch \(p_i\) centered at pixel i and belonging to cluster \(C_{k_i}\) it will only use corresponding patches inside \(C_{k_i}\). This algorithm will be used for comparison purposes but will not be compared directly to NLM, NLMRS-C and NLMRS-S since it uses a different number of patches to filter each pixel.

6 Results

This section summarizes the results obtained in terms of MSE and SSIM for the algorithms detailed in the previous section. We tested the algorithms over a set of six well known images in two different additive Gaussian noise configurations. The following tables present the results for two noise levels, \(\sigma _n=10\) and \(\sigma _n=20\). For each image we present the MSE and SSIM for each method. The central columns contain the methods that can be directly compared, while the leftmost and rightmost columns are included for comparison with the works from the literature used as references in this work. Bold numbers highlight the best results among the algorithms under evaluation (NLMC and NLMRS are not considered).
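For reference, the MSE used in the tables, and the PSNR mentioned in Sect. 2, can be computed as in the sketch below. It is a minimal example under assumptions: the function names are hypothetical, a peak value of 255 (8-bit images) is assumed for the PSNR, and the noisy image is simulated with the additive Gaussian model without clipping. SSIM is not reproduced here.

```python
import numpy as np

def mse(reference, estimate):
    """Mean square error between the noiseless image and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((reference - estimate) ** 2))

def psnr(reference, estimate, peak=255.0):
    """Peak signal to noise ratio in dB; assumes an 8-bit peak value by default."""
    return float(10.0 * np.log10(peak ** 2 / mse(reference, estimate)))

# Simulated additive Gaussian noise with sigma_n = 20 on a synthetic constant image.
rng = np.random.default_rng(0)
clean = np.full((256, 256), 128.0)
noisy = clean + rng.normal(0.0, 20.0, size=clean.shape)
noisy_mse = mse(clean, noisy)
```

Before any filtering, the MSE of the noisy image is close to \(\sigma _n^2\) (about 400 for \(\sigma _n=20\)), which gives a baseline against which the filtered results can be read.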

These experiments confirm that adding spatial coherence plays an important role in the denoising quality (both objectively, as measured by the MSE, and in terms of perceived quality, as expressed by the SSIM). Looking at the MSE, NLMRS-S outperforms NLM in 4 out of the 6 images tested for both noise configurations. The Barbara and House images are the two cases where NLM performs better than NLMRS-S. These images contain periodic textures that are better denoised using local patches. Since NLMRS-S combines \(50\%\) of local patches with \(50\%\) of patches sampled from the cluster, it ends up using fewer highly similar patches for the filtering. Similar results are found when observing the SSIM. For \(\sigma _n=10\) we observe that NLMRS-S produces the best SSIM scores in all images. We can see that this is a promising approach to improve the traditional NLM filter without increasing its computational cost. In our approach we only added a clustering stage before applying the filter. This additional step generates improvements in terms of MSE and SSIM. For future work, it would be of interest to further analyze image content to detect periodic structures in order to switch between NLMRS-S and NLM.

Additionally, when comparing the results of NLMRS against those of NLMC and NLMRS-S, we confirm that better results can be obtained if some guidance is added during the sampling process. As we said before, completely random sampling is not a good strategy. Furthermore, if we compare NLMC with NLMRS-S we conclude that spatial coherence is needed to improve denoising results. Even though NLMC uses more patches to filter each pixel, this does not directly translate into better MSE and SSIM scores. Only in two cases for \(\sigma _n=10\) does NLMC outperform NLMRS-S in terms of MSE (for SSIM there is no improvement) (Table 1).

In Fig. 1 we depict the results for the Cameraman image for the case \(\sigma _n=10\). The clustering clearly shows the different regions of the image in terms of local patch configuration. Note how NLMRS-S better preserves the details of the image (see the grass) while NLM generates smoother results (see the sky) (Table 2).

Table 1. MSE and SSIM results for \(\sigma _n=20\).
Table 2. MSE and SSIM results for \(\sigma _n=10\).
Fig. 1. Results for \(\sigma _n=10\). From top left to bottom right: original cameraman image, visualization of patch clusters, NLM result and NLMRS-S result. Note that NLMRS-S better preserves the details of the image (see the grass) while NLM generates smoother results (see the sky).

7 Conclusions

We proposed a modified version of NLM using clustering and spatial regularization that provides good results in terms of MSE and SSIM. The proposed method outperformed classical NLM on a dataset of standard images. These results allowed us to confirm that random sampling as proposed in [4] can be improved by guiding the sampling using clustering. We also confirmed that spatial coherence is critical, as expected, in denoising algorithms. We managed to balance spatial coherence with non local patches obtained via clustering and sampling. We believe this opens an interesting area of analysis for NLM and similar methods.