
1 Introduction

Edge-aware image processing techniques are broadly studied for smoothing images without destroying structures at different scales, and they are widely applied in the computer graphics community. Edge-preserving filters can be divided into two broad categories: average based approaches and optimization based approaches.

The first class of methods smooths images by taking a weighted average of nearby pixels, where the weights depend on the intensity/color difference. Average based filters include the bilateral filter [17], the nonlocal means filter [1], the guided image filter [8] and the rolling guidance filter [23]. They often use a guidance image to define the similarity between pixels. The main drawback of these filters is that they tend to produce halo artifacts near edges.

The total variation (TV) model [15], the \(L_{0}\) gradient minimization filter (\(L_{0}\) filter) [20], weighted least squares (WLS) [3] and the curvature filter [7] belong to the optimization based methods. These approaches smooth images by optimizing objective functions containing terms defined in the \(L_{p}\) norm (\(p = 0,1,2\)). Although the optimization based methods can avoid halo artifacts along salient edges and often generate high quality results, they do not support joint filtering with a reference image, and this shortcoming limits their applications.

Recently, the nonlocal framework has been extensively studied as a regularization term to overcome the staircase effect and obtain better performance. Gilboa and Osher defined a variational functional based on nonlocal TV operators [6]. Zhang et al. [24] proposed a fast split Bregman iteration for this nonlocal TV minimization. The nonlocal regularizations were later extended to more general inverse problems in [12]. However, these methods penalize large nonlocal gradient magnitudes, which may reduce contrast during smoothing.

In summary, most image smoothing models aim to preserve edges while removing noise and textures, and each of them has its limitations. In this work, we present a new edge-preserving filter based on an optimization framework, which incorporates the nonlocal strategy into the \(L_{0}\) gradient minimization model and takes advantage of both variational models and spatial filters. This notion leads to an unconventional global optimization process involving discrete metrics, whose solution can manipulate edges in a variety of ways depending on their saliency.

The proposed framework is general and can be used for several applications. Different from other optimization based methods, the proposed algorithm can use the reference image for joint filtering.

Depth images captured by 3D scanning devices such as ToF or Kinect cameras are often highly degraded, with limited resolution and low quality. As a result, it is hard to recover a high quality depth map from a single depth image. Fortunately, the depth map is often accompanied by a high resolution (HR) color image of the same scene, and the two have strong structural similarities [4, 19, 22]. Recently, deep learning based depth upsampling methods [5, 9, 11, 16] have achieved good results. These methods build end-to-end upsampling networks, which learn high-resolution features in the intensity image to supplement the low-resolution structures in the depth map.

This paper therefore applies the proposed filter to depth image super resolution, treating the natural image as the reference image. With the guidance of the high-resolution RGB image, the proposed algorithm is well suited for upsampling the low-resolution depth image: it not only reduces noise but also preserves sharp edges during super resolution. The experimental results on simulated data demonstrate that the proposed approach is promising and significantly improves the visual quality of the low-resolution depth image compared with existing upsampling methods.

2 Non-local \(L_{0}\) Gradient Minimization

Different from the usual gradient, the nonlocal gradient \(\nabla _{\omega } S_{p}\) at each pixel p of the image S is defined as follows:

$$\begin{aligned} \nabla _{\omega } S_{p} = \{S_{p}(q) , \forall q \in \varOmega _{p}\} \end{aligned}$$
(1)

where

$$\begin{aligned} S_{p}(q) = (S_{q} - S_{p})\sqrt{\omega (p,q)} \end{aligned}$$
(2)

and \(S_{p}(q)\) is the vector element corresponding to q. The weight function \(\omega (p, q)\), assumed to be nonnegative and symmetric, measures the similarity between the two patches (of size \(m\times m\)) centered at the pixels p and q, and \(\varOmega _{p}\) is a search window of size \(n\times n\) centered at the pixel p [12, 24].
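Concretely, the nonlocal gradient is just the vector of weighted differences over the search window. A minimal sketch, assuming the weights \(\omega (p,q)\) are already available as a mapping (the function name and the dictionary representation are ours, not from the paper):

```python
import numpy as np

def nl_gradient(S, omega_p, p):
    """Eqs. (1)-(2): nonlocal gradient of S at pixel p, given the weights
    omega_p = {q: w(p, q)} over the search window Omega_p."""
    return np.array([(S[q] - S[p]) * np.sqrt(w) for q, w in omega_p.items()])

# toy usage on a 1-D "image" with two uniform-weight neighbors
S = np.array([0.0, 1.0, 1.0])
g = nl_gradient(S, {0: 0.5, 2: 0.5}, 1)
```

Each component is one weighted difference \((S_{q} - S_{p})\sqrt{\omega (p,q)}\), so the gradient at a pixel has as many entries as the search window has neighbors.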

The weight function \(\omega (p,q)\) in \(\varOmega _{p}\) has the form:

$$\begin{aligned} \omega (p, q) = \frac{1}{C_{p}}\exp (-\frac{(G_{a}*\mid J(p+\cdot ) - J(q+\cdot ) \mid ^{2})(0)}{h^{2}}) \end{aligned}$$
(3)

and the normalizing factor \(C_{p}\) is

$$\begin{aligned} C_{p} = \sum _{q \in \varOmega _{p}}\exp (-\frac{(G_{a}*\mid J(p+\cdot ) - J(q+\cdot ) \mid ^{2})(0)}{h^{2}}) \end{aligned}$$
(4)

where \(G_{a}\) is a Gaussian kernel with standard deviation a, h is a smoothing parameter, and J is a reference image which can be chosen according to the application.
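The weight computation of Eqs. (3)-(4) can be sketched as follows. The function name and parameter defaults are illustrative, and normalizing the Gaussian kernel is one possible convention, not prescribed by the paper:

```python
import numpy as np

def nl_weights(J, p, patch=5, search=7, a=0.5, h=0.25):
    """Nonlocal weights w(p, q) of Eqs. (3)-(4) for one pixel p.
    J: 2-D float array (reference image); p: (row, col), assumed far
    enough from the border.  Returns {q: w(p, q)} over the search window."""
    r, s = patch // 2, search // 2
    # Gaussian kernel G_a weighting the patch distance (normalized here)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    G = np.exp(-(x ** 2 + y ** 2) / (2 * a ** 2))
    G /= G.sum()
    Jp = J[p[0] - r:p[0] + r + 1, p[1] - r:p[1] + r + 1]
    w = {}
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            q = (p[0] + dy, p[1] + dx)
            Jq = J[q[0] - r:q[0] + r + 1, q[1] - r:q[1] + r + 1]
            d2 = np.sum(G * (Jp - Jq) ** 2)   # (G_a * |J(p+.) - J(q+.)|^2)(0)
            w[q] = np.exp(-d2 / h ** 2)
    Cp = sum(w.values())                      # normalizing factor C_p, Eq. (4)
    return {q: v / Cp for q, v in w.items()}
```

By construction the weights are nonnegative, sum to one over the window, and peak at q = p, where the patch distance vanishes.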

In this work, we denote the input image and filtered image as I and S, respectively. Our nonlocal gradient measure is written as

$$\begin{aligned} C(S) = \sharp \{p \mid \parallel \nabla _{\omega } S_{p}\parallel _{1} \ne 0 \} \end{aligned}$$
(5)

It counts the pixels p whose magnitude

$$\sum _{q\in \varOmega _{p}} \mid (S_{q} - S_{p})\sqrt{\omega (p,q)}\mid $$

is not zero. Based on this definition, we can estimate S by solving:

$$\begin{aligned} \min _{S} \{\parallel S - I \parallel _{2}^{2} + \lambda C(S)\} \end{aligned}$$
(6)

The first term constrains the similarity between the filtered image S and the input image I.

The second term of Eq. (6) is a discrete counting metric. The two terms describe the pixel-wise difference and the global discontinuity respectively, and their combination is commonly regarded as computationally intractable. In this work, we introduce an auxiliary variable based on the half-quadratic splitting method, which expands the original terms and updates them iteratively. This leads to an alternating optimization strategy.

Due to its discrete nature, our method contains new subproblems and differs from other \(L_{0}\)-norm regularized optimization problems. Although the proposed method only approximates the solution of Eq. (6), it makes the original problem easier to handle and inherits the property of maintaining salient structures [20].

We introduce auxiliary variables \(\mathbf d _{p}\) corresponding to \(\nabla _{\omega } S_{p}\), and rewrite the cost function as

$$\begin{aligned} \min _{S,\mathbf d } \{ \sum _{p} ( S_{p}-I_{p})^{2} + \lambda C(\mathbf d _{p}) + \beta \parallel \mathbf d _{p} - \nabla _{\omega } S_{p} \parallel _{2}^{2} \} \end{aligned}$$
(7)

where \(C(\mathbf d ) = \sharp \{p \mid \parallel \mathbf d _{p} \parallel _{1} \ne 0 \}\), and \(\beta \) is an automatically adapting controlling parameter.

This variable splitting motivates an iterative method: in practice, a good result can be obtained by solving the following two subproblems alternately.

Subproblem 1: computing S

$$\begin{aligned} S = \arg \min _{S} \{ \sum _{p} ( S_{p}-I_{p})^{2} + \beta \parallel \mathbf d _{p} - \nabla _{\omega } S_{p} \parallel _{2}^{2} \} \end{aligned}$$
(8)

Now, the subproblem for S consists in solving the linear equation

$$\begin{aligned} (S -I) - \beta div_{\omega }(\nabla _{\omega }S - \mathbf d ) = 0 \end{aligned}$$
(9)

which provides

$$\begin{aligned} S = (1-\beta \varDelta _{\omega })^{-1}(I - \beta div_{\omega }{} \mathbf d ) \end{aligned}$$
(10)

Here, \(div_{\omega }{} \mathbf d \) is the divergence of \(\mathbf d \), whose discretization at p can be written as

$$\begin{aligned} div_{\omega }{} \mathbf d _{p} = \sum _{q\in \varOmega _{p}}(\mathbf d _{p}(q) - \mathbf d _{q}(p))\sqrt{\omega (p,q)} \end{aligned}$$
(11)

The non-local Laplacian \(\varDelta _{\omega }\) is defined as

$$\begin{aligned} \varDelta _{\omega } S = div_{\omega }\nabla _{\omega }S = \sum _{q\in \varOmega _{p}}(S_{q} - S_{p})\omega (p,q) \end{aligned}$$
(12)

Since the non-local Laplacian is negative semi-definite, the operator \(1-\beta \varDelta _{\omega }\) is diagonally dominant. Therefore we can solve for S with a Gauss-Seidel algorithm.
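A possible Gauss-Seidel sweep for this linear system, with the weights stored as per-pixel dictionaries over flat pixel indices (an illustrative data layout, not the paper's implementation):

```python
import numpy as np

def gauss_seidel(b, weights, beta, sweeps=50):
    """Solve (1 - beta * Lap_w) S = b by Gauss-Seidel sweeps, where
    (Lap_w S)_p = sum_q w(p,q) (S_q - S_p) as in Eq. (12).
    b collects the right-hand side I - beta * div_w(d) of Eq. (10);
    weights[p] = {q: w(p, q)} over pixel p's search window."""
    n = len(b)
    S = np.asarray(b, float).copy()   # warm start from the right-hand side
    for _ in range(sweeps):
        for p in range(n):
            w = weights[p]
            wsum = sum(w.values())
            # row p reads: (1 + beta*wsum) S_p - beta * sum_q w(p,q) S_q = b_p
            S[p] = (b[p] + beta * sum(wq * S[q] for q, wq in w.items())) \
                   / (1 + beta * wsum)
    return S
```

Diagonal dominance (the diagonal \(1 + \beta \sum _{q}\omega (p,q)\) strictly exceeds the off-diagonal mass \(\beta \sum _{q}\omega (p,q)\)) guarantees that these sweeps converge.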

Subproblem 2: computing \(\mathbf d \)

$$\begin{aligned} \mathbf d = \arg \min _\mathbf{d } \{ \sum _{p}\parallel \mathbf d _{p} - \nabla _{\omega } S_{p}\parallel _{2}^{2} + \frac{\lambda }{\beta } C(\mathbf d _{p}) \} \end{aligned}$$
(13)

This subproblem can be solved efficiently because Eq. (13) decomposes spatially, so each \(\mathbf d _{p}\) can be estimated individually. This is the main benefit of the proposed scheme, and it makes the altered problem empirically solvable. Equation (13) is accordingly decomposed into:

$$\begin{aligned} E_{p} = \parallel \mathbf d _{p} - \nabla _{\omega } S_{p}\parallel _{2}^{2} + \frac{\lambda }{\beta } H(\mathbf d _{p}) \end{aligned}$$
(14)

where \(H(\mathbf d _{p})\) is a binary function returning 1 if \(\parallel \mathbf d _{p} \parallel _{1} \ne 0\) and 0 otherwise.

Equation (14) reaches its minimum \(E^{*}_{p}\) under the condition

$$\begin{aligned} \mathbf d _{p} = \left\{ \begin{array}{ll} \mathbf 0 , \qquad \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2} \le \lambda /\beta \\ \nabla _{\omega } S_{p}, \quad otherwise \end{array}\right. \end{aligned}$$
(15)

Proof:

(1) When \(\lambda /\beta \ge \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2}\), non-zero \(\mathbf d _{p}\) yields

$$\begin{aligned} E_{p}= & {} \parallel \mathbf d _{p} - \nabla _{\omega } S_{p}\parallel _{2}^{2} + \lambda /\beta \end{aligned}$$
(16)
$$\begin{aligned}\ge & {} \lambda /\beta \end{aligned}$$
(17)
$$\begin{aligned}\ge & {} \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2} \end{aligned}$$
(18)

Note that \(\mathbf d _{p} = \mathbf 0 \) leads to

$$\begin{aligned} E_{p}= \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2} \end{aligned}$$
(19)

Comparing with Eq. (16), the minimum energy \(E^{*}_{p} = \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2} \) is attained when \(\mathbf d _{p} = \mathbf 0 \).

(2) When \(\lambda /\beta < \parallel \nabla _{\omega } S_{p} \parallel ^{2}_{2}\), Eq. (19) still holds, but when \( \mathbf d _{p} = \nabla _{\omega } S_{p}\), \(E_{p}\) attains the value \(\lambda /\beta \). Comparing the two values, the minimum energy \(E^{*}_{p} = \lambda /\beta \) is attained when \( \mathbf d _{p} = \nabla _{\omega } S_{p}\).
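The resulting update of Eq. (15) is a hard threshold applied to the whole nonlocal gradient vector; a minimal sketch (the function name is ours):

```python
import numpy as np

def solve_d(grad_p, lam, beta):
    """Eq. (15): keep the nonlocal gradient vector only if its squared
    L2 norm exceeds lambda/beta; otherwise set it to zero."""
    g = np.asarray(grad_p, dtype=float)
    return g if g @ g > lam / beta else np.zeros_like(g)
```

Because the threshold acts on the full vector, all weighted differences at a pixel are kept or discarded together, which is what preserves salient structures.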

The parameter \(\beta \) is automatically adapted over the iterations: it starts from a small value and is multiplied by 2 each time. This scheme effectively speeds up convergence [20].

Fig. 1. Visual quality comparison of image smoothing. (a) original image, (b) the result of the \(L_{0}\) filter, (c) the result of NLTV, (d) the result of our method.

Fig. 2. Visual quality comparison of flash/no flash denoising. (a) original flash/no flash image pair, (b) the result of JBF, (c) the result of NLTV, (d) the result of our method.

The nonlocal total variation (NLTV) model [24] enforces a continuous \(L_{1}\) norm on the nonlocal gradient to suppress noise; because it penalizes gradient magnitudes, strong smoothing inevitably curtails originally salient edges. In our framework, by contrast, large nonlocal gradient magnitudes are naturally allowed by the discrete counting measure.

In Fig. 1, we show a natural image smoothing example compared with other competitive algorithms. The \(L_{0}\) filter [20] (\(\lambda = 0.035\)) generates a sharp but not completely smooth image, shown in Fig. 1(b): many details, such as the flower and butterfly details, are still retained after filtering, which is not good enough for applications. The result obtained by NLTV [24] (\(\lambda = 0.05\)) is shown in Fig. 1(c); because the overall nonlocal gradients have small energies, the edges are not sharp, which makes the low contrast details around them difficult to distinguish. In Fig. 1(d), our result (\(\lambda = 0.05\)) retains the most significant structures, which become slightly sharper as the nonlocal gradient energy increases.

Our alternating minimization method is described in Algorithm 1.

Algorithm 1. Nonlocal \(L_{0}\) gradient minimization.
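A toy 1-D version can make the alternating loop concrete. This sketch, entirely ours, replaces the patch-based weights of Eqs. (3)-(4) with a fixed two-nearest-neighbor window of uniform weight and solves the S-subproblem with a direct linear solve instead of Gauss-Seidel; it is illustrative only:

```python
import numpy as np

def nl_l0_smooth_1d(I, lam=0.001, beta0=0.01, beta_max=1e5):
    """Toy 1-D alternating minimization: threshold d (Eq. (15)), solve for
    S (Eq. (10)), double beta.  Window = two nearest neighbors, w = 0.5."""
    n = len(I)
    w, sw = 0.5, np.sqrt(0.5)
    S = np.asarray(I, float).copy()
    beta = beta0
    while beta < beta_max:
        # d-subproblem: hard-threshold the whole gradient vector per pixel
        d = {}
        for p in range(n):
            qs = [q for q in (p - 1, p + 1) if 0 <= q < n]
            g = {q: (S[q] - S[p]) * sw for q in qs}
            if sum(v * v for v in g.values()) <= lam / beta:
                g = {q: 0.0 for q in qs}
            for q, v in g.items():
                d[(p, q)] = v
        # S-subproblem: (1 - beta * div_w grad_w) S = I - beta * div_w d,
        # with div_w e_p = sum_q (e_p(q) - e_q(p)) * sqrt(w)
        A = np.eye(n)
        rhs = np.asarray(I, float).copy()
        for (p, q), dpq in d.items():
            A[p, p] += 2 * beta * w                    # from -beta * div_w grad_w
            A[p, q] -= 2 * beta * w
            rhs[p] -= beta * (dpq - d[(q, p)]) * sw    # -beta * div_w d
        S = np.linalg.solve(A, rhs)
        beta *= 2.0                                    # beta schedule of [20]
    return S
```

With a small \(\lambda \), a clean step edge survives the whole schedule unchanged, while a constant signal is returned as-is, matching the edge-preserving behavior described above.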

In [14], Petschnigg et al. proposed denoising a no-flash image with its flash version as the reference image. In Fig. 2, we show a comparison of the joint bilateral filter (JBF) [14], NLTV [24] and our method. Although JBF works well, Fig. 2(b) shows significant gradient inversion artifacts near some edges, and NLTV does not obtain a satisfactory result. Our result, shown in Fig. 2(d), is sharper and contains little noise.

3 Depth Image Upsampling

In this application, we upscale a single depth image d (of size \(m \times n\)) guided by a high-resolution natural image T (of size \(M \times N\)). Depth images are textureless compared with natural images and have quite sparse gradients. However, according to the statistics of depth image gradients [21], the sparse gradient assumption is not accurate enough: most gradient values of a depth image are not exactly 0 but rather very small.

Table 1. PSNR (in dB) comparison on the Middlebury 2007 datasets with added noise for magnification factors \({\times }4\) and \({\times }8\).
Table 2. SSIM comparison on the Middlebury 2007 datasets with added noise for magnification factors \(\times 4\) and \(\times 8\).

The proposed nonlocal \(L_{0}\) gradient regularization reduces the penalty on such small elements: we treat the nonlocal gradient of the image as a whole, take into account the energy sum of the multi-directional weighted gradients, and avoid obtaining an overly smooth result.

In the first step, we upsample the depth image d to the size \(M \times N\) with nearest neighbor interpolation, obtaining an initial image D. In the second step, we compute the weights \(\omega (p,q)\) from the high-resolution natural image T in Eqs. (3) and (4); that is, T is used as the reference image. In the last step, we use D as the input image and solve the minimization problem of Eq. (7), whose result is the final joint upsampling image.
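The first step can be sketched as follows (the function name is ours); the second and third steps then reuse the weight construction of Eqs. (3)-(4) with J = T and the alternating solver of Sect. 2:

```python
import numpy as np

def nn_upsample(d, factor):
    """Step 1: initialize D by nearest-neighbor interpolation of the
    low-resolution depth map d (each pixel becomes a factor x factor block)."""
    return np.repeat(np.repeat(d, factor, axis=0), factor, axis=1)

# usage: a 2x2 depth map upsampled by a factor of 2 to 4x4
D = nn_upsample(np.array([[1, 2], [3, 4]]), 2)
```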

We show some experimental evaluations of our algorithm compared with competitive methods for depth image upsampling. We work on 3 depth images from the Middlebury 2007 datasets [4] with scaling factors of 4 and 8, respectively. To simulate the acquisition process, Gaussian noise is added to these depth images [13].

Fig. 3. Joint upsampling on the “Art” image. (a) high-resolution RGB image, (b) original depth map, (c) low-resolution and noisy depth image (enlarged using nearest neighbor upsampling), (d) He et al. [8], (e) Park et al. [13], (f) Chan et al. [2], (g) SRF [10], (h) our method.

Fig. 4. Joint upsampling on the “Books” image. (a) high-resolution RGB image, (b) original depth map, (c) low-resolution and noisy depth image (enlarged using nearest neighbor upsampling), (d) He et al. [8], (e) Park et al. [13], (f) Chan et al. [2], (g) SRF [10], (h) our method.

Fig. 5. Joint upsampling on the “Moebius” image. (a) high-resolution RGB image, (b) original depth map, (c) low-resolution and noisy depth image (enlarged using nearest neighbor upsampling), (d) He et al. [8], (e) Park et al. [13], (f) Chan et al. [2], (g) SRF [10], (h) our method.


The numerical results for this experiment are reported in terms of the Peak Signal-to-Noise Ratio (PSNR) in Table 1 and the Structural Similarity (SSIM) [18] in Table 2. From Table 1, one can see that our method clearly outperforms the other four methods in most cases. In Table 2, the proposed method achieves significant SSIM improvements over the other leading methods; on average, our algorithm outperforms them by 0.05 in SSIM.
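For reference, the PSNR values follow the standard definition; a small helper (ours, assuming an 8-bit peak value of 255):

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```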

To show the visual comparison clearly, we present some experimental results in Fig. 3. One can find that our method enhances edges and reduces noise better, whereas the other algorithms suffer from edge blurring or residual noise. From Table 1 and Figs. 3, 4 and 5, one can observe that the proposed approach is effective for noisy complex scenes and obtains clearer high resolution depth images.

To show the stability of the proposed algorithm, we give the convergence curves of the alternating optimization in Fig. 6. We plot the histories of the relative error \(\mid S^{k+1} - S^{k} \mid \). Three depth images (Art, Books and Moebius) are used and the scaling factor is 8. It is noticeable that the proposed method is stable.

Fig. 6. Convergence curves of the alternating optimization.

To improve computational time and storage efficiency, we only compute the “best” neighbors: for each pixel p, we include only the 10 best neighbors in the \(7 \times 7\) search window centered at p, and the patch size is \(5 \times 5\). The parameters a and h are empirically set to 0.5 and 0.25, respectively. In general, 7–10 iterations are performed in our algorithm.
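The pruning to the 10 best neighbors can be sketched as follows (an illustrative helper operating on one pixel's weight map; renormalizing after pruning is our assumption):

```python
def best_neighbors(w, k=10):
    """Keep only the k largest weights w(p, q) in the search window
    and renormalize.  w maps neighbor q -> weight."""
    top = dict(sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:k])
    Z = sum(top.values())
    return {q: v / Z for q, v in top.items()}
```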

For computational time, the proposed approach takes about 2.3 s for a computer which runs Windows 7 64bit version with Intel Core i5 CPU and 8 GB RAM to construct the weight function of a \(256 \times 256\) image in Matlab 2010b. Once the weight is constructed, the iteration of our method is comparable to ROF [15] in speed. The computation speed depends on the number of iterations. In general, it takes around 3.5 s for 10 iterations.

4 Conclusion

In this work, we propose a solution for nonlocal \(L_{0}\) gradient minimization and show its application to depth image upsampling. We propose an effective smoothing approach based on minimizing a discrete count of nonlocal spatial changes. Different from many optimization based filters, the proposed method supports joint filtering, so it can be used for many applications. In particular, it achieves good performance in depth image super resolution: treating the high-resolution RGB image as a reference image, the proposed algorithm is well suited for upsampling the low-resolution depth image. The experimental results demonstrate that the proposed approach is promising and has better objective performance than existing upsampling methods.