1 Introduction

Depth image based rendering (DIBR) is one of the most important technologies for synthesizing virtual views from color images and their associated depth maps [3, 5, 15, 16, 31, 32]. However, some regions occluded by foreground (FG) objects in the original view may become visible in the synthesized view; such regions are called disocclusions [2, 11, 12, 18, 28]. Figure 1 illustrates the occurrence of disocclusion in single-view rendering. The original camera represents the captured reference viewpoint, and the virtual camera represents the virtual view synthesized by DIBR. The areas visible in the reference view are marked with vertical red lines, while those visible in the virtual view are marked with horizontal green lines. The disocclusion region, which is invisible in the reference viewpoint but visible in the virtual image, is located at the boundary between foreground and background, as indicated in Fig. 1. The disocclusion problem is particularly serious in the single-view-plus-depth format, which makes the task of disocclusion filling especially crucial in single-view rendering.

Fig. 1

Occurrence of disocclusion in single-view rendering

Several methods have been proposed to solve this problem; they can be categorized into two classes: (1) depth map preprocessing before rendering [10, 13, 14, 29, 30] and (2) image inpainting that jointly exploits depth and texture information after rendering [6, 17, 19, 21, 24]. In this paper, we focus on the former class, which mainly consists of depth filtering and depth boundary refinement. The main idea of depth filtering is to reduce the difference of depth values at object boundaries. Lee et al. [14] adopted adaptive smoothing filters for depth map preprocessing, including an asymmetric smoothing filter to reduce geometric distortions and a horizontal smoothing filter to reduce computation time and the occurrence of holes. In this way, the disocclusion region in the synthesized view becomes smaller. However, depth filtering usually introduces rubber-sheet artifacts and geometric distortions. Another approach is the dilation of the FG objects of the depth map [13, 29, 30]. In [29], a dilation-based method for view synthesis by DIBR was proposed: the depth pixels located at large depth discontinuities are refined with the neighboring foreground depth values, which improves the quality of the rendered virtual view. In [30], Xu et al. extended the method of [29] with a misalignment correction of the depth map for a better view synthesis result. Koppel et al. [13] proposed a depth preprocessing method combining depth map dilation, adaptive cross-trilateral median filtering, and adaptive temporally consistent asymmetric smoothing to reduce holes and distortions in the virtual view. With dilation-based methods, the FG objects in the depth map become slightly bigger than the corresponding objects in the texture image, but the disocclusion region is not reduced. When the resulting large holes are filled by image inpainting, the diffusion region may be blurred. In addition, global dilation may cause distortions, as the boundary disparity information is changed in the common area. The disocclusion is mainly located between the FG and the background (BG), and the disocclusion region can be quite large. Such challenging conditions prevent the use of traditional image inpainting methods and make these holes difficult to fill.

In this paper, to resolve the above problems, we propose a divide-and-conquer hole-filling method that effectively solves the disocclusion problem. Since reducing the sharp depth discontinuities narrows the disocclusion region, we propose to modify the depth pixels around the depth discontinuities by dilation combined with a linear interpolation process. In particular, to avoid distortion of other FG objects, the interpolation is limited to the neighboring BG. Finally, median filtering is adopted to remove isolated depth pixels. Experimental results demonstrate that our approach is effective in recovering large disocclusion regions.

2 Proposed method

Figure 2 shows the block scheme of the proposed method, which includes disocclusion region detection, local object dilation, disocclusion handling, and median filtering. Building on the disocclusion region detection and local object dilation steps, the divide-and-conquer strategy is mainly realized in the disocclusion handling step.

Fig. 2

Block scheme of the proposed method

For the right synthesized view, disocclusions occur at areas of the depth map with sharp high-to-low depth transitions, and vice versa for the left synthesized view, as described in [14]. In this paper, we take the rendering of the right virtual view as an example; the same process can be applied to the rendering of the left virtual view. The depth difference between two horizontally adjacent pixels can be expressed as:

$$ d_{f}\left( x,y \right) = d\left( x-1,y \right) - d\left( x,y \right) $$
(1)

where \(d\left( x,y \right)\) represents the depth pixel value at position \(\left( x,y \right)\), and \(d_{f}\left( x,y \right)\) represents the horizontal depth difference between the depth pixels at positions \(\left( x,y \right)\) and \(\left( x-1,y \right)\). If \(d_{f}\left( x,y \right)\) is larger than the specified threshold \(T_{0}\), there is a large depth discontinuity at position \(\left( x,y \right)\).

Based on the above concept, the disocclusion regions are detected according to the degree of depth discontinuity and labeled as \(b(x,y)=1\) as follows:

$$ b\left( x,y \right) = \begin{cases} 1, & \text{if } d_{f}\left( x,y \right) > T_{0}\\ 0, & \text{otherwise} \end{cases} $$
(2)
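As a concrete illustration, the detection of Eqs. (1) and (2) can be vectorized in a few lines. The following NumPy sketch assumes the depth map is stored as a 2-D array with y as the row index and x as the column index; the function name and the default threshold value are ours, chosen for illustration.

```python
import numpy as np

def detect_discontinuities(depth, T0=5):
    """Label large horizontal depth discontinuities, following Eqs. (1)-(2)."""
    depth = depth.astype(np.int32)
    d_f = np.zeros_like(depth)
    # d_f(x, y) = d(x-1, y) - d(x, y); the first column has no left neighbor.
    d_f[:, 1:] = depth[:, :-1] - depth[:, 1:]
    b = (d_f > T0).astype(np.uint8)  # b(x, y) = 1 where a disocclusion opens
    return d_f, b
```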

Then the target region for refinement is restricted to the neighboring BG and marked with a binary mask as:

$$ mask\left( x_{0}+k, y \right) = \begin{cases} 1, & \text{if } b\left( x_{0},y \right) = 1 \text{ and } d\left( x_{0}+k, y \right) - d\left( x_{0},y \right) < T_{0}\\ 0, & \text{otherwise} \end{cases}, \quad k \in \left\{ 0,1,2,\cdots,2d_{f}\left( x_{0},y \right) - 1 \right\} $$
(3)

where \(x_{0}\) represents the x-coordinate of a pixel where \(b\left( x_{0},y \right) = 1\). The condition \(d\left( x_{0}+k,y \right) - d\left( x_{0},y \right) < T_{0}\) prevents other FG objects from being marked by the binary mask.
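Continuing the sketch above, Eq. (3) can be implemented by scanning the labeled positions. Again this is a minimal illustration under the same assumptions, with a bounds check added for pixels near the image border.

```python
def build_target_mask(depth, d_f, b, T0=5):
    """Mark the refinement target region per Eq. (3)."""
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y, x0 in zip(*np.nonzero(b)):
        for k in range(2 * d_f[y, x0]):  # k = 0, 1, ..., 2*d_f(x0, y) - 1
            x = x0 + k
            if x >= w:
                break  # off the right edge of the image
            # Leave pixels of other FG objects unmarked (condition of Eq. (3)).
            if int(depth[y, x]) - int(depth[y, x0]) < T0:
                mask[y, x] = 1
    return mask
```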

Figure 3 shows the relationship between the virtual view and the original view. Assume that \(d\left( x-1,y \right)\) is 30 and \(d\left( x,y \right)\) is 28; the depth difference between them is then equal to 2 (the adjacent pixels are marked in red and blue in Fig. 3). After rendering, a hole region appears in the virtual view (the black pixels between the red pixel and the blue pixel): there are two hole pixels in the virtual view between the warped positions of the adjacent original-view pixels marked in red and blue. It is worth mentioning that the depth map pixel values in the Middlebury dataset are equal to the disparity values, and that parallel rendering is adopted in this paper.

Fig. 3

The relationship between the virtual view and the original view

A more general description follows. Similar to [11], assume that the x-coordinates of two adjacent pixels in the original view are x-1 and x; after rendering, the x-coordinates of the two pixels become \(x - 1 - d\left( x-1,y \right)\) and \(x - d\left( x,y \right)\). Thus, the width of the disocclusion region can be formulated as follows:

$$ \left( x - d\left( x,y \right) \right) - \left( x - 1 - d\left( x-1,y \right) \right) - 1 = d\left( x-1,y \right) - d\left( x,y \right) = d_{f}\left( x,y \right) $$
(4)

For \(\left( x_{0},y \right)\), the width of the disocclusion region (the information-loss area in the synthesized view) is equal to \(d_{f}\left( x_{0},y \right)\). In our proposed method, the neighboring background information, of width \(d_{f}\left( x_{0},y \right)\), is used to fill the disocclusion region. Thus, the maximum value of \(k\) is set to \(2 d_{f}\left( x_{0},y \right) - 1\); in other words, the horizontal size of the refined region in the depth map is \(2 d_{f}\left( x_{0},y \right)\).
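A short numeric check of Eq. (4), using the depth values of the Fig. 3 example (the column index 100 is arbitrary, chosen for illustration):

```python
# Two adjacent pixels with depths 30 and 28 (Fig. 3) warp to positions
# that are d_f + 1 columns apart, opening a hole of width d_f = 2.
x, d_left, d_right = 100, 30, 28
x_left = (x - 1) - d_left          # warped position of pixel (x-1, y): 69
x_right = x - d_right              # warped position of pixel (x, y):   72
hole_width = x_right - x_left - 1  # = d_left - d_right = d_f = 2
```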

An example of disocclusion region detection is shown in Fig. 4. Figure 4a and b show the original left color image and its associated depth map. Figure 4c and d demonstrate that half of the binary mask (Fig. 4c) accurately marks out the disocclusion region (the black areas in Fig. 4d). It is worth pointing out that the region marked by the red rectangle in Fig. 4c does not match the hole in Fig. 4d; this is because another FG object is present in this region. In this way, the FG depth pixels are preserved, and distortion of the FG object after rendering is avoided. Figure 4e and f show the image inpainting result without preprocessing, which is accompanied by annoying artifacts.

Fig. 4

Example of the disocclusion region detection. (a) The reference color image, (b) the corresponding depth map, (c) half of the binary mask, (d) the right virtual image generated by (a) and (b), (e) the hole-filling result without preprocessing, and (f) enlarging the red mark of (e)

The principle of the proposed divide-and-conquer disocclusion handling approach is shown in Fig. 5. Figure 5a shows the horizontal positional relationship between the color and depth discontinuities. After rendering, the disocclusion generated around the depth discontinuity is shown in Fig. 5b. The size of the disocclusion region is determined by the value of \(d_{f}\left( x,y \right)\); the quantitative relationship between them can be found in [29]. In Fig. 5c, the disocclusion region is filled with neighboring pixels. As the pixels surrounding the disocclusion region contain both FG and BG pixels, annoying artifacts appear. To avoid these artifacts, the FG objects of the depth map that neighbor the sharp depth discontinuity are first dilated, as shown in Fig. 5d (marked by the purple arrow). Thus, the depth boundary becomes slightly wider than the corresponding color boundary, which solves the problem of annoying artifacts. An n*n structuring element is used for dilation in this paper.
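A possible realization of this local dilation is sketched below using OpenCV grey-level dilation. Restricting the dilated values to the marked target region is our reading of "local" here, so the restriction step should be taken as an assumption rather than the paper's exact procedure.

```python
import cv2
import numpy as np

def dilate_fg_locally(depth, mask, n=3):
    """Widen FG boundaries with an n x n structuring element, but only
    inside the marked target region, leaving the rest of the map intact."""
    kernel = np.ones((n, n), np.uint8)
    dilated = cv2.dilate(depth, kernel)  # grey dilation = local maximum
    out = depth.copy()
    region = mask.astype(bool)
    out[region] = dilated[region]
    return out
```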

Fig. 5

Example for clarification of the divide-and-conquer disocclusion handling method. (a) Color pixel values and the corresponding depth values for a horizontal line, (b) color pixels after rendering by input depth map, (c) hole-filling result from (b), (d) our depth map preprocessing result, (e) virtual color pixels after rendering using (d), and (f) hole-filling result from (e)

The marked target region in the depth map, except for the dilated area, is refined to reduce the sharp depth discontinuities through a linear interpolation process whose pixel values decrease from the FG depth value to the BG depth value, as shown in Fig. 5d (marked by the red arrow). The linear interpolation process can be described as:

$$ d^{*}\left( x_{0}+k, y \right) = d\left( x_{0}-1, y \right) - \frac{d_{f}\left( x_{0},y \right)}{2d_{f}\left( x_{0},y \right) - n}\,k, \quad k \in \left\{ n, n+1, \cdots, 2d_{f}\left( x_{0},y \right) - 1 \right\} $$
(5)

where \(d^{*}\) represents the refined pixels in the depth map. Note that, because we limit the refined areas, the minimum value of the linear interpolation will not reach the BG depth value when the horizontal distance between the boundaries of two FG objects is less than \(2 d_{f}\left( x_{0},y \right)\).
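The ramp of Eq. (5) can then be applied to the remaining marked pixels. The sketch below is a literal transcription of the equation (function name ours), with the mask test realizing the limitation to the neighboring BG discussed above.

```python
def apply_linear_ramp(depth, d_f, b, mask, n=3):
    """Refine the marked region per Eq. (5), skipping the first n columns
    that were already covered by the dilated FG boundary."""
    out = depth.astype(np.float64)
    for y, x0 in zip(*np.nonzero(b)):
        df = int(d_f[y, x0])
        fg = out[y, x0 - 1]          # FG depth value at the boundary
        for k in range(n, 2 * df):   # k = n, n+1, ..., 2*d_f(x0, y) - 1
            x = x0 + k
            if x >= out.shape[1] or not mask[y, x]:
                continue  # outside the image or blocked by another FG object
            out[y, x] = fg - df / (2 * df - n) * k
    return np.clip(np.rint(out), 0, 255).astype(depth.dtype)
```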

Finally, to obtain a better DIBR result, a two-dimensional median filtering algorithm [6] is used to remove isolated depth pixels. In this way, erroneous pixels in the synthesized views can be reduced.
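For completeness, this cleanup step might look as follows; the 3*3 window size is an assumption, as the paper cites [6] without stating the window size.

```python
from scipy.ndimage import median_filter

# Remove isolated depth pixels with a 2-D median filter (window size assumed).
depth_clean = median_filter(depth_refined, size=3)
```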

3 Experimental results

To evaluate the effectiveness of the proposed method, the color images and corresponding depth maps of Venus, Middl1, Rocks1, Lampshade1, and Flowerpots from the Middlebury database [8, 22, 23] are used in our experiments. All the depth images provided by the Middlebury database have the same resolution as the corresponding color images. The original images are shown in Fig. 6. \(T_{0}\) is set to 5, and n is set to 3. The depth map of Venus after being refined by the above algorithms is shown in Fig. 7a. Since the disparity of the target region decreases linearly, after rendering with the preprocessed depth map the large disocclusion in the virtual image is divided into several small uniform holes, as shown in Fig. 7b. These small holes are easily filled by various inpainting methods [1, 20, 26]; in this paper, we show the result with Telea image inpainting [26]. To further improve the quality of the synthesized images, the pixels located in the small-hole regions are processed by a horizontal median filter [9] after image inpainting; a 1*5 smoothing window was adopted in our experiments. To further show the advantages of the divide-and-conquer method with linear interpolation, the intermediate results of the proposed method are shown in Fig. 8. As can be seen from the figure, after rendering with the preprocessed depth map, the large disocclusion in the virtual image is divided into small holes, which are easily filled by image inpainting.
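For reference, the hole-filling stage of the experiments (Telea inpainting followed by the 1*5 horizontal median filter) could be reproduced roughly as below. The inpainting radius and the restriction of the median filter to former hole pixels are our assumptions.

```python
import cv2
import numpy as np
from scipy.ndimage import median_filter

def fill_small_holes(virtual_rgb, hole_mask):
    """Fill remaining small holes with Telea inpainting [26], then smooth
    the filled pixels with a 1*5 horizontal median filter [9].

    virtual_rgb: HxWx3 uint8 synthesized view; hole_mask: HxW uint8, 1 = hole.
    """
    filled = cv2.inpaint(virtual_rgb, hole_mask, 3, cv2.INPAINT_TELEA)
    # 1*5 horizontal median, applied per color channel.
    smoothed = median_filter(filled, size=(1, 5, 1))
    out = filled.copy()
    holes = hole_mask.astype(bool)
    out[holes] = smoothed[holes]  # keep non-hole pixels untouched
    return out
```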

Fig. 6

The original images of Middlebury database. (a) Venus (434*383), (b) Middl1 (465*370), (c) Rocks1 (425*370), (d) Lampshade1 (433*370), and (e) Flowerpots (437*370)

Fig. 7

Results of the proposed algorithm for Venus. (a) The preprocessed depth map, (b) the synthesized image using (a), and (c) the hole-filling result

Fig. 8

Intermediate results of the proposed algorithm for Middl1 and Lampshade1. The top row shows the results of the preprocessed depth map, and the bottom row shows the synthesized image using the preprocessed depth map. (a) Middl1, (b) Lampshade1

For subjective comparison, the results of different methods are shown in Figs. 9, 10 and 11. The virtual view images are synthesized with the original depth maps (Figs. 9-11a), the depth maps preprocessed by Lee's method [14] (Figs. 9-11b), Xu's method [30] (Figs. 9-11c), and the proposed divide-and-conquer hole-filling method (Figs. 9-11d), with the ground truth shown in Figs. 9-11e. The top row shows the results of handling the disocclusion with the above methods, and the bottom row shows enlarged parts of the virtual images, labeled with red rectangles. It can be seen that the proposed method fills the disocclusion regions more effectively than the others; meanwhile, the visual quality of the virtual image is improved remarkably. Compared to Xu's method [30], the annoying artifacts and diffusion effects are reduced or even removed by our proposed method.

Fig. 9

Synthesized virtual images of Venus based on the produced depths. (a) Without preprocessing, (b) Lee’s method, (c) Xu’s method, (d) proposed divide-and-conquer hole-filling method, and (e) the groundtruth

Fig. 10

Synthesized virtual images of Middl1 based on the produced depths. (a) Without preprocessing, (b) Lee’s method, (c) Xu’s method, (d) proposed divide-and-conquer hole-filling method, and (e) the groundtruth

Fig. 11

Synthesized virtual images of Lampshade1 based on the produced depths. (a) Without preprocessing, (b) Lee’s method, (c) Xu’s method, (d) proposed divide-and-conquer hole-filling method, and (e) the groundtruth

For an objective comparison with the ground truth provided by the Middlebury database, PSNR values are computed between the virtual images and the corresponding ground truth. The PSNR results for the Middlebury database are shown in Table 1. As the table shows, the virtual images rendered with the proposed method achieve higher PSNR values than the images rendered without preprocessing, with Lee's method [14], and with Xu's method [30]. For example, compared with Xu's method [30], the proposed method achieves a 2.55 dB gain for Flowerpots and a 0.17 dB gain for Venus. The reasons why the proposed method is more effective than the other methods are as follows: 1) the important foreground information suffers less geometrical distortion than with the smoothing-based methods; 2) it is reasonable to fill holes using adjacent background texture, since the missing information of the disocclusion region in the synthesized view, which is occluded by the foreground object, is background information; and 3) the disocclusion is divided into several small holes by the proposed divide-and-conquer strategy. Compared with big holes, small holes are easy to fill, since background pixels horizontally adjoin the small holes and the missing pixels in the small holes are similar to their horizontal neighbors.
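The PSNR figures in Table 1 presumably follow the standard definition for 8-bit images, which for clarity can be computed as:

```python
import numpy as np

def psnr(img, ref):
    """PSNR in dB between a synthesized view and its ground truth (8-bit)."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```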

Table 1 Comparison of PSNR (dB) for Middlebury database

It should be noted that the PSNR gains of smoothing-based methods are sometimes lower than the results without preprocessing, as smoothing-based methods cause geometrical distortion in the texture. These methods are generally used to improve the visual quality of the synthesized views or to boost computational speed. A dilation-based method combined with a sophisticated inpainting or texture synthesis method can reduce more errors and obtain a higher PSNR gain, mainly because the important foreground information suffers less geometrical distortion than with the smoothing-based methods, and the missing textures located in the disocclusion region are similar to the adjacent background textures. In addition to inheriting the advantages of the dilation-based method, the proposed method divides the disocclusion into several small holes and thus makes the holes easy to fill accurately.

To further evaluate the performance of the proposed method in real-world cases, four test sequences, Balloons, Door Flowers, Lovebird1, and Exit [4, 7, 25, 27], are used in our experiments. The original images are shown in Fig. 12. The resolution of Balloons, Door Flowers, and Lovebird1 is 1024*768, and the resolution of Exit is 640*480. The 1st and 3rd viewpoints of Balloons, the 8th and 9th viewpoints of Door Flowers, the 6th and 7th viewpoints of Lovebird1, and the 1st and 2nd viewpoints of Exit are adopted. The PSNR values of the four test sequences are shown in Table 2, and the subjective comparison of the different methods is shown in Figs. 13 and 14. From these results, it can be observed that the proposed method improves the quality of the synthesized view both objectively and subjectively.

Fig. 12

The original images of real world test sequences. (a) Balloons (1024*768), (b) Door Flowers (1024*768), (c) Lovebird1 (1024*768), and (d) Exit (640*480)

Fig. 13

Synthesized virtual images of Balloons based on the produced depths. (a) Without preprocessing, (b) Lee's method, (c) Xu's method, (d) proposed divide-and-conquer hole-filling method, and (e) the groundtruth

Fig. 14

Synthesized virtual images of Lovebird1 based on the produced depths. (a) Without preprocessing, (b) Lee’s method, (c) Xu’s method, (d) proposed divide-and-conquer hole-filling method, and (e) the groundtruth

Table 2 Comparison of PSNR (dB) for Real World Test Sequences

To investigate the additional computational complexity of the proposed method, we performed experiments on an Intel(R) Core(TM) i5-4590 CPU @3.30GHz with 8.0GB memory and a 64-bit Windows 7 operating system. The running-time comparisons are shown in Table 3. It can be observed that the running time of the proposed algorithm is less than that of Lee's method and slightly more than that of Xu's method, which is acceptable.

Table 3 Comparison of the Running Time (in Seconds)

4 Conclusion

In this paper, we have presented a novel and effective divide-and-conquer method for handling disocclusions of the synthesized image in single-view rendering. First, a binary mask is used to mark the disocclusion region. Then, the depth pixels located in the disocclusion region are modified by a linear interpolation process. Finally, median filtering is adopted to remove isolated depth pixels. With the proposed method, the disocclusion regions in the synthesized virtual view are divided into several small holes after DIBR, which are easily filled by image inpainting. Experimental results demonstrate that the proposed method effectively improves the visual quality of the synthesized view and yields higher PSNR values for the rendered virtual images than previous methods.