
1 Introduction

A range image is obtained from a stereo system or a depth sensor, which produce a disparity map or a depth image, respectively. These images have attractive characteristics, such as invariance to lighting, rotation and scaling, as presented by Creusot [1], which makes it possible to work with the data without prior processing. However, stereo systems and depth sensors such as the Kinect produce images with areas that contain no information, because the infrared light does not find a surface from which it can be reflected. These areas may appear as small gaps scattered throughout the image or as large regions of null information (without intensity levels).

Several works in the literature address the restoration of range images. They can be classified into four groups according to the information they use to perform this task. The first group consists of techniques that use one or several RGB images of the same scene for restoration, as presented by Pertuz et al. [2] and Torres and Dudek [3]. Wang et al. [4] use multiple range and RGB images, and additionally apply an inpainting method to estimate depth and color information together. The second group comprises proposals that use multiple range images obtained by the same sensor. Kolmogorov and Zabih [5] and Lin et al. [6] use a pair of images to carry out the restoration, and both rely on the graph cuts method for this task. Lin et al. [7] use a sequence of depth images obtained by the Xtion Pro Live sensor, achieving both spatial and temporal noise reduction in depth images. The third group also uses several images, but these come from different sensors. The works presented by Zhu et al. [8] and Gudmundsson et al. [9] combine images obtained by a ToF sensor with those of a stereo system. The authors take advantage of both systems to obtain higher-quality range images; however, prior alignment and calibration of the sensors is required. Finally, the fourth group consists of works that use only the range image itself. Sruthy et al. [10] and Chen et al. [11] restore range images without using RGB images or multiple images. Both works detect regions without information, classify or reduce these regions in a preprocessing step, and use linear and non-linear filters to restore the image.

This proposal implements an inpainting method that restores the range image from its own information, without using other images (RGB or range). The processing is carried out with the Gaussian Pyramid method proposed by Ogden [12], which is based mainly on reduction, enlargement and interpolation processes. In addition to applying the method to range images, this work seeks to improve the results of the Gaussian Pyramid by modifying the interpolation technique it uses. Usunariz [13] uses this method with the 4-directions interpolation technique. The proposed work evaluates alternative interpolation techniques within the Gaussian Pyramid method in order to determine which one achieves the best results. The obtained results are analyzed qualitatively and quantitatively, evaluating aspects such as the percentage of estimation achieved and the processing time. The rest of the paper is structured as follows. Section 2 describes the proposed methodology and the interpolation techniques that were implemented. Section 3 presents the experimental results, and Sect. 4 draws conclusions from them.

2 Methodology

This section describes both the inpainting method for range images and the interpolation techniques to be evaluated. In addition, Sect. 2.3 details how pixel estimation is performed with an interpolation technique.

2.1 Restoration by Gaussian Pyramid Method

Fig. 1. Block diagram of the Gaussian Pyramid method used to restore a range image.

The proposed method for restoring range images is illustrated in Fig. 1. The first block comprises the initialization processes, in which the matrices that store the information used in the following stages are created (Fig. 2, Block 1). A counter is also initialized to track the number of reductions that make up the Gaussian reduction pyramid, and a variable called “flag” is declared with a value of one; it serves as the start and end indicator for the next stage.

Fig. 2. Visual diagram and pseudocode of the process of restoring a range image. Block 1 corresponds to the initialization of the process; Block 2 computes the Reduction process and Block 3 performs the Enlargement process.

The second block consists of a cycle that estimates information and shrinks the areas without information (Fig. 2, Block 2). The first process uses a function named “Fill_holes”, which estimates the values of the empty pixels that lie on the contour of the regions without information in the image being processed. For this task, four different interpolation techniques are implemented (see Sect. 2.2) in order to determine which one yields the best restoration. The second process reduces the regions without information present in the original image (Fig. 2, Reduction block). The cycle ends when no zones without information remain, which marks the last reduction. Finally, the third block consists of a cycle that enlarges the restored but reduced image and substitutes the estimated information into each image. The enlargement is performed with the Gaussian Pyramid technique. Through several substitution processes (Fig. 2, Block 3), an increasingly restored image is obtained (Fig. 2, Enlargement block). The cycle repeats until the image reaches the size of the initial image, which indicates that the restored image has been obtained. The result is the initial image with the estimated information inserted in the areas that originally contained no information.
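The reduce, fill and enlarge cycle described above can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: empty pixels are encoded as 0, the reduction is a plain 2 × 2 mean of the valid pixels (a stand-in for the Gaussian-weighted reduction), the enlargement is nearest-neighbor, and `fill_holes` uses the mean of the valid 4-neighbors as a placeholder for any of the interpolation techniques of Sect. 2.2. All function names are hypothetical.

```python
def has_holes(img):
    """True if any pixel is empty (encoded as 0 in this sketch)."""
    return any(v == 0 for row in img for v in row)

def fill_holes(img):
    """Placeholder estimator: each empty pixel on a hole contour takes the
    mean of its valid 4-neighbours. Reads only the input image."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] != 0:
                continue
            nbrs = [img[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < h and 0 <= cc < w and img[rr][cc] != 0]
            if nbrs:
                out[r][c] = sum(nbrs) / len(nbrs)
    return out

def reduce_valid(img):
    """Halve the image: mean of the valid pixels in each 2x2 block
    (simplification of the Gaussian REDUCE step)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            block = [img[rr][cc] for rr in (r, r + 1) for cc in (c, c + 1)
                     if rr < h and cc < w and img[rr][cc] != 0]
            row.append(sum(block) / len(block) if block else 0)
        out.append(row)
    return out

def expand(img, h, w):
    """Nearest-neighbour enlargement to h x w (simplification of EXPAND)."""
    return [[img[r // 2][c // 2] for c in range(w)] for r in range(h)]

def restore(img):
    # Reduction cycle: fill contours, then shrink, until no hole remains
    # (or the image cannot be reduced any further).
    levels = [fill_holes(img)]
    while has_holes(levels[-1]) and (len(levels[-1]) > 1 or len(levels[-1][0]) > 1):
        levels.append(fill_holes(reduce_valid(levels[-1])))
    # Enlargement cycle: expand and substitute only where the finer level is empty.
    cur = levels[-1]
    for finer in reversed(levels[:-1]):
        up = expand(cur, len(finer), len(finer[0]))
        cur = [[f if f != 0 else u for f, u in zip(frow, urow)]
               for frow, urow in zip(finer, up)]
    return cur
```

Pixels that already contain information are never overwritten in this sketch; only positions that were empty at a given level receive values from the coarser level.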

2.2 Interpolation Techniques

4-directions.

This interpolation technique was implemented as proposed by Usunariz [13]. Starting from a pixel without information, it is necessary to confirm that the two consecutive neighboring pixels in a given direction have information. This can happen to the right, left, up or down. Figure 3a shows a representative diagram of this technique, where the information that may be used in the interpolation process (purple area) is observed. The black box represents the pixel without information and the red point is the position at which its value is interpolated.

Fig. 3. Representative diagram of a) 4-directions, b) Bilinear and c) Bicubic technique. (Color figure online)

4-directions 2.

Here, a variation of the 4-directions technique is presented. The difference with respect to the original technique is that the values being interpolated are not considered in the subsequent interpolations. All interpolated values are therefore stored in a vector, and only when the interpolation process has finished are these new values substituted into the corresponding pixels. Figure 4 shows a comparison of the two techniques.

Fig. 4. Representative diagram of the estimation of pixels without information by 4-directions and 4-directions 2 interpolation techniques. The estimated value of the pixels P6, P7 and P8 changes according to the technique implemented.
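The difference between the two variants, immediate versus deferred substitution, can be illustrated with a one-dimensional sketch. The assumptions here are ours and not the paper's: only the left and right directions are considered, empty pixels are encoded as 0, and the estimate is a linear extrapolation from the two consecutive valid neighbors (averaged when both directions are available). The function names are hypothetical.

```python
def estimate(row, i):
    """Estimate row[i] from two consecutive valid neighbours on either side.
    Returns None when no direction has two valid neighbours."""
    candidates = []
    for step in (-1, 1):
        a, b = i + step, i + 2 * step
        if 0 <= a < len(row) and 0 <= b < len(row) and row[a] != 0 and row[b] != 0:
            candidates.append(2 * row[a] - row[b])  # assumed linear extrapolation
    return sum(candidates) / len(candidates) if candidates else None

def four_directions(row):
    """Original behaviour: each estimate is written immediately, so later
    pixels may be interpolated from earlier estimates."""
    out = row[:]
    for i in range(len(out)):
        if out[i] == 0:
            v = estimate(out, i)
            if v is not None:
                out[i] = v
    return out

def four_directions_2(row):
    """Variant: estimates are buffered and substituted only after the pass,
    so every estimate depends on original information alone."""
    pending = []
    for i in range(len(row)):
        if row[i] == 0:
            v = estimate(row, i)
            if v is not None:
                pending.append((i, v))
    out = row[:]
    for i, v in pending:
        out[i] = v
    return out

row = [10, 20, 0, 0, 0, 40, 50]
print(four_directions(row))    # [10, 20, 30.0, 40.0, 40.0, 40, 50]
print(four_directions_2(row))  # [10, 20, 30.0, 0, 30.0, 40, 50]
```

In the immediate version the estimate at index 2 feeds the estimates at indices 3 and 4, which is how earlier estimates can propagate into a region. In the buffered version only the pixels on the border of the gap are estimated in one pass, and the interior is left for the next pyramid level.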

Bilinear.

In digital image processing, Bilinear interpolation makes use of the four pixels closest to the position to be interpolated [14]. A representative diagram of Bilinear interpolation for the estimation of pixels without information is shown in Fig. 3b. The purple area encloses the pixels of the 2 × 2 neighborhood. The black box represents the pixel without information, which can take any of the four positions within the red area. The red point is the position at which the pixel value is interpolated. The estimated value is calculated with (1), which weights the influence of the neighborhood pixels in both directions. The parameters \( a \) and \( b \) are the distances in the vertical and horizontal direction, respectively, from the pixel \( P\left( {i,j} \right) \) to the interpolated point. The function \( H\left( x \right) \) is the kernel function of the Bilinear interpolation, given by (2).

$$ \begin{aligned} P(i,j) = {} & H(-a) * H(b) * P(i,j) + H(-a) * H(-(1-b)) * P(i,j+1) \\ & + H(1-a) * H(b) * P(i+1,j) + H(-(1-a)) * H(-(1-b)) * P(i+1,j+1) \end{aligned} $$
(1)
$$ H(x) = \begin{cases} 1 - |x|, & x \in [-1,1] \\ 0, & \text{otherwise} \end{cases} $$
(2)
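As a quick check of (1) and (2), the following sketch evaluates the Bilinear estimate term by term, keeping the kernel arguments as written in (1). Because \( H \) is even, the weights reduce to \( (1-a)(1-b) \), \( (1-a)b \), \( a(1-b) \) and \( ab \). The function names and the list-of-lists image representation are illustrative assumptions.

```python
def H(x):
    """Bilinear kernel of Eq. (2)."""
    return 1 - abs(x) if -1 <= x <= 1 else 0.0

def bilinear(P, i, j, a, b):
    """Eq. (1): a is the vertical offset and b the horizontal offset
    from pixel P[i][j], both assumed to lie in [0, 1]."""
    return (H(-a) * H(b) * P[i][j]
            + H(-a) * H(-(1 - b)) * P[i][j + 1]
            + H(1 - a) * H(b) * P[i + 1][j]
            + H(-(1 - a)) * H(-(1 - b)) * P[i + 1][j + 1])

P = [[10, 20],
     [30, 40]]
print(bilinear(P, 0, 0, 0.5, 0.5))  # 25.0, the mean of the four pixels
print(bilinear(P, 0, 0, 0, 0))      # 10.0, the value at P[0][0]
```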

Bicubic.

This technique considers the pixels within a 4 × 4 neighborhood to interpolate the desired value [15]. A representative diagram of this interpolation technique for the estimation of pixels without information is shown in Fig. 3c. The purple area encloses the \( 16 \) pixels within the 4 × 4 neighborhood; the black box represents the empty pixel, which can take any of the four positions within the red area. The red point is the position at which the value of the empty pixel is interpolated. The variables \( m \) and \( n \) index the rows and columns, respectively, of the neighborhood pixels in the 4 × 4 region around the point to be interpolated. Equation (3) gives the estimated value as a double sum, over the neighborhood, of each pixel multiplied by the kernel function \( H_{c} \left( x \right) \), calculated with (4), in both directions.

$$ P(i,j) = \sum_{m=-1}^{2} \sum_{n=-1}^{2} P(i+m,\, j+n) * H_{c}(m-a) * H_{c}(n-b) $$
(3)
$$ H_{c}(x) = \begin{cases} (\alpha + 2) * |x|^{3} - (\alpha + 3) * |x|^{2} + 1, & |x| \in [0,1) \\ \alpha * |x|^{3} - 5\alpha * |x|^{2} + 8\alpha * |x| - 4\alpha, & |x| \in [1,2) \\ 0, & |x| \ge 2 \end{cases} $$
(4)

When a pixel without information is estimated with this technique, several pixels within the neighborhood will have a value of zero, so their contribution to the double sum is null. Processing time can therefore be reduced by avoiding these redundant calculations: if the pixels in the neighborhood that actually contain information are known, only they need to contribute their weight to the estimate. To this end, the position of the pixel \( P\left( {i,j} \right) \) must be known, since it acts as a pivot in (3) and determines, according to the location of the pixel without information, the values that \( m \) and \( n \) must take.
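The double sum and the skipping of empty pixels can be sketched as below. Two points are assumptions on our part: the neighborhood pixel is indexed by both \( m \) and \( n \), and \( \alpha = -0.5 \) is used as a common default because the text does not state its value. Empty pixels are encoded as 0 and the function names are hypothetical.

```python
def Hc(x, alpha=-0.5):
    """Cubic convolution kernel of Eq. (4); alpha = -0.5 is an assumed default."""
    x = abs(x)
    if x < 1:
        return (alpha + 2) * x**3 - (alpha + 3) * x**2 + 1
    if x < 2:
        return alpha * x**3 - 5 * alpha * x**2 + 8 * alpha * x - 4 * alpha
    return 0.0

def bicubic(P, i, j, a, b, alpha=-0.5):
    """Double sum of Eq. (3) over the 4x4 neighbourhood pivoted at P[i][j].
    Pixels equal to 0 are skipped, since they would add nothing to the sum."""
    total = 0.0
    for m in range(-1, 3):
        for n in range(-1, 3):
            value = P[i + m][j + n]
            if value == 0:
                continue  # empty pixel: no contribution, avoid the kernel evaluations
            total += value * Hc(m - a, alpha) * Hc(n - b, alpha)
    return total
```

For a fully valid neighborhood of constant intensity the weights sum to one, so the sketch returns that intensity. When some neighbors are empty, the unnormalized sum follows the description in the text; whether the original implementation renormalizes the remaining weights is not stated.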

2.3 Fill_holes Function

All implemented versions of the “Fill_holes” function follow the same execution sequence. Figure 5 presents a general diagram of the steps performed by the “Fill_holes” function. The diagram begins with a matrix that represents an image containing a region without information (the blue pixels make up the region without information, while the orange pixels contain information). The first step is an inverse binarization, which identifies the regions without information. The second step empties all the regions present in the binary image; that is, all pixels that do not belong to the contour of a region take a value of zero. In the third step, the interpolation process is performed to estimate the value of the pixels on the contour of the region without information. Here the different interpolation techniques described above are applied, each using information from the neighborhood surrounding the pixel to be interpolated.

Fig. 5. Representative diagram of the process carried out by the “Fill_holes” function.
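The first two steps of “Fill_holes” (inverse binarization and emptying of the regions) can be sketched as follows. The sketch assumes that empty pixels are encoded as 0 and that a hole pixel belongs to the contour when at least one of its 4-neighbors inside the image contains information; the connectivity used in the original implementation is not stated. The function names are hypothetical.

```python
def hole_mask(img):
    """Step 1, inverse binarization: 1 where the pixel has no information."""
    return [[1 if v == 0 else 0 for v in row] for row in img]

def contour_mask(mask):
    """Step 2, emptying: keep only hole pixels that touch a valid pixel
    (assumed 4-connectivity); interior hole pixels are set to 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1:
                continue
            touches_valid = any(
                0 <= rr < h and 0 <= cc < w and mask[rr][cc] == 0
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
            out[r][c] = 1 if touches_valid else 0
    return out

# 5x5 image with a 3x3 hole in the centre
img = [[9] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 0
contour = contour_mask(hole_mask(img))
print(sum(map(sum, contour)))  # 8: the ring of the hole, without its centre
```

The third step then applies one of the interpolation techniques only at the positions marked in the contour mask.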

2.4 Analysis of the Amount of Missing Information in Range Images

Additionally, an analysis was carried out to determine whether the amount of missing information in the range images influences the estimated information rate. For this, all the images of the databases were used, a total of \( 159 \). A frequency distribution analysis was performed to obtain the average information estimation at different intervals of missing information. Eight classes were obtained, with a class width of \( 9 \). However, classes \( 5 \) to \( 8 \) were grouped into one (new class \( 5 \)), because they had few elements in comparison with the other classes, which made their individual results insignificant. The five classes, identified by C1, C2, C3, C4 and C5, contain \( 51 \), \( 43 \), \( 37 \), \( 14 \) and \( 14 \) images, respectively.
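The class assignment can be sketched as below. The class width of 9 and the merging of classes 5 to 8 come from the text; the lower bound of the first class is not stated, so the sketch assumes that the classes start at 0% of missing information. The function names are hypothetical.

```python
def missing_info_class(missing_pct, width=9.0, last_class=5):
    """Map a missing-information percentage to a class index 1..5 (C1..C5).
    Assumes C1 covers [0, 9), C2 covers [9, 18), and so on."""
    index = int(missing_pct // width) + 1
    return min(index, last_class)  # original classes 5 to 8 collapse into C5

def mean_rate_by_class(samples):
    """samples: iterable of (missing_pct, estimated_rate) pairs.
    Returns {class index: mean estimated rate} for the classes present."""
    sums, counts = {}, {}
    for missing_pct, rate in samples:
        c = missing_info_class(missing_pct)
        sums[c] = sums.get(c, 0.0) + rate
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}
```

Under this assumption, the reported extremes of 0.5% and 68.2% of missing information fall in C1 and C5, respectively.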

3 Results

To validate the proposed method and assess its performance with the different interpolation techniques, it was tested on five range image databases: B1 [16], with \( 43 \) images of \( 581\,{ \times }\,421 \) pixels; B2, generated for this research work, with \( 26 \) images of \( 627\,{ \times }\,464 \) pixels; B3, comprising \( 26 \) images of \( 1282\,{ \times }\,1110 \) pixels from the generic Middlebury database [17, 18]; B4 [19, 20], with \( 24 \) images of \( 581\,{ \times }\,421 \) pixels; and B5 [21], with \( 40 \) images of \( 581\,{ \times }\,421 \) pixels. Figure 6 shows the qualitative results of applying the proposed method with the different interpolation techniques to restore range images. Two images from each database are presented. The first column shows the RGB image, which serves only as a reference for the scene. Column 2 shows the range images to be restored. Columns 3 to 6 show the images restored with the 4-directions, 4-directions 2, Bilinear and Bicubic techniques, respectively. According to the qualitative results, the proposed method already restores the image when 4-directions interpolation is used (Fig. 6, column 3). However, this technique can add non-existent artifacts (Fig. 6, row 5, column 3) because of the way the interpolation is performed. The second technique, 4-directions 2, does not present this problem (Fig. 6, row 5, column 4), because it does not consider the interpolated pixels in the subsequent interpolations. It also qualitatively improves several estimated regions with respect to 4-directions (Fig. 6, row 2, column 4). The Bilinear and Bicubic techniques offer the best qualitative results, because they increase the number of estimated regions and yield greater homogeneity in the intensity values (Fig. 6, rows 1 and 3–4, columns 5–6).

Fig. 6. Qualitative results of the restoration applying our proposed method with different interpolation techniques using several generic and own databases. Range images are presented in the HSV color space, where regions in red represent pixels without information.

To compare the techniques and determine which is the most appropriate for our estimation method, three metrics were calculated: the mean square error (MSE), the estimated information rate and the processing time. Figure 7 presents the results for the processing time (Fig. 7a) and the estimated information rate (Fig. 7b), applying the four interpolation techniques to each database. From the individual averages of each technique in the different databases, a global average was calculated in order to compare the techniques with each other and to know the overall result of applying them to range images. Regarding the percentage of estimated information, the global results are \( 79.33\% \) for 4-directions, \( 79.9\% \) for 4-directions 2, and \( 83.76\% \) and \( 83.57\% \) for Bilinear and Bicubic, respectively. The original technique, 4-directions, therefore presents the lowest estimated information rate. The 4-directions 2, Bilinear and Bicubic techniques present relative increases of \( 0.71\% \), \( 5.58\% \) and \( 5.34\% \), respectively, with respect to 4-directions. Regarding the processing time, the global averages are \( 0.84 \,\text{s} \) for 4-directions, \( 1.04 \,\text{s} \) for 4-directions 2, and \( 30.60\, \text{s} \) and \( 12.23 \,\text{s} \) for Bilinear and Bicubic, respectively. Therefore, the original technique presents the best processing time, followed by 4-directions 2, then Bicubic and finally Bilinear. For the MSE, all techniques have a value greater than \( 97\% \), which indicates that the original information is maintained in the depth image regardless of the interpolation technique implemented.
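The increases with respect to 4-directions are consistent with a relative (not absolute) difference of the global rates, which the following short sketch reproduces from the rounded figures quoted above. The function name is hypothetical, and small last-digit differences from the reported values can be expected because the inputs here are already rounded.

```python
def relative_increase(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0

baseline = 79.33  # global estimated information rate of 4-directions
for name, rate in (("4-directions 2", 79.9), ("Bilinear", 83.76), ("Bicubic", 83.57)):
    print(f"{name}: {relative_increase(rate, baseline):.2f}%")
```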

Fig. 7. Quantitative results in the 5 databases, applying the different interpolation techniques. a) Results of the processing time. b) Results of the estimated information rate.

Fig. 8. Estimated information rate by class, applying the different interpolation techniques. a) 4-directions, b) 4-directions 2, c) Bilinear and d) Bicubic.

The results of the estimated information rate with respect to the percentage of missing information in the range images are presented below. The \( 159 \) images have a maximum missing information of \( 68.2\% \) and a minimum of \( 0.5\% \). The analysis was carried out for all classes with each of the interpolation techniques; the results are shown in Fig. 8. The 4-directions, Bilinear and Bicubic techniques reach their maximum percentage of estimated pixels for C5, the images with the greatest lack of information. C3 presents the minimum amount of estimated information regardless of the interpolation technique implemented. The 4-directions 2 technique obtains its highest estimation rate for images belonging to class C1. Therefore, the present proposal offers repeatability and consistency across the different interpolation techniques tested here.

4 Conclusions

The implementation of the Gaussian Pyramid method allows range images to be restored from the image itself, without using multiple range images or an RGB image. The interpolation technique used in this method does affect the result of the restoration. The original interpolation technique, 4-directions, can introduce non-existent artifacts in the restored image; this problem does not occur with the other interpolation techniques. With respect to the amount of information missing in the range images, the 4-directions, Bilinear and Bicubic techniques reach their maximum of estimated pixels for class C5, whereas the 4-directions 2 technique reaches its maximum for C1. Finally, it is concluded that, if processing time is not an important consideration, Bilinear interpolation is the recommended technique. If processing time is important, the 4-directions 2 technique is recommended: although it takes longer than the original technique, it increases the amount of estimated pixels and does not show the problem of unwanted artifacts.

As future work, the restored range images will be used in object detection tests, and the resulting processing time and detection accuracy will be compared with those obtained when detecting objects in unprocessed range images.