Superpixel-based color–depth restoration and dynamic environment modeling for Kinect-assisted image-based rendering systems

Depth information is an important ingredient in many multiview applications, including image-based rendering (IBR). With advances in electronics, low-cost, high-speed depth cameras such as the Microsoft Kinect are becoming increasingly popular. In this paper, we propose a superpixel-based joint color–depth restoration approach for the Kinect depth camera and study its application to view synthesis in IBR systems. First, an edge-based matching method is proposed to reduce color–depth registration errors. The Kinect depth map is then restored using probabilistic color–depth superpixels, probabilistic local polynomial regression, and joint color–depth matting. The proposed restoration algorithm not only inpaints the missing data but also corrects and refines the depth map to improve color–depth consistency. Finally, a dynamic background modeling scheme is proposed to address the disocclusion problem in view synthesis for dynamic environments. Experimental results show the effectiveness of the proposed algorithm and system.
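To make the restoration step more concrete, the following is a minimal sketch of color-guided first-order local polynomial regression for filling missing depth pixels. It is not the authors' probabilistic superpixel method: the function name `fill_depth_lpr`, the Gaussian spatial and color kernels, the square window, and all parameter values (`radius`, `sigma_s`, `sigma_c`, `ridge`) are assumptions chosen for illustration. Missing depth is assumed to be marked with NaN, and the guide image is assumed to be a grayscale array scaled to [0, 1].

```python
import numpy as np

def fill_depth_lpr(depth, gray, radius=3, sigma_s=2.0, sigma_c=0.1, ridge=1e-6):
    """Fill NaN depth pixels by fitting a weighted local plane (illustrative sketch)."""
    h, w = depth.shape
    out = depth.copy()
    valid = ~np.isnan(depth)
    for y, x in zip(*np.nonzero(~valid)):
        # Square window around the missing pixel, clipped to the image.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        m = valid[y0:y1, x0:x1]
        if m.sum() < 3:
            continue  # too few valid samples to fit a plane; leave as NaN
        dy = (yy[m] - y).astype(float)
        dx = (xx[m] - x).astype(float)
        dc = gray[y0:y1, x0:x1][m] - gray[y, x]
        # Spatial and color (range) kernels, combined in the log domain and
        # shifted so the largest weight is 1 (avoids underflow to all zeros).
        logw = -(dx**2 + dy**2) / (2 * sigma_s**2) - dc**2 / (2 * sigma_c**2)
        wgt = np.exp(logw - logw.max())
        # First-order model: depth ~ b0 + b1*dx + b2*dy, centered on (y, x).
        A = np.stack([np.ones_like(dx), dx, dy], axis=1)
        AtW = A.T * wgt
        rhs = AtW @ depth[y0:y1, x0:x1][m]
        beta = np.linalg.solve(AtW @ A + ridge * np.eye(3), rhs)
        out[y, x] = beta[0]  # the intercept is the estimate at the center
    return out
```

Because the color kernel down-weights neighbors whose intensity differs from that of the missing pixel, depth tends not to be propagated across color edges, which is the intuition behind the color–depth consistency the abstract describes. The small `ridge` term only keeps the 3×3 system solvable when the valid neighbors are nearly collinear.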


Author information

Correspondence to Chong Wang.

Additional information

This work was supported in part by K.C. Wong Magna Fund in Ningbo University; National Natural Science Foundation of China (61603202); Zhejiang Open Foundation from Information and Communication Engineering of the Most Important Subjects, China (xkxl1512, xkxl1526); the Open Project Program of the State Key Lab of CAD&CG in Zhejiang University (A1606); the Research Foundation of Education Department of Zhejiang Province, China (Y201533827); Zhejiang Provincial Natural Science Foundation, China (LQ16F030001); Ningbo Natural Science Foundation, China (2016A610070) and the General Research Fund (GRF) of Hong Kong Research Grant Council (RGC).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 29564 KB)

About this article

Cite this article

Wang, C., Chan, S., Zhu, Z. et al. Superpixel-based color–depth restoration and dynamic environment modeling for Kinect-assisted image-based rendering systems. Vis Comput 34, 67–81 (2018).


  • Image-based rendering
  • Superpixel
  • Kinect
  • Background modeling
  • Local polynomial regression