Abstract
Depth information is an important ingredient in image-based rendering (IBR) systems. Traditionally, depth has been acquired using computer vision techniques or dedicated depth-sensing devices. With advances in electronics, low-cost and high-speed depth acquisition devices, such as the recently launched Microsoft Kinect, are becoming increasingly popular. A comprehensive review of these emerging devices, the problems they raise, and their solutions is thus highly desirable. This paper aims to 1) review and summarize the various approaches to depth acquisition and highlight their advantages and disadvantages, 2) review problems arising from the calibration and imperfections of these devices and the state-of-the-art solutions, and 3) propose a surface-normal-based joint-bilateral filtering method for fast, spatial-only restoration of missing depth data, together with a confidence-based IBR algorithm for reducing artifacts under depth uncertainties. For the latter, we propose a confidence measure based on color-depth, spatial, and restoration information. A joint color-depth Bayesian matting approach is proposed for refining the depth discontinuities and the alpha matte for rendering. Improved rendering results are obtained compared with rendering using conventionally restored depth maps. Possible future work and research directions are also briefly outlined.
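To illustrate the kind of spatial-only depth restoration discussed above, the sketch below fills missing depth samples with a plain joint (cross) bilateral filter guided by the registered color image: each hole pixel receives a weighted average of nearby valid depths, where the weights combine spatial proximity and color similarity. This is a simplified illustration only, not the paper's surface-normal-based variant; the function name, window radius, and kernel parameters are our own assumptions.

```python
import numpy as np

def joint_bilateral_fill(depth, color, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Fill missing depth samples (marked as 0) by a joint bilateral filter
    guided by a registered color image of shape (H, W, 3) in [0, 1].

    Illustrative sketch only -- a basic cross-bilateral fill, not the
    surface-normal-based method proposed in the paper."""
    H, W = depth.shape
    filled = depth.copy()
    valid = depth > 0
    ys, xs = np.nonzero(~valid)          # coordinates of hole pixels
    for y, x in zip(ys, xs):
        # local window, clipped at the image border
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        d = depth[y0:y1, x0:x1]
        v = valid[y0:y1, x0:x1]
        if not v.any():
            continue                     # no valid depth nearby to borrow
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # spatial Gaussian weight (domain kernel)
        w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * sigma_s ** 2))
        # color-similarity weight (range kernel) from the guidance image
        diff = color[y0:y1, x0:x1] - color[y, x]
        w_r = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma_r ** 2))
        w = w_s * w_r * v                # valid pixels only
        if w.sum() > 1e-8:
            filled[y, x] = (w * d).sum() / w.sum()
    return filled
```

Because the range kernel is computed on the color image rather than the (partly missing) depth, holes that straddle an object boundary are filled predominantly from the side whose color matches, which preserves depth discontinuities better than a purely spatial interpolation.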
This work was supported in part by Hong Kong Research Grant Council (RGC) and the Innovation and Technology fund (ITF).
Wang, C., Zhu, ZY., Chan, SC. et al. Real-Time Depth Image Acquisition and Restoration for Image Based Rendering and Processing Systems. J Sign Process Syst 79, 1–18 (2015). https://doi.org/10.1007/s11265-013-0819-2