Depth map compression via 3D region-based representation

Multimedia Tools and Applications

Abstract

In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometric inconsistencies in the synthesized views. This paper presents a new 3D plane representation of the scene that improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color segmentation and a depth segmentation. Using both partitions, depth maps are segmented into regions free of sharp discontinuities, without having to signal all depth edges explicitly. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows efficient encoding while preserving the 3D characteristics of the scene. The 3D planes also open up the possibility of coding multiview images with a single representation.
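
To illustrate why a planar model can describe a region compactly, the sketch below fits one plane to the depth samples of a single region. It is not the method of the paper, whose fitting procedure and plane parameterization are not given in the abstract. It assumes a pinhole camera and works in the camera frame, where a 3D plane that does not pass through the camera center has an inverse depth that is affine in the pixel coordinates: 1/z = a·u + b·v + c. The function names (`fit_plane_inverse_depth`, `plane_depth`) and the plain least-squares fit are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate region: pixels are collinear")
    solution = []
    for k in range(3):
        # Replace column k of m with the right-hand side.
        mk = [[rhs[i] if j == k else m[i][j] for j in range(3)]
              for i in range(3)]
        solution.append(det3(mk) / d)
    return tuple(solution)


def fit_plane_inverse_depth(pixels, depths):
    """Least-squares fit of 1/z = a*u + b*v + c over one region.

    Assumption (not from the paper): pinhole camera, camera-frame plane.
    pixels: list of (u, v) coordinates; depths: matching positive z values.
    Returns the three plane parameters (a, b, c).
    """
    # Accumulate the normal equations (A^T A) x = A^T y, rows of A = [u, v, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for (u, v), z in zip(pixels, depths):
        row = (float(u), float(v), 1.0)
        y = 1.0 / z
        for i in range(3):
            aty[i] += row[i] * y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, aty)


def plane_depth(params, u, v):
    """Depth predicted by the fitted plane at pixel (u, v)."""
    a, b, c = params
    return 1.0 / (a * u + b * v + c)


# Usage: a synthetic 8x6 region whose depth lies exactly on one plane.
region_pixels = [(u, v) for v in range(6) for u in range(8)]
true_params = (0.001, -0.002, 0.05)
region_depths = [plane_depth(true_params, u, v) for u, v in region_pixels]
fitted = fit_plane_inverse_depth(region_pixels, region_depths)
```

In this sketch the whole region is summarized by three numbers, and the decoder can regenerate a depth value at any pixel of the region from them. Note that the fit minimizes the error in inverse depth rather than in depth, and that a robust estimator may be preferable when a region contains outliers.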

Acknowledgments

This work has been developed in the framework of the project BIGGRAPH-TEC2013-43935-R, financed by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).

Author information

Correspondence to Marc Maceira Duch.

About this article

Cite this article

Duch, M.M., Morros, JR. & Ruiz-Hidalgo, J. Depth map compression via 3D region-based representation. Multimed Tools Appl 76, 13761–13784 (2017). https://doi.org/10.1007/s11042-016-3727-1

  • DOI: https://doi.org/10.1007/s11042-016-3727-1
