Multimedia Tools and Applications, Volume 77, Issue 17, pp 22475–22488

RGB-D joint modelling with scene geometric information for indoor semantic segmentation

  • Hong Liu
  • Wenshan Wu
  • Xiangdong Wang
  • Yueliang Qian


This paper addresses RGB-D semantic segmentation of indoor scenes. We first introduce a novel gravity direction detection method, based on vertical line fitting that combines 2D visual information with 3D geometric information, to improve the original HHA depth encoding. Next, to fuse the two streams of deep convolutional networks that operate on the RGB image and the depth encoding, we propose a joint modelling method that learns a weighted summing layer to combine their prediction results. Finally, to refine the pixel-wise score maps, we adopt a fully-connected CRF as a post-processing step and propose a pairwise potential function that incorporates a normal kernel to exploit geometric information. Experimental results show that the proposed approach achieves state-of-the-art performance for RGB-D semantic segmentation on a public dataset.
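As an illustration of the fusion step only, the following minimal sketch combines the score maps of the two streams by a weighted sum and takes the per-pixel argmax. It is not the authors' implementation: the abstract does not state how the summing layer is parameterised, so the per-class scalar weights, the flattened-image layout, and the names `fuse_scores` and `predict_labels` are all assumptions made for this example. In the paper the weights are learned; here they are fixed by hand.

```python
from typing import List, Sequence

# scores[c][p] is the score of class c at pixel p (image flattened to 1D).
ScoreMap = List[List[float]]


def fuse_scores(
    rgb_scores: ScoreMap,
    hha_scores: ScoreMap,
    w_rgb: Sequence[float],
    w_hha: Sequence[float],
) -> ScoreMap:
    """Weighted sum of the RGB-stream and depth-stream score maps.

    Assumption: one scalar weight per class and per stream. The source only
    says that a weighted summing layer is learned, not its exact form.
    """
    if len(rgb_scores) != len(hha_scores):
        raise ValueError("both streams must score the same number of classes")
    fused: ScoreMap = []
    for c, (rgb_row, hha_row) in enumerate(zip(rgb_scores, hha_scores)):
        if len(rgb_row) != len(hha_row):
            raise ValueError("both streams must cover the same pixels")
        fused.append(
            [w_rgb[c] * r + w_hha[c] * h for r, h in zip(rgb_row, hha_row)]
        )
    return fused


def predict_labels(fused: ScoreMap) -> List[int]:
    """Per-pixel label: index of the class with the highest fused score."""
    num_classes = len(fused)
    num_pixels = len(fused[0])
    return [
        max(range(num_classes), key=lambda c: fused[c][p])
        for p in range(num_pixels)
    ]


if __name__ == "__main__":
    # Two classes, two pixels; the streams disagree on both pixels.
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    hha = [[0.0, 2.0], [2.0, 0.0]]
    weights = [0.5, 0.5]  # hand-picked here, learned in the paper
    fused = fuse_scores(rgb, hha, weights, weights)
    print(fused)                  # [[0.5, 1.0], [1.0, 0.5]]
    print(predict_labels(fused))  # [1, 0]
```

In this toy case the depth stream is the more confident one, so the fused labels follow it even with equal weights. A learned layer would instead adjust the weights so that each class relies more on the stream that predicts it better.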


RGB-D, Gravity direction, Semantic segmentation, CRF



This work is supported in part by the Beijing Natural Science Foundation (grant 4142051).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
