
Deep Projective 3D Semantic Segmentation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10424)

Abstract

Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has so far been limited. Recent attempts based on 3D deep learning approaches (3D-CNNs) have achieved below-expected results. Such methods require voxelization of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs suffer greatly from the limited availability of annotated datasets.

In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first project the point cloud onto a set of synthetic 2D-images. These images are then used as input to a 2D-CNN, designed for semantic segmentation. Finally, the obtained prediction scores are re-projected to the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state-of-the-art, achieving a relative gain of 7.9% compared to the previous best approach.
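The projection-based pipeline described above can be sketched in a few lines: render the point cloud into a synthetic pinhole view, run a 2D segmentation network on the rendered image, and scatter the per-pixel class scores back onto the visible points. The sketch below is purely illustrative; it assumes a given intrinsic matrix `K` and a dummy `score_map` standing in for the 2D-CNN output, and omits the paper's actual rendering and multi-view fusion, which are more involved.

```python
import numpy as np

def project_points(points, K, image_size):
    """Render Nx3 points into a synthetic pinhole view.
    Returns an (H, W) map holding the index of the nearest point
    visible at each pixel, or -1 where no point projects."""
    H, W = image_size
    z = points[:, 2]
    uv = points @ K.T
    px = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)
    inside = (z > 0) & (px[:, 0] >= 0) & (px[:, 0] < W) \
                     & (px[:, 1] >= 0) & (px[:, 1] < H)
    index_map = -np.ones((H, W), dtype=int)
    for i in np.argsort(-z):        # far-to-near, so near points overwrite
        if inside[i]:
            index_map[px[i, 1], px[i, 0]] = i
    return index_map

def reproject_scores(score_map, index_map, n_points, n_classes):
    """Scatter per-pixel class scores back onto the visible points,
    averaging when a point receives scores from several pixels."""
    scores = np.zeros((n_points, n_classes))
    counts = np.zeros(n_points)
    for y, x in zip(*np.nonzero(index_map >= 0)):
        i = index_map[y, x]
        scores[i] += score_map[y, x]
        counts[i] += 1
    nz = counts > 0
    scores[nz] /= counts[nz][:, None]
    return scores
```

In the full method, several such views are rendered around the scene and the re-projected scores are fused, with the 2D-CNN consuming separate color, depth, and normal streams; points occluded in one view simply receive no score from it.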



Acknowledgements

This work has been supported by the EU’s Horizon 2020 Programme grant No 644839 (CENTAURO) and the Swedish Research Council in projects 2014-6227 (EMC2), the Swedish Foundation for Strategic Research (Smart Systems: RIT 15-0097) and the VR starting grant 2016-05543.

Author information

Corresponding author: Felix Järemo Lawin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lawin, F.J., Danelljan, M., Tosteberg, P., Bhat, G., Khan, F.S., Felsberg, M. (2017). Deep Projective 3D Semantic Segmentation. In: Felsberg, M., Heyden, A., Krüger, N. (eds) Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science, vol 10424. Springer, Cham. https://doi.org/10.1007/978-3-319-64689-3_8


  • DOI: https://doi.org/10.1007/978-3-319-64689-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64688-6

  • Online ISBN: 978-3-319-64689-3

  • eBook Packages: Computer Science (R0)
