Abstract
In this paper we present a novel framework for semantic recognition of street scenes that takes advantage of 3D point clouds captured by a high-definition LiDAR scanner. An important problem in object recognition is the need for sufficient labeled training data to learn robust classifiers. We show how to significantly reduce the need for manually labeled training data by reducing scene complexity through unsupervised ground and building segmentation. Our system first automatically segments the ground points, because the ground connects almost all other objects and we use a connected-component-based algorithm to over-segment the point cloud. Building facades are then detected using binary range image processing. The remaining points are grouped into voxels, which are then transformed into super voxels. Local 3D features extracted from the super voxels are classified by trained boosted decision trees and labeled with semantic classes, e.g. tree, pedestrian, car. The proposed method is evaluated both quantitatively and qualitatively on a challenging fixed-position Terrestrial Laser Scanning (TLS) Velodyne data set and on two Mobile Laser Scanning (MLS) data sets, Paris-rue-Madame and NAVTEQ True. Robust scene parsing results are reported.
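To illustrate why the ground is removed before connected-component grouping, the following is a minimal sketch and not the authors' implementation. It assumes a flat ground at a known height and uses a plain height threshold as a stand-in for the unsupervised ground segmentation described above. The function names (`voxelize`, `connected_components`, `remove_ground`), the voxel size, the tolerance, and the toy points are all hypothetical choices made for this example.

```python
from collections import defaultdict, deque
from itertools import product
import math


def voxelize(points, voxel_size):
    """Group 3D points into cubic voxels keyed by integer grid index."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key].append((x, y, z))
    return dict(voxels)


def connected_components(voxel_keys):
    """Group occupied voxels by 26-connectivity using breadth-first search."""
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    remaining = set(voxel_keys)
    components = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        component = [seed]
        while queue:
            cx, cy, cz = queue.popleft()
            for dx, dy, dz in offsets:
                neighbour = (cx + dx, cy + dy, cz + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    queue.append(neighbour)
                    component.append(neighbour)
        components.append(component)
    return components


def remove_ground(points, ground_z, tolerance):
    """Assumed stand-in for ground segmentation: drop points near ground_z."""
    return [p for p in points if p[2] > ground_z + tolerance]


# Toy scene: a strip of ground points plus two separate objects standing on it.
ground = [(0.1 + 0.5 * i, 0.1, 0.01) for i in range(7)]
object_a = [(0.1, 0.1, 0.6), (0.2, 0.1, 1.0)]
object_b = [(3.1, 0.1, 0.7), (3.1, 0.2, 1.2)]
points = ground + object_a + object_b

with_ground = connected_components(voxelize(points, 0.5))
without_ground = connected_components(
    voxelize(remove_ground(points, ground_z=0.0, tolerance=0.2), 0.5)
)
print(len(with_ground), len(without_ground))
```

In this toy scene the ground strip links both objects into a single component, while removing it first leaves two components, one per object. A real pipeline would need a ground model that copes with slopes and kerbs, which this threshold does not.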
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Babahajiani, P., Fan, L., Gabbouj, M. (2015). Object Recognition in 3D Point Cloud of Urban Street Scene. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science, vol 9008. Springer, Cham. https://doi.org/10.1007/978-3-319-16628-5_13
DOI: https://doi.org/10.1007/978-3-319-16628-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16627-8
Online ISBN: 978-3-319-16628-5
eBook Packages: Computer Science, Computer Science (R0)