Joint Spatio-temporal Depth Features Fusion Framework for 3D Structure Estimation in Urban Environment

Nawaf, Mohamad Motasem; Trémeau, Alain

doi:10.1007/978-3-642-33885-4_53

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7585)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present a novel approach to improve 3D structure estimation from an image stream in urban scenes. We consider a particular setup in which the camera is mounted on a moving vehicle. Applying traditional structure from motion (SfM) techniques in this case yields a poor estimate of the 3D structure for several reasons, such as texture-less images, small baseline variations, and dominant forward camera motion. Our idea is to introduce the monocular depth cues that exist in a single image and to add temporal constraints on the estimated 3D structure. We assume that the scene is made up of small planar patches, obtained using an over-segmentation method, and our goal is to estimate the 3D position of each of these planes. We propose a fusion framework that employs a Markov Random Field (MRF) model to integrate both spatial and temporal depth information. An advantage of our model is that it performs well even in the absence of some depth information. Spatial depth information is obtained through a global and local feature extraction method inspired by Saxena et al. [1]. Temporal depth information is obtained via a sparse optical-flow-based structure from motion approach, which reduces the estimation ambiguity by imposing constraints on camera motion. Finally, we apply a fusion scheme to produce a single 3D structure estimate.
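
The abstract does not specify the MRF potentials or the inference procedure, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes one scalar depth per planar patch, quadratic (Gaussian) data terms for the spatial and temporal cues, and a quadratic smoothness term between neighbouring patches. The function name `fuse_patch_depths`, the weights, and the coordinate-descent solver are all assumptions made for this example. A patch with no observation (`None`) is filled in from its neighbours, which mirrors the claim that fusion can still work when some depth information is missing.

```python
from typing import List, Optional, Tuple


def fuse_patch_depths(
    spatial: List[Optional[float]],
    temporal: List[Optional[float]],
    edges: List[Tuple[int, int]],
    w_spatial: float = 1.0,
    w_temporal: float = 4.0,
    smoothness: float = 0.5,
    iterations: int = 200,
) -> List[float]:
    """Minimise a quadratic MRF energy over per-patch depths.

    Energy (hypothetical, chosen for illustration):
      sum_i w_spatial  * (d_i - spatial_i)^2   if spatial_i is observed
    + sum_i w_temporal * (d_i - temporal_i)^2  if temporal_i is observed
    + sum_(i,j) smoothness * (d_i - d_j)^2     over neighbouring patches
    """
    n = len(spatial)
    if len(temporal) != n:
        raise ValueError("spatial and temporal must have the same length")

    # Build an adjacency list from the undirected patch-neighbour edges.
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Start every patch at the mean of all available observations.
    observed = [v for v in spatial + temporal if v is not None]
    if not observed:
        raise ValueError("at least one depth observation is required")
    depth = [sum(observed) / len(observed)] * n

    # Coordinate descent: each update is the exact minimiser of the energy
    # with respect to d_i while the other depths are held fixed.
    for _ in range(iterations):
        for i in range(n):
            num = smoothness * sum(depth[j] for j in neighbours[i])
            den = smoothness * len(neighbours[i])
            if spatial[i] is not None:
                num += w_spatial * spatial[i]
                den += w_spatial
            if temporal[i] is not None:
                num += w_temporal * temporal[i]
                den += w_temporal
            if den > 0:
                depth[i] = num / den
    return depth


# Three patches in a chain; the middle patch has no depth cue at all.
fused = fuse_patch_depths(
    spatial=[10.0, None, 20.0],
    temporal=[10.0, None, 20.0],
    edges=[(0, 1), (1, 2)],
)
```

In this toy setup the unobserved middle patch receives a depth between those of its two neighbours, and the smoothness weight controls how strongly the observed patches are pulled toward each other.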

References

1. Saxena, A., Sun, M., Ng, A.: Learning 3-d scene structure from a single still image. In: IEEE 11th International Conference on Computer Vision, ICCV 2007, pp. 1–8. IEEE (2007)
2. Aanæs, H.: Methods for structure from motion. IMM, Informatik og Matematisk Modellering, Danmarks Tekniske Universitet (2003)
3. Vedaldi, A., Guidi, G., Soatto, S.: Moving forward in structure from motion. In: IEEE Conference on CVPR 2007, pp. 1–7. IEEE (2007)
4. Saxena, A., Chung, S., Ng, A.: 3-d depth reconstruction from a single still image. International Journal of Computer Vision 76, 53–69 (2008)
5. Liu, B., Gould, S., Koller, D.: Single image depth estimation from predicted semantic labels. In: IEEE Conference on CVPR, pp. 1253–1260. IEEE (2010)
6. Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. International Journal of Computer Vision 59, 167–181 (2004)
7. Humayun, A., Mac Aodha, O., Brostow, G.: Learning to find occlusion regions. In: IEEE Conference on CVPR 2011, pp. 2161–2168. IEEE (2011)
8. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91–110 (2004)
9. Hartley, R., Zisserman, A.: Multiple view geometry in computer vision, vol. 2. Cambridge Univ. Press (2003)
10. Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle Adjustment – A Modern Synthesis. In: Triggs, B., Zisserman, A., Szeliski, R. (eds.) ICCV-WS 1999. LNCS, vol. 1883, pp. 298–372. Springer, Heidelberg (2000)
11. Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 381–395 (1981)
12. Lindeberg, T., Garding, J.: Shape from texture from a multi-scale perspective. In: Fourth International Conference on Computer Vision, pp. 683–691. IEEE (1993)
13. Torralba, A., Oliva, A.: Depth estimation from image structure. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 1226–1238 (2002)
14. Hoiem, D., Efros, A., Hebert, M.: Automatic photo pop-up. ACM Transactions on Graphics 24, 577–584 (2005)
15. Hoiem, D., Efros, A., Hebert, M.: Recovering surface layout from an image. International Journal of Computer Vision 75, 151–172 (2007)
16. Sturgess, P., Alahari, K., Ladicky, L., Torr, P.: Combining appearance and structure from motion features for road scene understanding (2009)
17. Bao, S., Savarese, S.: Semantic structure from motion. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2025–2032. IEEE (2011)
18. Saxena, A.: Monocular depth perception and robotic grasping of novel objects. Stanford University (2009)
19. Saxena, A.: State-of-the-art results of the depth prediction from single image. Website (2012), http://make3d.cs.cornell.edu/results_stateoftheart.html
20. Civera, J., Davison, A., Montiel, J.: Structure from Motion Using the Extended Kalman Filter, vol. 75. Springer (2011)

Author information

Authors and Affiliations

Laboratoire Hubert Curien UMR CNRS 5516, Université Jean Monnet, Saint-Etienne, France
Mohamad Motasem Nawaf & Alain Trémeau

Editor information

Editors and Affiliations

Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica (DIEGM), Università degli Studi di Udine, Via delle Scienze, 208, 33100, Udine, Italy
Andrea Fusiello
IIT Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy
Vittorio Murino
Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Modena e Reggio Emilia, Strada Vignolese, 905, 41125, Modena, Italy
Rita Cucchiara

About this paper

Cite this paper

Nawaf, M.M., Trémeau, A. (2012). Joint Spatio-temporal Depth Features Fusion Framework for 3D Structure Estimation in Urban Environment. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33885-4_53

DOI: https://doi.org/10.1007/978-3-642-33885-4_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33884-7
Online ISBN: 978-3-642-33885-4
eBook Packages: Computer Science, Computer Science (R0)
