Joint 3D Object and Layout Inference from a Single RGB-D Image
Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. To this end, we propose a high-order graphical model that jointly reasons about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints so that the inferred scene respects scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method against several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusion.
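To make the phrase "MAP inference in a factor graph using message passing" concrete, the following is a minimal sketch of max-sum message passing on a toy chain of three binary variables. It is not the model described above: the chain structure, the pairwise-only factors, the scores, and the names `map_chain` and `brute_force` are all assumptions chosen for illustration, whereas the paper's model is high-order and couples layout, objects and superpixels. A brute-force search over all assignments is included only to check the result.

```python
import itertools


def map_chain(unary, pairwise):
    """Max-sum message passing on a chain-structured factor graph (toy example).

    unary[i][s]       -- score of variable i taking state s
    pairwise[i][s][t] -- score of (x_i = s, x_{i+1} = t)
    Returns (best_score, best_assignment).
    """
    n = len(unary)
    msg = list(unary[0])   # best score of any prefix ending in each state
    backpointers = []      # argmax bookkeeping for decoding
    for i in range(1, n):
        new_msg, ptr = [], []
        for t in range(len(unary[i])):
            cands = [msg[s] + pairwise[i - 1][s][t] for s in range(len(msg))]
            best_s = max(range(len(cands)), key=cands.__getitem__)
            new_msg.append(cands[best_s] + unary[i][t])
            ptr.append(best_s)
        msg = new_msg
        backpointers.append(ptr)

    # Decode the maximizing assignment by following the back-pointers.
    last = max(range(len(msg)), key=msg.__getitem__)
    assignment = [last]
    for ptr in reversed(backpointers):
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return msg[last], assignment


def brute_force(unary, pairwise):
    """Exhaustive MAP search, exponential in chain length; for checking only."""
    best = None
    for x in itertools.product(*[range(len(u)) for u in unary]):
        score = sum(unary[i][x[i]] for i in range(len(x)))
        score += sum(pairwise[i][x[i]][x[i + 1]] for i in range(len(x) - 1))
        if best is None or score > best[0]:
            best = (score, list(x))
    return best


# Hypothetical scores: unary preferences plus a bonus of 2 for agreeing neighbours.
unary = [[0, 2], [1, 0], [0, 1]]
agree = [[2, 0], [0, 2]]
pairwise = [agree, agree]

print(map_chain(unary, pairwise))    # (7, [1, 1, 1])
print(brute_force(unary, pairwise))  # (7, [1, 1, 1])
```

On a chain, message passing visits each factor once, so its cost grows linearly with the number of variables, while exhaustive search grows exponentially. Factors that couple many variables at once, as in a high-order model, generally need specialized message computations that this sketch does not cover.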
Copyright information
Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.