Discriminatively Trained Dense Surface Normal Estimation
Conference paper
Abstract
In this work we propose a method for a rather unexplored problem in computer vision: discriminatively trained dense surface normal estimation from a single image. Our method combines contextual and segment-based cues and builds a regressor in a boosting framework by transforming the problem into the regression of coefficients of a local coding. We apply our method to two challenging data sets containing images of man-made environments, the indoor NYU2 data set and the outdoor KITTI data set. Our surface normal predictor achieves results better than initially expected, significantly outperforming the state of the art.
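The idea of regressing coefficients of a local coding, rather than the normal itself, can be illustrated with a small sketch. The abstract does not specify the codebook or the coding scheme, so everything below is an assumption made for illustration: a fixed 14-direction codebook, a least-squares coding over the 3 most similar codewords, and the helper names `make_codebook`, `encode`, and `decode`. In such a setup a regressor would be trained to predict the coefficient vector, and the normal would be recovered by decoding; the boosting step itself is not shown.

```python
import numpy as np

def make_codebook():
    """Hypothetical codebook: 6 axis directions plus 8 cube-corner directions."""
    axes = np.vstack([np.eye(3), -np.eye(3)])
    corners = np.array(
        [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
        dtype=float,
    )
    corners /= np.linalg.norm(corners, axis=1, keepdims=True)
    return np.vstack([axes, corners])  # shape (14, 3), rows are unit vectors

def encode(normal, codebook, k=3):
    """Express a unit normal as weights over its k most similar codewords."""
    nearest = np.argsort(codebook @ normal)[-k:]  # largest cosine similarity
    # Least-squares weights w such that codebook[nearest].T @ w approximates normal.
    w, *_ = np.linalg.lstsq(codebook[nearest].T, normal, rcond=None)
    coeffs = np.zeros(len(codebook))
    coeffs[nearest] = w  # all other codewords keep a zero coefficient
    return coeffs

def decode(coeffs, codebook):
    """Recover a unit normal from (possibly predicted) coding coefficients."""
    v = coeffs @ codebook
    return v / np.linalg.norm(v)

codebook = make_codebook()
n = np.array([0.2, 0.3, 0.93])
n /= np.linalg.norm(n)

coeffs = encode(n, codebook)      # regression target in this sketch
n_hat = decode(coeffs, codebook)  # normal recovered from the coefficients
```

Because the final normalization in `decode` projects any coefficient vector back onto the unit sphere, a regressor's imperfect coefficient predictions would still yield a valid unit normal.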
Keywords
Computer Vision, Ground Truth, Random Forest, Visual Word, Feature Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© Springer International Publishing Switzerland 2014