Object Proposals Estimation in Depth Image Using Compact 3D Shape Manifolds

  • Shuai Zheng
  • Victor Adrian Prisacariu
  • Melinos Averkiou
  • Ming-Ming Cheng
  • Niloy J. Mitra
  • Jamie Shotton
  • Philip H. S. Torr
  • Carsten Rother
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9358)


Man-made objects, such as chairs, often exhibit very large shape variations, which makes them challenging to detect. In this work we investigate the task of finding particular object shapes in a single depth image. We tackle this task by exploiting the inherently low dimensionality of object shape variations, which we discover and encode as a compact shape space. Starting from any collection of 3D models, we first train a low-dimensional Gaussian Process Latent Variable Shape Space. We then sample this space, producing an effectively unlimited number of shape variations, which are used for training. Additionally, to support fast and accurate inference, we improve the standard 3D object category proposal generation pipeline by adding a shallow convolutional neural network-based filtering stage. This combination considerably improves proposal generation in both speed and accuracy. We compare our full system to previous state-of-the-art approaches on four different shape classes and show a clear improvement.
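
As a rough illustration of sampling a compact shape space, the sketch below maps points from a low-dimensional latent space to shape descriptors using a Gaussian process posterior mean. It is not the authors' implementation. The latent points are fixed by PCA (a common GP-LVM initialisation) rather than optimised as in a full GP-LVM, the shape descriptors are random placeholders, and the function names `rbf_kernel`, `pca_latents`, and `sample_shapes`, together with every hyperparameter value, are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def pca_latents(Y, latent_dim):
    """PCA latent coordinates (assumed stand-in for GP-LVM optimisation)."""
    Yc = Y - Y.mean(axis=0)
    U, S, _ = np.linalg.svd(Yc, full_matrices=False)
    X = U[:, :latent_dim] * S[:latent_dim]
    return X / X.std(axis=0)  # rescale each latent axis to unit variance

def sample_shapes(Y, X, n_samples, rng, noise=1e-3, jitter=0.1):
    """Draw latent points near the training latents and decode them
    to shape descriptors with the GP posterior mean."""
    mean = Y.mean(axis=0)
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, Y - mean)
    # Pick random training latents and perturb them slightly.
    idx = rng.integers(0, len(X), size=n_samples)
    X_new = X[idx] + jitter * rng.standard_normal((n_samples, X.shape[1]))
    Y_new = rbf_kernel(X_new, X) @ alpha + mean
    return X_new, Y_new

rng = np.random.default_rng(0)
# Hypothetical data: 20 models, each described by a 50-D shape descriptor.
Y = rng.standard_normal((20, 50))
X = pca_latents(Y, latent_dim=2)
X_new, Y_new = sample_shapes(Y, X, n_samples=100, rng=rng)
print(X_new.shape, Y_new.shape)
```

Because new latent points can be drawn indefinitely, a model of this kind can supply as many synthetic shape variations as training requires. In practice the descriptors would come from real 3D models rather than random numbers.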



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Shuai Zheng (1)
  • Victor Adrian Prisacariu (1)
  • Melinos Averkiou (2)
  • Ming-Ming Cheng (1, 5)
  • Niloy J. Mitra (2)
  • Jamie Shotton (3)
  • Philip H. S. Torr (1)
  • Carsten Rother (4)

  1. University of Oxford, Oxford, UK
  2. University College London, London, UK
  3. Microsoft Research, Cambridge, UK
  4. TU Dresden, Dresden, Germany
  5. Nankai University, Tianjin, China
