Robust Instance Recognition in Presence of Occlusion and Clutter

  • Ujwal Bonde
  • Vijay Badrinarayanan
  • Roberto Cipolla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


We present a robust, learning-based framework for instance recognition from single-view point clouds. The framework handles the main real-world challenges of instance recognition, namely clutter, similar-looking distractors and occlusion. Recent algorithms have addressed clutter [9] and occlusion [16] separately but fail when these challenges are combined; in contrast, we handle all of them within a single framework. Our framework uses a soft-label Random Forest [5] to learn discriminative shape features of an object and uses them to classify both its location and its pose. We propose a novel iterative training scheme for forests that maximizes the margin between classes, improving recognition accuracy over a conventional training procedure. In the presence of similar-looking distractors, the learnt forest outperforms template matching and DPM [7]. Using occlusion information computed from the depth data, the forest learns to emphasize shape features from the visible regions, which makes it robust to occlusion. We benchmark our system against the state-of-the-art recognition systems [9,7] on challenging scenes drawn from the largest publicly available dataset. Because this dataset lacks occlusion tests, we also introduce our Desk3D dataset and demonstrate that our algorithm outperforms the other methods in all settings.
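
The abstract states that a soft-label Random Forest [5] classifies both location and pose. One common reading of "soft label" is that each training sample carries a probability vector over classes (for example, neighbouring pose bins) rather than a single class index. The sketch below is an assumption, not the authors' code: it shows how split information gain and the forest's averaged leaf posterior can be computed under that reading. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating soft-label split scoring; not the authors' code.
# Assumption: each training sample carries a probability vector over classes
# (e.g. neighbouring pose bins) instead of a single hard class index.


def soft_histogram(labels: Sequence[Sequence[float]]) -> List[float]:
    """Sum the soft label vectors of a node's samples and normalise to a distribution."""
    totals = [0.0] * len(labels[0])
    for vec in labels:
        for c, p in enumerate(vec):
            totals[c] += p
    mass = sum(totals)
    return [t / mass for t in totals]


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def soft_information_gain(parent, left, right) -> float:
    """Information gain of splitting `parent` into `left` and `right`.

    Each argument is a list of soft label vectors; children are weighted by
    their share of the parent's samples.
    """
    gain = entropy(soft_histogram(parent))
    for child in (left, right):
        if child:
            gain -= len(child) / len(parent) * entropy(soft_histogram(child))
    return gain


def forest_posterior(leaf_distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average the class distributions stored at the leaf reached in each tree."""
    n_trees = len(leaf_distributions)
    return [sum(col) / n_trees for col in zip(*leaf_distributions)]
```

For example, splitting the samples [[0.9, 0.1], [0.8, 0.2]] from [[0.2, 0.8], [0.1, 0.9]] gives a positive gain below 1 bit, while the same split on hard one-hot labels gives exactly 1 bit: soft labels make the split score less absolute about samples that lie near a class boundary.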

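The abstract does not detail the iterative margin-maximizing training scheme. As one generic illustration of margin-driven iterative training (a swapped-in technique in the spirit of boosting-style re-weighting, not necessarily the authors' procedure), the sketch below computes each sample's classification margin from the forest posterior and up-weights low-margin samples before the next training round. `train_forest`, the exponential update rule and the `step` parameter are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Generic margin-driven re-weighting; a swapped-in illustration, not the paper's scheme.


def class_margin(probs: Sequence[float], true_class: int) -> float:
    """P(true class) minus the largest probability of any other class.

    A negative margin means the sample is currently misclassified.
    """
    others = [p for c, p in enumerate(probs) if c != true_class]
    return probs[true_class] - max(others)


def reweight_by_margin(weights, posteriors, labels, step: float = 1.0) -> List[float]:
    """Hypothetical update w_i <- w_i * exp(-step * margin_i), then renormalise.

    Samples with small or negative margins gain weight for the next round.
    """
    new = [w * math.exp(-step * class_margin(p, y))
           for w, p, y in zip(weights, posteriors, labels)]
    total = sum(new)
    return [w / total for w in new]


def iterative_training(samples, labels, train_forest: Callable, n_rounds: int = 3,
                       step: float = 1.0) -> Tuple[Callable, List[float]]:
    """Alternate between training a forest and re-weighting low-margin samples.

    `train_forest(samples, labels, weights)` is a hypothetical callable that
    returns a predictor mapping one sample to its class posterior.
    """
    if n_rounds < 1:
        raise ValueError("n_rounds must be at least 1")
    weights = [1.0 / len(samples)] * len(samples)
    forest = None
    for _ in range(n_rounds):
        forest = train_forest(samples, labels, weights)
        posteriors = [forest(x) for x in samples]
        weights = reweight_by_margin(weights, posteriors, labels, step)
    return forest, weights
```

Any forest trainer that accepts per-sample weights could be plugged in as `train_forest`; the loop itself is independent of how the trees are grown.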

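The abstract says the forest uses occlusion information computed from the depth data to emphasize features from visible regions, but it does not spell out how occlusion is derived. A minimal sketch, assuming an occlusion test of the form "the measured depth lies more than a tolerance `tau` in front of where the hypothesised object surface should be": it builds a per-pixel visibility mask and gates shape features by it. In the paper the forest learns this emphasis; the hard 0/1 gating here is a simplification, and `visibility_mask`, `visible_feature_weights`, `visible_fraction` and the default `tau = 0.02` m are hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative visibility gating from depth; criterion and names are assumptions.


def visibility_mask(observed: Sequence[Sequence[float]],
                    expected: Sequence[Sequence[float]],
                    tau: float = 0.02) -> List[List[bool]]:
    """Per-pixel visibility of a hypothesised object.

    observed: measured depth (metres), 0 where the sensor returned no reading.
    expected: depth at which the object surface should appear under the hypothesis.
    A pixel counts as occluded when the scene is more than `tau` closer than the
    expected surface, and as not visible when the depth reading is missing.
    """
    return [
        [obs > 0 and obs >= exp_d - tau for obs, exp_d in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


def visible_feature_weights(mask: Sequence[Sequence[bool]],
                            feature_pixels: Sequence[Tuple[int, int]]) -> List[float]:
    """Weight 1.0 for shape features sampled at visible pixels, 0.0 otherwise."""
    return [1.0 if mask[r][c] else 0.0 for r, c in feature_pixels]


def visible_fraction(mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of the hypothesis window that is visible."""
    cells = [v for row in mask for v in row]
    return sum(cells) / len(cells)
```

A learnt model could replace the hard gate with soft weights; the mask is still a useful input because it tells the classifier which parts of the shape evidence are trustworthy.
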




References

  1.
  2. Aldoma, A., Tombari, F., Di Stefano, L., Vincze, M.: A Global Hypotheses Verification Method for 3D Object Recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 511–524. Springer, Heidelberg (2012)
  3. Bonde, U., Badrinarayanan, V., Cipolla, R.: Multi Scale Shape Index for 3D Object Recognition. In: SSVM (2013)
  4. Browatzki, B., Fischer, J., Graf, B., Bülthoff, H.H., Wallraven, C.: Going into depth: Evaluating 2D and 3D cues for object classification on a new, large-scale object dataset. In: ICCV Workshops on Consumer Depth Cameras (2011)
  5. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer (2013)
  6. Drost, B., Ulrich, M., Navab, N., Ilic, S.: Model globally, match locally: Efficient and robust 3D object recognition. In: CVPR (2010)
  7. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI 32, 1627–1645 (2010)
  8.
  9. Hinterstoisser, S., Holzer, S., Cagniart, C., Ilic, S., Konolige, K., Navab, N., Lepetit, V.: Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: ICCV (2011)
  10. Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., Navab, N.: Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes. In: ACCV (2013)
  11. Hsiao, E., Hebert, M.: Occlusion reasoning for object detection under arbitrary viewpoint. In: CVPR (2012)
  12. Huynh, D.Q.: Metrics for 3D Rotations: Comparison and Analysis. JMIV 35, 155–164 (2009)
  13. Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., Darrell, T.: A category-level 3D object dataset: Putting the Kinect to work. In: Consumer Depth Cameras for Computer Vision (2013)
  14. Knopp, J., Prasad, M., Willems, G., Timofte, R., Van Gool, L.: Hough Transform and 3D SURF for Robust Three Dimensional Classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 589–602. Springer, Heidelberg (2010)
  15. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)
  16. Meger, D., Wojek, C., Little, J.J., Schiele, B.: Explicit Occlusion Reasoning for 3D Object Detection. In: BMVC (2011)
  17. Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.W.: KinectFusion: Real-time dense surface mapping and tracking. In: ISMAR (2011)
  18. Pauly, M., Keiser, R., Gross, M.H.: Multi-scale Feature Extraction on Point-sampled Surfaces. Comput. Graph. Forum 22, 281–290 (2003)
  19.
  20. Rios-Cabrera, R., Tuytelaars, T.: Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach. In: ICCV (2013)
  21. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., Moore, R.: Real-time Human Pose Recognition in Parts from Single Depth Images. In: CVPR (2011)
  22. Tang, J., Miller, S., Singh, A., Abbeel, P.: A textured object recognition pipeline for color and depth image data. In: ICRA (2012)
  23. Tombari, F., Salti, S., Di Stefano, L.: Unique signatures of histograms for local surface description. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part III. LNCS, vol. 6313, pp. 356–369. Springer, Heidelberg (2010)
  24. Villamizar, M., Andrade-Cetto, J., Sanfeliu, A., Moreno-Noguer, F.: Bootstrapping Boosted Random Ferns for discriminative and efficient object classification. Pattern Recognition 45, 3141–3153 (2012)
  25. Wang, T., He, X., Barnes, N.: Learning structured hough voting for joint object detection and occlusion reasoning. In: CVPR (2013)
  26. Zhu, M., Derpanis, K.G., Yang, Y., Brahmbhatt, S., Zhang, M., Phillips, C., Lecce, M., Daniilidis, K.: Single Image 3D Object Detection and Pose Estimation for Grasping. In: ICRA (2014)
  27. Zia, M., Stark, M., Schindler, K.: Explicit Occlusion Modeling for 3D Object Class Representations. In: CVPR (2013)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ujwal Bonde (1)
  • Vijay Badrinarayanan (1)
  • Roberto Cipolla (1)

  1. University of Cambridge, UK
