6D Pose Estimation of 3D Objects in Scenes with Mutual Similarities and Occlusions

  • Tiandi Chen
  • Lifeng Sun
Conference paper
Part of the Lecture Notes in Computational Vision and Biomechanics book series (LNCVB, volume 30)


Estimating the six-degrees-of-freedom (6DoF) pose of rigid bodies is an essential problem in fields such as robotics and virtual reality. This paper proposes a method that accurately estimates the 6DoF pose of known rigid bodies from RGB and RGB-D input images. The method also works well in scenes where objects exhibit mutual similarities and occlusions. The first contribution is a modular pipeline that integrates deep learning with multi-view geometry. First, an instance segmentation network identifies the approximate locations of known objects in the image and outputs their bounding boxes and masks. Next, a coarse 6DoF pose is estimated from local features of the RGB-D images and templates. Finally, a purely geometric method refines the estimate. The second contribution is the correction of misclassifications using information retained from the network training process. The proposed method achieves superior performance on a challenging public dataset.
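To make the term "6DoF" concrete, the following is a minimal sketch of how a six-parameter rigid-body pose (three rotation parameters plus three translation parameters) maps a point from the object frame into the camera frame. The axis-angle parametrization and the function names `rodrigues` and `apply_pose` are illustrative assumptions; the abstract does not state which pose representation the authors use.

```python
import math


def rodrigues(rvec):
    """Convert an axis-angle rotation vector (3 numbers) to a 3x3 rotation matrix.

    Assumption for illustration: the rotation is given in axis-angle form,
    where the vector direction is the axis and its length is the angle in radians.
    """
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Near-zero rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # Rodrigues' rotation formula written out element by element.
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def apply_pose(rvec, tvec, point):
    """Map a 3D point from the object frame to the camera frame: R * p + t."""
    R = rodrigues(rvec)
    return [
        sum(R[i][j] * point[j] for j in range(3)) + tvec[i]
        for i in range(3)
    ]


# Hypothetical pose: rotate 90 degrees about the z-axis, then translate.
pose_rvec = (0.0, 0.0, math.pi / 2)
pose_tvec = (0.1, 0.2, 0.3)
transformed = apply_pose(pose_rvec, pose_tvec, (1.0, 0.0, 0.0))
```

Here the three components of `pose_rvec` and the three components of `pose_tvec` together are the six degrees of freedom that a pose estimator must recover for each object.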


Object detection, Pose estimation, Multi-view geometry



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
