Category-Level 6D Object Pose Recovery in Depth Images

  • Caner Sahin
  • Tae-Kyun Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11129)

Abstract

Intra-class variations and distribution shifts between source and target domains are the major challenges of category-level tasks. In this study, we address category-level full 6D object pose estimation in the context of the depth modality, introducing a novel part-based architecture that can tackle the above-mentioned challenges. Our architecture particularly adapts to the distribution shifts arising from shape discrepancies, and it naturally removes variations in texture, illumination, pose, etc., so we call it the “Intrinsic Structure Adaptor (ISA)”. We engineer ISA based on the following: (i) “Semantically Selected Centers (SSC)” are proposed in order to define the “6D pose” at the level of categories. (ii) 3D skeleton structures, which we derive as shape-invariant features, are used to represent the parts extracted from the instances of given categories, and privileged one-class learning is employed based on these parts. (iii) Graph matching is performed during training in such a way that the adaptation/generalization capability of the proposed architecture across unseen instances is improved. Experiments on both synthetic and real datasets validate the promising performance of the proposed architecture.
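To make the abstract's pipeline concrete, the sketch below is a minimal, hypothetical illustration of its two shape-centric ideas: deriving skeleton-like nodes from a point cloud and matching them across instances of a category. It is not the authors' implementation; farthest-point sampling stands in for proper 3D skeleton contraction, the Hungarian algorithm on node coordinates stands in for the paper's graph matching, and SSC and privileged one-class learning are omitted entirely.

```python
# Hypothetical, minimal sketch -- NOT the paper's Intrinsic Structure
# Adaptor. Farthest-point sampling is a crude stand-in for skeleton
# contraction; the Hungarian algorithm approximates graph matching.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def skeleton_nodes(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Pick k well-spread nodes from an (N, 3) cloud via farthest-point
    sampling, a rough proxy for 3D skeleton extraction."""
    # Seed with the point farthest from the centroid, then greedily add
    # the point farthest from all nodes chosen so far.
    idx = [int(np.argmax(np.linalg.norm(points - points.mean(0), axis=1)))]
    for _ in range(k - 1):
        dist_to_nodes = cdist(points, points[idx]).min(axis=1)
        idx.append(int(np.argmax(dist_to_nodes)))
    return points[idx]


def match_skeletons(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return one-to-one node correspondences minimising total Euclidean
    cost (Hungarian algorithm on the pairwise distance matrix)."""
    _, cols = linear_sum_assignment(cdist(src, tgt))
    return cols  # cols[i] is the target node matched to source node i


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "instances" of one category: the second is a rescaled,
    # noisy copy of the first, mimicking intra-class shape variation.
    inst_a = rng.normal(size=(500, 3))
    inst_b = 1.3 * inst_a + rng.normal(scale=0.05, size=(500, 3))
    matches = match_skeletons(skeleton_nodes(inst_a), skeleton_nodes(inst_b))
    print("node correspondences:", matches)
```

In the actual architecture, the nodes would carry learned part features rather than raw coordinates, and the matching cost would respect graph structure so that correspondences generalize to unseen instances.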

Keywords

Category-level 6D object pose · 3D skeleton · Graph matching · Privileged one-class learning


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. ICVL, Imperial College London, London, UK
