Adaptive Visual-Depth Fusion Transfer

  • Ziyun Cai
  • Yang Long
  • Xiao-Yuan Jing
  • Ling Shao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)


While RGB-D classification has been actively researched in recent years, most existing methods focus on the RGB-D source-to-target transfer task. Such methods cannot address the real-world scenario in which paired depth images are not available. This paper focuses on a more flexible task that recognizes RGB test images by transferring them into the depth domain. This setting retains high performance by exploiting auxiliary depth information while avoiding the cost of pairing RGB with depth sensors at test time. Existing methods face two challenges: how to utilize the additional depth features, and the domain shift caused by the different imaging mechanisms of conventional RGB cameras and depth sensors. As a step towards bridging this gap, we propose a novel method called adaptive Visual-Depth Fusion Transfer (aVDFT), which takes advantage of the depth information and handles the domain distribution mismatch simultaneously. Our key novelties are: (1) a global visual-depth metric construction algorithm that can effectively align the RGB and depth data structures; (2) adaptive transformed component extraction for the target domain, conditioned on invariant transfer of location, scale and depth measurement. To demonstrate the effectiveness of aVDFT, we conduct comprehensive experiments on six pairs of RGB-D datasets for object recognition, scene classification and gender recognition, and show state-of-the-art performance.


Keywords: RGB-D data, Domain adaptation, Visual categorization



This work was sponsored by NUPTSF (Grant No. NY218120) and an MRC Innovation Fellowship (ref. MR/S003916/1).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ziyun Cai (1)
  • Yang Long (2)
  • Xiao-Yuan Jing (1)
  • Ling Shao (3)
  1. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing, China
  2. Open Lab, School of Computing, University of Newcastle, Newcastle upon Tyne, UK
  3. Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
