Complex-Valued Representation for RGB-D Object Recognition
Object recognition methods usually focus on single cues from traditional vision-based systems and neglect multi-modal data. With the advent of RGB-D sensors, which provide synchronized multi-modal data of good quality, new opportunities have emerged. In this paper, we use RGB and depth images to propose a new object recognition approach. Using a pixel-wise scheme, we propose a novel method to describe RGB-D images with a complex-valued representation. We also introduce a new complex-valued neural network (CVNN) with RBF neurons. Unlike many RGB-D features, the proposed approach is able to jointly use RGB and depth data within a unified end-to-end learning framework. Category and instance object recognition tasks are evaluated through experiments carried out on a large-scale RGB-D object dataset. Results show that our method can efficiently recognize objects in RGB-D images and outperforms state-of-the-art approaches.
Keywords: RGB-D representation, Object recognition, Complex-valued neural networks, Multi-modal data
This work was supported by European Union funding through the ALYSSA program (ERASMUS-MUNDUS action 2 lot 6) and by a research grant from the Singapore Agency for Science, Technology and Research (A*STAR) through the ARAP program.