Abstract
This paper presents a novel descriptor, geodesic invariant feature (GIF), for representing objects in depth images. Especially in the context of parts classification of articulated objects, it is capable of encoding the invariance of local structures effectively and efficiently. The contributions of this paper lie in our multi-level feature extraction hierarchy. (1) Low-level feature encodes the invariance to articulation. Geodesic gradient is introduced, which is covariant with the non-rigid deformation of objects and is utilized to rectify the feature extraction process. (2) Mid-level feature reduces the noise and improves the efficiency. With unsupervised clustering, the primitives of objects are changed from pixels to superpixels. The benefit is two-fold: firstly, superpixel reduces the effect of the noise introduced by depth sensors; secondly, the processing speed can be improved by a big margin. (3) High-level feature captures nonlinear dependencies between the dimensions. Deep network is utilized to discover the high-level feature representation. As the feature propagates towards the deeper layers of the network, the ability of the feature capturing the data’s underlying regularities is improved. Comparisons with the state-of-the-art methods reveal the superiority of the proposed method.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1615–1630 (2005)
Bay, H., Ess, A., Tuytelaars, T., Gool, L.V.: Surf: speeded up robust features. Comput. Vis. Image Underst. 110, 346–359 (2008)
Calonder, M., Lepetit, V., Ozuysal, M., Trzcinski, T., Strecha, C., Fua, P.: Brief: computing a local binary descriptor very fast. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1281–1298 (2012)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010)
Chen, J., Shan, S., He, C., Zhao, G., Pietikinen, M., Chen, X., Gao, W.: Wld: a robust local image descriptor. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1705–1720 (2009)
Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors (2004)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Valle, E.: Local-Descriptor Matching for Image Identification Systems. Thesis (2008)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1644 (2010)
Chen, J., Zhao, G., Salo, M., Rahtu, E., Pietikinen, M.: Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Trans. Image Process. 22, 326–339 (2013)
Rahmani, R., Goldman, S.A., Zhang, H., Cholleti, S.R., Fritts, J.E.: Localized content based image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1902–1912 (2008)
Shen, X., Lin, Z., Brandt, J., Wu, Y.: Detecting and aligning faces by image retrieval (2013)
Subrahmanyam, M., Maheshwari, R., Balasubramanian, R.: Local maximum edge binary patterns: a new descriptor for image retrieval and object tracking. Sig. Process. 92, 1467–1479 (2012)
Ta, D.N., Chen, W.C., Gelfand, N., Pulli, K.: Surftrac: efcient tracking and continuous object recognition using local feature descriptors (2009)
Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28, 2037–2041 (2006)
Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H.: Local gabor binary pattern histogram sequence (LGBPHS): a novel non-statistical model for face representation and recognition (2005)
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., Fitzgibbon, A.: Kinectfusion: real-time 3D reconstruction and interaction using a moving depth camera (2011)
Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: Kinectfusion: real-time dense surface mapping and tracking (2011)
Helten, T., Baak, A., Bharaj, G., Mller, M., Seidel, H.P., Theobalt, C.: Personalization and evaluation of a real-time depth-based full body tracker (2013)
Lallemand, J., Pauly, O., Schwarz, L.: Multi-task forest for human pose estimation in depth images (2013)
Shotton, J., Girshick, R., Fitzgibbon, A., Sharp, T., Cook, M., Finocchio, M., Moore, R., Kohli, P., Criminisi, A., Kipman, A., Blake, A.: Efficient human pose estimation from single depth images. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2821–2840 (2013)
Ye, M., Yang, R.: Real-time simultaneous pose and shape estimation for articulated objects using a single depth camera (2014)
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Cipolla, R., Battiato, S., Farinella, G.M. (eds.) Machine Learning for Computer Vision, vol. 411, pp. 119–135. Springer, Heidelberg (2011)
Ojala, T., Pietikinen, M., Menp, T.: Multiresolution gray scale and rotation invariant texture analysis with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24, 971–987 (2002)
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Ssstrunk, S.: Slic superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2274–2282 (2012)
Arel, I., Rose, D.C., Karnowski, T.P.: Deep machine learning a new frontier in artificial intelligence research. IEEE Comput. Intell. Mag. 5, 13–18 (2010)
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2, 1–127 (2009)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013)
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler (2010)
Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1915–1929 (2013)
Kavukcuoglu, K., Sermanet, P., Boureau, Y.L., Gregor, K., Mathieu, M., LeCun, Y.: Learning convolutional feature hierarchies for visual recognition (2010)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks (2012)
Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing (2012)
Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., Manning, C.D.: Dynamic pooling and unfolding recursive autoencoders for paraphrase detection (2011)
Socher, R., Pennington, J., Huang, E.H., Ng, A.Y., Manning, C.D.: Semi-supervised recursive autoencoders for predicting sentiment distributions (2011)
Plagemann, C., Ganapathi, V., Koller, D., Thrun, S.: Real-time identification and localization of body parts from depth images (2010)
Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks (2007)
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders (2008)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
Torralba, A., Murphy, K.P., Freeman, W.T.: Sharing features: efficient boosting procedures for multiclass object detection (2004)
Acknowledgment
This work is supported by NSFC (Grant No 61300161, 61371168 and 61273251), Doctoral Fund of Ministry of Education of China (Grant No 20133219120033), Open Project Program of Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Grant No JSKL201306) and Programme of Introducing Talents of Discipline to Universities (Grant NoB13022).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, Y., Lasang, P., Sun, Q., Siegel, M. (2015). Learning Hierarchical Feature Representation in Depth Image. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_39
Download citation
DOI: https://doi.org/10.1007/978-3-319-16811-1_39
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer ScienceComputer Science (R0)