Unsupervised Feature Learning for RGB-D Image Classification

Jhuo, I-Hong; Gao, Shenghua; Zhuang, Liansheng; Lee, D. T.; Ma, Yi

doi:10.1007/978-3-319-16865-4_18

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Motivated by the success of deep neural networks (DNNs) in computer vision, we propose a deep Regularized Reconstruction Independent Component Analysis network (R\(^2\)ICA) for RGB-D image classification. Each layer of this network uses an R\(^2\)ICA as its basic building block to capture the relationship between the gray-scale and depth images of the same object or scene. By applying the commonly used local contrast normalization and spatial pooling, the network becomes progressively more resilient to local variance, yielding a robust representation for RGB-D image classification. Moreover, unlike conventional RGB-D image representations built on handcrafted features, the proposed deep R\(^2\)ICA is a feedforward network, which makes computing the image representation more efficient. Experimental results on three publicly available RGB-D datasets show that the proposed method consistently outperforms state-of-the-art conventional, manually designed RGB-D image representations, confirming its effectiveness for RGB-D image classification.
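The per-layer processing named in the abstract (filtering, local contrast normalization, spatial pooling) can be sketched as one feedforward pass. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the filter, normalize, pool order follows Jarrett et al. ("What is the best multi-stage architecture for object recognition?", in the reference list), while the uniform normalization window, the L2 pooling over non-overlapping 2×2 blocks, and the helper names `correlate_valid`, `local_contrast_normalize`, `l2_pool` and `feedforward_layer` are hypothetical choices. In the paper the filters would come from unsupervised R\(^2\)ICA training; here they are supplied by hand.

```python
import math

Map = list[list[float]]  # a 2-D feature map stored as rows of floats


def correlate_valid(image: Map, kernel: Map) -> Map:
    """Valid 2-D cross-correlation (no padding) of one image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def _window(m: Map, i: int, j: int, r: int) -> list[float]:
    """Values of m in the (2r+1) x (2r+1) window centred at (i, j), clipped at borders."""
    return [m[a][b]
            for a in range(max(0, i - r), min(len(m), i + r + 1))
            for b in range(max(0, j - r), min(len(m[0]), j + r + 1))]


def local_contrast_normalize(fmap: Map, r: int = 1, eps: float = 1e-8) -> Map:
    """Subtractive then divisive normalization over a uniform local window (assumed)."""
    h, w = len(fmap), len(fmap[0])
    # Subtractive step: remove the local mean around every position.
    centered = []
    for i in range(h):
        row = []
        for j in range(w):
            win = _window(fmap, i, j, r)
            row.append(fmap[i][j] - sum(win) / len(win))
        centered.append(row)
    # Divisive step: local standard deviation of the centred values.
    sigma = []
    for i in range(h):
        row = []
        for j in range(w):
            win = _window(centered, i, j, r)
            row.append(math.sqrt(sum(v * v for v in win) / len(win)))
        sigma.append(row)
    # Divide by max(mean sigma, local sigma) so flat regions are not amplified.
    floor = sum(map(sum, sigma)) / (h * w)
    return [[centered[i][j] / max(floor, sigma[i][j], eps) for j in range(w)]
            for i in range(h)]


def l2_pool(fmap: Map, size: int = 2) -> Map:
    """Non-overlapping L2 pooling: sqrt of the sum of squares in each size x size block."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[math.sqrt(sum(fmap[i * size + a][j * size + b] ** 2
                           for a in range(size) for b in range(size)))
             for j in range(out_w)]
            for i in range(out_h)]


def feedforward_layer(image: Map, filters: list[Map]) -> list[Map]:
    """One layer: filter -> local contrast normalization -> spatial pooling, per filter."""
    return [l2_pool(local_contrast_normalize(correlate_valid(image, k))) for k in filters]


if __name__ == "__main__":
    image = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(6)]
    filters = [[[1.0, 0.0, -1.0]] * 3,                                   # vertical-edge filter
               [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]]]  # horizontal-edge filter
    maps = feedforward_layer(image, filters)
    print(len(maps), len(maps[0]), len(maps[0][0]))  # 2 2 2
```

Because every stage is a fixed, closed-form operation once the filters are learned, encoding an image is a single forward pass with no per-image optimization, which is the efficiency argument the abstract makes. Stacking layers would feed the pooled maps of one layer into the next.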

Notes

1.
[25] applies DNNs to the RGB and depth images separately and simply concatenates the resulting representations to form the RGB-D image representation.
2.
Here, we discuss data from only one modality, and \(x\) denotes an input.
3.
For simplicity, we apply the same \(W\) to the gray-scale and depth patches. Experimental results show that this simplification still yields promising performance.
4.
To de-correlate the input data, we normalized each high-dimensional input individually by subtracting its mean and dividing by its standard deviation before unsupervised filter learning.
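Notes 2 to 4 together describe the filter-learning setting: each input \(x\) is normalized by its own mean and standard deviation, and one filter matrix \(W\) is shared by the gray-scale and depth patches. The sketch below evaluates a reconstruction ICA (RICA) style cost, after Le et al.'s "ICA with reconstruction cost" in the reference list, under those two conventions: \(\frac{\lambda}{m}\sum_i \lVert W^\top W x_i - x_i\rVert_2^2 + \sum_i \sum_j g(W_j x_i)\) with the smooth L1 penalty \(g(s) = \sqrt{\epsilon + s^2}\). It is a hedged illustration, not the authors' method: per-input (rather than per-dimension) statistics, the values of `lam` and `eps`, summing the gray-scale and depth costs, and all function names are assumptions, and the regularizer that relates the two modalities in R\(^2\)ICA is not reproduced.

```python
import math

Vector = list[float]
Matrix = list[list[float]]  # k x n filter matrix W, one filter per row


def standardize(x: Vector, eps: float = 1e-8) -> Vector:
    """Subtract the input's own mean and divide by its (population) standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / max(std, eps) for v in x]  # eps guards constant inputs


def encode(W: Matrix, x: Vector) -> Vector:
    """Filter responses s = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def reconstruct(W: Matrix, s: Vector) -> Vector:
    """Tied-weight linear reconstruction W^T s."""
    return [sum(W[i][j] * s[i] for i in range(len(W))) for j in range(len(W[0]))]


def rica_cost(W: Matrix, X: list[Vector], lam: float = 1.0, eps: float = 1e-2) -> float:
    """RICA-style cost for ONE modality: weighted reconstruction error plus smooth L1 sparsity."""
    recon = sparsity = 0.0
    for x in X:
        s = encode(W, x)
        recon += sum((a - b) ** 2 for a, b in zip(reconstruct(W, s), x))
        sparsity += sum(math.sqrt(eps + v * v) for v in s)
    return lam / len(X) * recon + sparsity


def shared_filter_cost(W: Matrix, gray: list[Vector], depth: list[Vector]) -> float:
    """Hypothetical two-modality cost: the SAME W scores standardized gray and depth patches."""
    gray_n = [standardize(x) for x in gray]
    depth_n = [standardize(x) for x in depth]
    return rica_cost(W, gray_n) + rica_cost(W, depth_n)


if __name__ == "__main__":
    W = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]  # two toy filters over 4-pixel patches
    gray = [[0.2, 0.8, 0.5, 0.5], [0.9, 0.1, 0.4, 0.6]]
    depth = [[1.0, 1.0, 0.3, 0.9], [0.5, 0.7, 0.7, 0.5]]
    print(shared_filter_cost(W, gray, depth))
```

Sharing \(W\) halves the number of filter weights compared with learning one matrix per modality. Per-input standardization removes each input's offset and scale; decorrelating the dimensions themselves would additionally require a whitening transform such as ZCA.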

References

[1] Banerjee, J., Moelker, A., Niessen, W.J., van Walsum, T.: 3D LBP-based rotationally invariant region description. In: Park, J.-I., Kim, J. (eds.) ACCV Workshops 2012, Part I. LNCS, vol. 7728, pp. 26–37. Springer, Heidelberg (2013)
[2] Bariya, P., Novatnack, J., Schwartz, G., Nishino, K.: 3D geometric scale variability in range images: features and descriptors. IJCV 99, 232–255 (2012)
[3] Blum, M., Springenberg, J., Wülfing, J., Riedmiller, M.: A learned feature descriptor for object recognition in RGB-D data. In: ICRA (2012)
[4] Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. In: Desai, J.P., Dudek, G., Khatib, O., Kumar, V. (eds.) ISER 2012. STAR, vol. 88, pp. 387–402. Springer, Heidelberg (2012)
[5] Bo, L., Lai, K., Ren, X., Fox, D.: Object recognition with hierarchical kernel descriptors. In: CVPR (2012)
[6] Bo, L., Ren, X., Fox, D.: Hierarchical matching pursuit for image classification: architecture and fast algorithms. In: NIPS (2011)
[7] Browatzki, B., Fischer, J., Graf, B., Bülthoff, H.H., Wallraven, C.: Going into depth: evaluating 2D and 3D cues for object classification on a new, large-scale object dataset. In: ICCV Workshop (2011)
[8] Frome, A., Huber, D., Kolluri, R., Bülow, T., Malik, J.: Recognizing objects in range data using regional point descriptors. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 224–237. Springer, Heidelberg (2004)
[9] Gupta, S., Arbelaez, P., Malik, J.: Perceptual organization and recognition of indoor scenes from RGBD images. In: CVPR (2013)
[10] Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
[11] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
[12] Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley Interscience, New York (2001)
[13] Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., Darrell, T.: A category-level 3-D object dataset: putting the Kinect to work. In: ICCV Workshop (2011)
[14] Jarrett, K., Kavukcuoglu, K., Ranzato, M.A., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV (2009)
[15] Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)
[16] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
[17] Le, Q.V., Karpenko, A., Ngiam, J., Ng, A.Y.: ICA with reconstruction cost for efficient overcomplete feature learning. In: NIPS (2011)
[18] Le, Q.V., Ranzato, M.A., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., Ng, A.Y.: Building high-level features using large scale unsupervised learning. In: ICML (2012)
[19] Le, Q.V., Ngiam, J., Chen, Z., Chia, D., Koh, P., Ng, A.Y.: Tiled convolutional neural networks. In: NIPS (2010)
[20] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
[21] Lyu, S., Simoncelli, E.: Nonlinear image representation using divisive normalization. In: ICCV (2009)
[22] Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: ICML (2011)
[23] Ren, X., Bo, L., Fox, D.: RGB-(D) scene labeling: features and algorithms. In: CVPR (2012)
[24] Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: ICML (2011)
[25] Socher, R., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. In: NIPS (2012)
[26] Silberman, N., Fergus, R.: Indoor scene segmentation using a structured light sensor. In: ICCV Workshop (2011)
[27] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012)
[28] Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS (2012)
[29] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. JMLR 11(5), 3371–3408 (2010)
[30] Wang, N., Yeung, D.-Y.: Learning a deep compact image representation for visual tracking. In: NIPS (2013)
[31] Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
[32] Zeiler, M., Krishnan, D., Taylor, G., Fergus, R.: Deconvolutional networks. In: CVPR (2010)

Acknowledgements

This work was supported by grant MOST 103-2911-I-001-531.

Author information

Authors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
I-Hong Jhuo & D. T. Lee
School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Shenghua Gao & Yi Ma
CAS Key Laboratory of Electromagnetic Space Information, USTC, Hefei, China
Liansheng Zhuang
Department of Computer Science, National Chung Hsing University, Taichung, Taiwan
D. T. Lee
Department of ECE, University of Illinois at Urbana-Champaign, Champaign, USA
Yi Ma

Corresponding author

Correspondence to I-Hong Jhuo.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Jhuo, IH., Gao, S., Zhuang, L., Lee, D.T., Ma, Y. (2015). Unsupervised Feature Learning for RGB-D Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_18

DOI: https://doi.org/10.1007/978-3-319-16865-4_18
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer Science, Computer Science (R0)
