Unsupervised Feature Learning for RGB-D Image Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Motivated by the success of Deep Neural Networks in computer vision, we propose a deep Regularized Reconstruction Independent Component Analysis network (R\(^2\)ICA) for RGB-D image classification. In each layer of this network, we include an R\(^2\)ICA as the basic building block to determine the relationship between the gray-scale and depth images corresponding to the same object or scene. By applying the commonly used local contrast normalization and spatial pooling, we gradually make the network resilient to local variance, resulting in a robust image representation for RGB-D image classification. Moreover, unlike conventional handcrafted feature-based RGB-D image representations, the proposed deep R\(^2\)ICA is a feedforward network and is therefore more efficient for image representation. Experimental results on three publicly available RGB-D datasets demonstrate that the proposed method consistently outperforms state-of-the-art conventional, manually designed RGB-D image representations, confirming its effectiveness for RGB-D image classification.
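As a rough illustration of the two generic operations named in the abstract, the following NumPy sketch applies a simple local contrast normalization and non-overlapping max pooling to a single 2-D feature map. It is not the authors' implementation: the function names, the window size `k`, the pooling size `s`, the choice of max pooling, and the reflect padding are all assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_contrast_normalize(fmap, k=3, eps=1e-6):
    """Subtract the local mean and divide by the local std over k x k windows.

    Assumes k is odd and fmap is a 2-D array larger than k // 2 in each dimension.
    """
    pad = k // 2
    padded = np.pad(fmap, pad, mode="reflect")
    # One k x k window per pixel of the original map: shape (H, W, k, k).
    windows = sliding_window_view(padded, (k, k))
    mean = windows.mean(axis=(2, 3))
    std = windows.std(axis=(2, 3))
    return (fmap - mean) / (std + eps)


def max_pool(fmap, s=2):
    """Non-overlapping s x s max pooling; rows/columns that do not fit are dropped."""
    h, w = fmap.shape
    h2, w2 = h // s, w // s
    blocks = fmap[:h2 * s, :w2 * s].reshape(h2, s, w2, s)
    return blocks.max(axis=(1, 3))


# Hypothetical 4 x 4 feature map standing in for one filter response.
fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool(local_contrast_normalize(fmap))
```

Pooling halves each spatial dimension here, so small shifts of the input within a pooling cell leave the pooled value unchanged, which is the kind of resilience to local variance the abstract refers to.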

Notes

  1. [25] applies DNNs to the RGB and depth images separately and simply concatenates the resulting representations to form the RGB-D image representation.

  2. Here, we discuss data from only one modality, and \(x\) represents an input.

  3. For simplicity, we apply the same \(W\) to the gray-scale and depth patches. Experimental results show that the performance is promising.

  4. To de-correlate the input data, each input was individually normalized by subtracting the mean and dividing by the standard deviation of the high-dimensional data before our unsupervised filter learning.
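The normalization in note 4 and the shared filter matrix in note 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: it reads "individually normalized" as per-patch standardization, the patch size (8 x 8) and filter count (16) are arbitrary, and `W` is a random stand-in for filters that the paper learns without supervision.

```python
import numpy as np


def standardize_patches(X, eps=1e-8):
    """Normalize each row (one flattened patch) to zero mean and unit std."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / (std + eps)


rng = np.random.default_rng(0)
gray = rng.normal(size=(5, 64))    # 5 hypothetical 8x8 gray-scale patches
depth = rng.normal(size=(5, 64))   # the corresponding depth patches
W = rng.normal(size=(16, 64))      # random stand-in for the learned filters

# The same W is applied to both modalities, as described in note 3.
gray_feat = standardize_patches(gray) @ W.T
depth_feat = standardize_patches(depth) @ W.T
```

Each row of `gray_feat` and `depth_feat` holds the 16 filter responses for one patch, so the two modalities end up in feature spaces of the same dimension.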

References

  1. Banerjee, J., Moelker, A., Niessen, W.J., van Walsum, T.: 3D LBP-based rotationally invariant region description. In: Park, J.-I., Kim, J. (eds.) ACCV Workshops 2012, Part I. LNCS, vol. 7728, pp. 26–37. Springer, Heidelberg (2013)

  2. Bariya, P., Novatnack, J., Schwartz, G., Nishino, K.: 3D geometric scale variability in range images: features and descriptors. IJCV 99, 232–255 (2012)

  3. Blum, M., Springenberg, J., Wülfing, J., Riedmiller, M.: A learned feature descriptor for object recognition in RGB-D data. In: ICRA (2012)

  4. Bo, L., Ren, X., Fox, D.: Unsupervised feature learning for RGB-D based object recognition. In: Desai, J.P., Dudek, G., Khatib, O., Kumar, V. (eds.) ISER 2012. STAR, vol. 88, pp. 387–402. Springer, Heidelberg (2012)

  5. Bo, L., Lai, K., Ren, X., Fox, D.: Object recognition with hierarchical kernel descriptors. In: CVPR (2012)

  6. Bo, L., Ren, X., Fox, D.: Hierarchical matching pursuit for image classification: architecture and fast algorithms. In: NIPS (2011)

  7. Browatzki, B., Fischer, J., Graf, B., Bülthoff, H.H., Wallraven, C.: Going into depth: evaluating 2D and 3D cues for object classification on a new, large-scale object dataset. In: ICCV Workshop (2011)

  8. Frome, A., Huber, D., Kolluri, R., Bülow, T., Malik, J.: Recognizing objects in range data using regional point descriptors. In: Pajdla, T., Matas, J.G. (eds.) ECCV 2004. LNCS, vol. 3023, pp. 224–237. Springer, Heidelberg (2004)

  9. Gupta, S., Arbelaez, P., Malik, J.: Perceptual organization and recognition of indoor scenes from RGBD images. In: CVPR (2013)

  10. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

  11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

  12. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley Interscience, New York (2001)

  13. Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., Darrell, T.: A category-level 3-D object dataset: putting the kinect to work. In: ICCV Workshop (2011)

  14. Jarrett, K., Kavukcuoglu, K., Ranzato, M.A., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV (2009)

  15. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)

  16. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)

  17. Le, Q.V., Karpenko, A., Ngiam, J., Ng, A.Y.: ICA with reconstruction cost for efficient overcomplete feature learning. In: NIPS (2011)

  18. Le, Q.V., Ranzato, M.A., Monga, R., Devin, M., Chen, K., Corrado, G.S., Dean, J., Ng, A.Y.: Building high-level features using large scale unsupervised learning. In: ICML (2012)

  19. Le, Q.V., Ngiam, J., Chen, Z., Chia, D., Koh, P., Ng, A.Y.: Tiled convolutional neural networks. In: NIPS (2010)

  20. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)

  21. Lyu, S., Simoncelli, E.: Nonlinear image representation using divisive normalization. In: ICCV (2009)

  22. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: ICML (2011)

  23. Ren, X., Bo, L., Fox, D.: RGB-(D) scene labeling: features and algorithms. In: CVPR (2012)

  24. Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: ICML (2011)

  25. Socher, R., Huval, B., Bhat, B., Manning, C.D., Ng, A.Y.: Convolutional-recursive deep learning for 3D object classification. In: NIPS (2012)

  26. Silberman, N., Fergus, R.: Indoor scene segmentation using a structured light sensor. In: ICCV Workshop (2011)

  27. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012)

  28. Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS (2012)

  29. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. JMLR 11(5), 3371–3408 (2010)

  30. Wang, N., Yeung, D.-Y.: Learning a deep compact image representation for visual tracking. In: NIPS (2013)

  31. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)

  32. Zeiler, M., Krishnan, D., Taylor, G., Fergus, R.: Deconvolutional networks. In: CVPR (2010)

Acknowledgements

This work was supported by grant MOST 103-2911-I-001-531.

Author information

Correspondence to I-Hong Jhuo.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jhuo, IH., Gao, S., Zhuang, L., Lee, D.T., Ma, Y. (2015). Unsupervised Feature Learning for RGB-D Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_18

  • DOI: https://doi.org/10.1007/978-3-319-16865-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16864-7

  • Online ISBN: 978-3-319-16865-4

  • eBook Packages: Computer Science, Computer Science (R0)
