Efficient Feature Coding Based on Auto-encoder Network for Image Classification

  • Conference paper
Computer Vision – ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9003)

Abstract

Local descriptor coding is a crucial step in the traditional Bag of Words (BoW) framework for image categorization. However, the slow coding speed of previous methods limits their application to large-scale problems. Recently, neural network based models have been widely applied in various classification tasks. Using neural network models for descriptor coding is straightforward and efficient because their forward propagation is fast. In this paper, we propose to use the Auto-Encoder (AE) network as a local descriptor coding block, and further embed the AE network in the BoW framework for image classification. To make the hidden activations of the AE network both selective and sparse, we add an efficient and effective regularization term to the learning process of the AE network, which promotes sparsity of the hidden layer for each input descriptor as well as selectivity of each hidden node. By incorporating AE network coding into the BoW framework, we achieve better results and faster speeds than other state-of-the-art feature coding methods on the Caltech101, Scene15 and UIUC 8-Sports databases.
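The coding idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sigmoid encoder, the KL-style penalty (in the spirit of the sparse autoencoder notes [30]), the target activation `rho`, the max-pooling step, and all array sizes are assumptions chosen for illustration, since the abstract does not state the exact regularization term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W, b):
    """Encoder forward pass: X is (n_descriptors, d), W is (d, k)."""
    return sigmoid(X @ W + b)

def kl_to_target(mean_act, rho):
    """Summed KL divergence between Bernoulli(rho) and Bernoulli(mean_act)."""
    m = np.clip(mean_act, 1e-8, 1 - 1e-8)
    return np.sum(rho * np.log(rho / m)
                  + (1 - rho) * np.log((1 - rho) / (1 - m)))

def sparsity_selectivity_penalty(H, rho=0.05):
    # Assumed penalty, not the paper's exact term.
    # Row means: each descriptor should activate few hidden nodes (sparsity).
    # Column means: each hidden node should fire for few descriptors (selectivity).
    return kl_to_target(H.mean(axis=1), rho) + kl_to_target(H.mean(axis=0), rho)

def image_code(H):
    """Max-pool hidden activations over all descriptors of one image (assumed pooling)."""
    return H.max(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 128))        # 200 hypothetical 128-D local descriptors
W = 0.01 * rng.standard_normal((128, 64))  # untrained encoder weights, 64 hidden nodes
b = np.zeros(64)

H = encode(X, W, b)                        # one matrix product codes every descriptor
v = image_code(H)                          # image-level vector passed to a linear classifier
penalty = sparsity_selectivity_penalty(H)
```

Coding here costs a single matrix multiplication per image, which is the source of the speed advantage the abstract claims over methods that solve an optimization problem for each descriptor.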

Notes

  1. We utilized the LIBLINEAR toolkit [32] in this paper for SVM training.

References

  1. Csurka, G., Bray, C., Dance, C., Fan, L.: Visual categorization with bags of keypoints. In: ECCV, Workshop on Statistical Learning in Computer Vision, pp. 1–22 (2004)

  2. Yang, J., Yu, K., Gong, Y., Huang, T.S.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR, pp. 1794–1801 (2009)

  3. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T.S., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR, pp. 3360–3367 (2010)

  4. Liu, L., Wang, L., Liu, X.: In defense of soft-assignment coding. In: ICCV, pp. 2486–2493 (2011)

  5. Huang, Y., Huang, K., Yu, Y., Tan, T.: Salient coding for image classification. In: CVPR, pp. 1753–1760 (2011)

  6. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)

  7. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, vol. 1, pp. 886–893 (2005)

  8. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)

  9. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)

  10. Huang, Y., Wu, Z., Wang, L., Tan, T.: Feature coding in image classification: A comprehensive study. IEEE Trans. Pattern Anal. Mach. Intell. 36, 493–506 (2014)

  11. van Gemert, J.C., Geusebroek, J.-M., Veenman, C.J., Smeulders, A.W.M.: Kernel codebooks for scene categorization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 696–709. Springer, Heidelberg (2008)

  12. Gao, S., Tsang, I.W.H., Chia, L.T.: Laplacian sparse coding, hypergraph laplacian sparse coding, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 35, 92–104 (2013)

  13. Galleguillos, C., Rabinovich, A., Belongie, S.: Object categorization using co-occurrence, location and appearance. In: IEEE Conference on Computer Vision and Pattern Recognition (2008)

  14. Morioka, N., Satoh, S.: Compact correlation coding for visual object categorization. In: ICCV, pp. 1639–1646 (2011)

  15. Su, Y., Jurie, F.: Visual word disambiguation by semantic contexts. In: ICCV, pp. 311–318 (2011)

  16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)

  17. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006)

  18. Goh, H., Thome, N., Cord, M., Lim, J.-H.: Unsupervised and supervised visual codes with restricted boltzmann machines. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 298–311. Springer, Heidelberg (2012)

  19. Sohn, K., Jung, D.Y., Lee, H., Hero, A.O.: Efficient learning of sparse, distributed, convolutional feature representations for object recognition. In: ICCV, pp. 2643–2650 (2011)

  20. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002)

  21. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)

  22. Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)

  23. Ngiam, J., Koh, P.W., Chen, Z., Bhaskar, S.A., Ng, A.Y.: Sparse filtering. In: NIPS, pp. 1125–1133 (2011)

  24. Wu, Z., Huang, Y., Wang, L., Tan, T.: Group encoding of local features in image classification. In: ICPR, pp. 1505–1508 (2012)

  25. Perronnin, F., Sánchez, J., Mensink, T.: Improving the fisher kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)

  26. Perronnin, F., Dance, C.R.: Fisher kernels on visual vocabularies for image categorization. In: CVPR (2007)

  27. Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 141–154. Springer, Heidelberg (2010)

  28. Yu, K., Zhang, T., Gong, Y.: Nonlinear learning using local coordinate coding. In: NIPS, pp. 2223–2231 (2009)

  29. Lehky, S.R., Sejnowski, T.J., Desimone, R.: Selectivity and sparseness in the responses of striate complex cells. Vis. Res. 45, 57–73 (2005)

  30. Ng, A.: Sparse autoencoder. CS294A Lecture Notes (2011)

  31. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag New York Inc., New York (1995)

  32. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: A library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  33. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In: Proceedings of the Workshop on Generative-Model Based Vision (2004)

  34. Li, L.J., Li, F.F.: What, where and who? classifying events by scene and object recognition. In: ICCV, pp. 1–8 (2007)

  35. Jiang, Z., Lin, Z., Davis, L.S.: Label consistent k-svd: Learning a discriminative dictionary for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2651–2664 (2013)

Acknowledgement

This work has been supported in part by the National Basic Research Program of China (973 Program) Grant 2012CB316302 and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDA06040102).

Author information

Corresponding author

Correspondence to Guo-Sen Xie.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xie, GS., Zhang, XY., Liu, CL. (2015). Efficient Feature Coding Based on Auto-encoder Network for Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_41

  • DOI: https://doi.org/10.1007/978-3-319-16865-4_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16864-7

  • Online ISBN: 978-3-319-16865-4

  • eBook Packages: Computer Science, Computer Science (R0)
