Multiple Stage Residual Model for Accurate Image Classification

Bai, Song; Wang, Xinggang; Yao, Cong; Bai, Xiang

doi:10.1007/978-3-319-16865-4_28

Multiple Stage Residual Model for Accurate Image Classification

Song Bai⁵,
Xinggang Wang⁵,
Cong Yao⁵ &
…
Xiang Bai⁵

Conference paper
First Online: 01 January 2015

2070 Accesses
5 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9003))

Abstract

Image classification is an important topic in computer vision. As a key procedure, encoding the local features to get a compact representation for image affects the final classification accuracy largely. There is no doubt that encoding procedure leads to information loss, due to the existence of quantization error. The residual vector, defined as the difference between the local image feature and its corresponding visual word, is the chief culprit that should be responsible for the quantization error. Many previous algorithms consider it as a coding issue, and focus on reducing the quantization error by reconstructing the feature with more than one visual words, or by the so-called soft-assignment strategy. In this paper, we consider the problem from a different view, and propose an effective and efficient model, which is called Multiple Stage Residual Model (MSRM), to make full use of the residual vector to generate a multiple stage code. Our proposed model is a generic framework, which can be built upon many coding algorithms and improves the image classification performance of the coding algorithms significantly. The experimental results on the image classification benchmarks, such as UIUC 8-Sport, Scene-15, Caltech-101 image dataset, confirm the validity of MSRM.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Jégou, H., Zisserman, A., et al.: Triangulation embedding and democratic aggregation for image search. In: CVPR (2014)
Google Scholar
Zheng, L., Wang, S., Liu, Z., Tian, Q.: Packing and padding: coupled multi-index for accurate image retrieval. In: CVPR (2014)
Google Scholar
Kosala, R., Blockeel, H.: Web mining research: a survey. ACM Sigkdd Explor. Newslett. 2, 1–15 (2000)
Article Google Scholar
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: ECCV (2004)
Google Scholar
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: CVPR (2005)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Article Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Google Scholar
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Article MathSciNet Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
Google Scholar
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR (2010)
Google Scholar
Liu, L., Wang, L., Liu, X.: In defense of soft-assignment coding. In: ICCV (2011)
Google Scholar
Zhang, T., Ghanem, B., Liu, S., Xu, C., Ahuja, N.: Low-rank sparse coding for image classification. In: ICCV (2013)
Google Scholar
Shabou, A., Le Borgne, H.: Locality-constrained and spatially regularized coding for scene categorization. In: CVPR (2012)
Google Scholar
Perronnin, F., Sánchez, J., Mensink, T.: Improving the Fisher kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)
Chapter Google Scholar
Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 141–154. Springer, Heidelberg (2010)
Chapter Google Scholar
Huang, Y., Huang, K., Yu, Y., Tan, T.: Salient coding for image classification. In: CVPR (2011)
Google Scholar
van Gemert, J.C., Geusebroek, J.-M., Veenman, C.J., Smeulders, A.W.M.: Kernel codebooks for scene categorization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 696–709. Springer, Heidelberg (2008)
Chapter Google Scholar
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
Google Scholar
Shaban, A., Rabiee, H.R., Farajtabar, M., Ghazvininejad, M.: From local similarity to global coding: an application to image classification. In: CVPR (2013)
Google Scholar
Yu, K., Zhang, T., Gong, Y.: Nonlinear learning using local coordinate coding. Adv. Neural Inf. Process. Syst. 22, 2223–2231 (2009)
Google Scholar
Shen, W., Deng, K., Bai, X., Leyvand, T., Guo, B., Tu, Z.: Exemplar-based human action pose correction. IEEE Trans. Cybern. 44, 1053–1066 (2014)
Article Google Scholar
Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. IEEE Trans. Image Process. 23, 3368–3380 (2014)
Article MathSciNet Google Scholar
Shen, W., Deng, K., Bai, X., Leyvand, T., Guo, B., Tu, Z.: Exemplar-based human action pose correction and tagging. In: CVPR, pp. 1784–1791 (2012)
Google Scholar
Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: ICML (2010)
Google Scholar
Boureau, Y.L., Bach, F., LeCun, Y., Ponce, J.: Learning mid-level features for recognition. In: CVPR (2010)
Google Scholar
Koniusz, P., Yan, F., Mikolajczyk, K.: Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection. CVIU 117(5), 479–492 (2013)
Google Scholar
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: CVPR (2010)
Google Scholar
Huang, Y., Wu, Z., Wang, L., Tan, T.: Feature coding in image classification: a comprehensive study. PAMI 35(8), 1798–1828 (2013)
Article Google Scholar
Arandjelovic, R., Zisserman, A.: All about VLAD. In: CVPR (2013)
Google Scholar
McCann, S., Lowe, D.G.: Spatially local coding for object recognition. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 204–217. Springer, Heidelberg (2013)
Chapter Google Scholar
Wang, X., Bai, X., Liu, W., Latecki, L.J.: Feature context for image classification and object detection. In: CVPR, IEEE, pp. 961–968 (2011)
Google Scholar
Wang, X., Wang, B., Bai, X., Liu, W., Tu, Z.: Max-margin multiple-instance dictionary learning. In: ICML (2013)
Google Scholar
Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A., et al.: Supervised dictionary learning. In: NIPS (2008)
Google Scholar
Yang, J., Yu, K., Huang, T.: Supervised translation-invariant sparse coding. In: CVPR (2010)
Google Scholar
Li, L.J., Fei-Fei, L.: What, where and who? classifying events by scene and object recognition. In: ICCV (2007)
Google Scholar
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput. Vis. Image Underst. 106(1), 59–70 (2007)
Article Google Scholar
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: BMVC (2011)
Google Scholar
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008). http://www.vlfeat.org/
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

Download references

Acknowledgement

This work was primarily supported by National Natural Science Foundation of China (NSFC) (No. 61222308), and in part by NSFC (No. 61173120), Program for New Century Excellent Talents in University (No. NCET-12-0217), Fundamental Research Funds for the Central Universities (No. HUST 2013TS115). X.Wang was supported by Microsoft Research Asia Fellowship 2012.

Author information

Authors and Affiliations

Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, People’s Republic of China
Song Bai, Xinggang Wang, Cong Yao & Xiang Bai

Authors

Song Bai
View author publications
You can also search for this author in PubMed Google Scholar
Xinggang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Cong Yao
View author publications
You can also search for this author in PubMed Google Scholar
Xiang Bai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiang Bai .

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bai, S., Wang, X., Yao, C., Bai, X. (2015). Multiple Stage Residual Model for Accurate Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_28

Download citation

DOI: https://doi.org/10.1007/978-3-319-16865-4_28
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics