Abstract
Image classification is an important topic in computer vision. As a key procedure, encoding the local features to get a compact representation for image affects the final classification accuracy largely. There is no doubt that encoding procedure leads to information loss, due to the existence of quantization error. The residual vector, defined as the difference between the local image feature and its corresponding visual word, is the chief culprit that should be responsible for the quantization error. Many previous algorithms consider it as a coding issue, and focus on reducing the quantization error by reconstructing the feature with more than one visual words, or by the so-called soft-assignment strategy. In this paper, we consider the problem from a different view, and propose an effective and efficient model, which is called Multiple Stage Residual Model (MSRM), to make full use of the residual vector to generate a multiple stage code. Our proposed model is a generic framework, which can be built upon many coding algorithms and improves the image classification performance of the coding algorithms significantly. The experimental results on the image classification benchmarks, such as UIUC 8-Sport, Scene-15, Caltech-101 image dataset, confirm the validity of MSRM.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Jégou, H., Zisserman, A., et al.: Triangulation embedding and democratic aggregation for image search. In: CVPR (2014)
Zheng, L., Wang, S., Liu, Z., Tian, Q.: Packing and padding: coupled multi-index for accurate image retrieval. In: CVPR (2014)
Kosala, R., Blockeel, H.: Web mining research: a survey. ACM Sigkdd Explor. Newslett. 2, 1–15 (2000)
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: ECCV (2004)
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: CVPR (2005)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR (2006)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR (2010)
Liu, L., Wang, L., Liu, X.: In defense of soft-assignment coding. In: ICCV (2011)
Zhang, T., Ghanem, B., Liu, S., Xu, C., Ahuja, N.: Low-rank sparse coding for image classification. In: ICCV (2013)
Shabou, A., Le Borgne, H.: Locality-constrained and spatially regularized coding for scene categorization. In: CVPR (2012)
Perronnin, F., Sánchez, J., Mensink, T.: Improving the Fisher kernel for large-scale image classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 143–156. Springer, Heidelberg (2010)
Zhou, X., Yu, K., Zhang, T., Huang, T.S.: Image classification using super-vector coding of local image descriptors. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 141–154. Springer, Heidelberg (2010)
Huang, Y., Huang, K., Yu, Y., Tan, T.: Salient coding for image classification. In: CVPR (2011)
van Gemert, J.C., Geusebroek, J.-M., Veenman, C.J., Smeulders, A.W.M.: Kernel codebooks for scene categorization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 696–709. Springer, Heidelberg (2008)
Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: CVPR (2009)
Shaban, A., Rabiee, H.R., Farajtabar, M., Ghazvininejad, M.: From local similarity to global coding: an application to image classification. In: CVPR (2013)
Yu, K., Zhang, T., Gong, Y.: Nonlinear learning using local coordinate coding. Adv. Neural Inf. Process. Syst. 22, 2223–2231 (2009)
Shen, W., Deng, K., Bai, X., Leyvand, T., Guo, B., Tu, Z.: Exemplar-based human action pose correction. IEEE Trans. Cybern. 44, 1053–1066 (2014)
Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. IEEE Trans. Image Process. 23, 3368–3380 (2014)
Shen, W., Deng, K., Bai, X., Leyvand, T., Guo, B., Tu, Z.: Exemplar-based human action pose correction and tagging. In: CVPR, pp. 1784–1791 (2012)
Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in visual recognition. In: ICML (2010)
Boureau, Y.L., Bach, F., LeCun, Y., Ponce, J.: Learning mid-level features for recognition. In: CVPR (2010)
Koniusz, P., Yan, F., Mikolajczyk, K.: Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection. CVIU 117(5), 479–492 (2013)
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: CVPR (2010)
Huang, Y., Wu, Z., Wang, L., Tan, T.: Feature coding in image classification: a comprehensive study. PAMI 35(8), 1798–1828 (2013)
Arandjelovic, R., Zisserman, A.: All about VLAD. In: CVPR (2013)
McCann, S., Lowe, D.G.: Spatially local coding for object recognition. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012, Part I. LNCS, vol. 7724, pp. 204–217. Springer, Heidelberg (2013)
Wang, X., Bai, X., Liu, W., Latecki, L.J.: Feature context for image classification and object detection. In: CVPR, IEEE, pp. 961–968 (2011)
Wang, X., Wang, B., Bai, X., Liu, W., Tu, Z.: Max-margin multiple-instance dictionary learning. In: ICML (2013)
Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A., et al.: Supervised dictionary learning. In: NIPS (2008)
Yang, J., Yu, K., Huang, T.: Supervised translation-invariant sparse coding. In: CVPR (2010)
Li, L.J., Fei-Fei, L.: What, where and who? classifying events by scene and object recognition. In: ICCV (2007)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput. Vis. Image Underst. 106(1), 59–70 (2007)
Chatfield, K., Lempitsky, V., Vedaldi, A., Zisserman, A.: The devil is in the details: an evaluation of recent feature encoding methods. In: BMVC (2011)
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008). http://www.vlfeat.org/
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Acknowledgement
This work was primarily supported by National Natural Science Foundation of China (NSFC) (No. 61222308), and in part by NSFC (No. 61173120), Program for New Century Excellent Talents in University (No. NCET-12-0217), Fundamental Research Funds for the Central Universities (No. HUST 2013TS115). X.Wang was supported by Microsoft Research Asia Fellowship 2012.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bai, S., Wang, X., Yao, C., Bai, X. (2015). Multiple Stage Residual Model for Accurate Image Classification. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9003. Springer, Cham. https://doi.org/10.1007/978-3-319-16865-4_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-16865-4_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16864-7
Online ISBN: 978-3-319-16865-4
eBook Packages: Computer ScienceComputer Science (R0)