
Evaluation of local features and classifiers in BOW model for image classification

Multimedia Tools and Applications

Abstract

The bag-of-words (BOW) model is used in many state-of-the-art image classification methods and is especially suitable for multi-class classification. Many kinds of local features and classifiers can be used with the BOW model. However, it is unclear which kind of local feature is both the most distinctive and the most robust, and which classifier yields the best classification performance. In this paper, we discuss the implementation choices in the BOW model. Further, we evaluate the influence of local features and classifiers on object and texture recognition methods within the BOW framework. To evaluate the implementation choices, we use two popular datasets: the Xerox7 dataset and the UIUCTex dataset. Extensive experiments are carried out to compare the performance of different detectors, descriptors and classifiers in terms of classification accuracy on the object category dataset and the texture dataset. We find that a combined detector, which pairs the MSER detector with the Hessian-Laplacian detector, is effective at finding discriminative regions. We also find that the SIFT descriptor performs better than the other descriptors for image classification, and that the SVM classifier with the EMD kernel is superior to the other classifiers. In addition, we propose an EMD spatial kernel to encode the spatial information of local features. The EMD spatial kernel is evaluated on the Xerox7 dataset, the 4-class VOC2006 dataset and the 4-class Caltech101 dataset. The experimental results show that the proposed kernel outperforms the EMD kernel, which does not consider spatial information, in image classification.
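For readers unfamiliar with the BOW representation mentioned in the abstract, the quantization step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a visual vocabulary has already been built (for example by clustering training descriptors) and uses toy 2-D vectors in place of real descriptors such as SIFT. The function names `nearest_word` and `bow_histogram` and all data values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return an L1-normalized histogram."""
    total = len(descriptors)
    if total == 0:
        # An image with no detected regions yields an all-zero histogram.
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts[k] / total for k in range(len(vocabulary))]


# Toy data (assumption): three visual words and four local descriptors in 2-D.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]

histogram = bow_histogram(descriptors, vocabulary)
print(histogram)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram is what a classifier such as an SVM would consume. The choice of detector and descriptor determines which vectors enter `descriptors`, and the choice of kernel determines how two such image representations are compared.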



Acknowledgments

The authors would like to thank the reviewers for their valuable comments, which greatly helped to improve the quality of the paper. This research was supported by the Fundamental Research Funds for the Central Universities (2010121067), the National Defense Basic Scientific Research Program of China (Grant B1420110155), the National Natural Science Foundation of China (61170179), the Special Research Fund for the Doctoral Program of Higher Education of China (Project 20110121110033), and the Xiamen Science & Technology Planning Project Fund of China (3502Z20116005).

Author information

Corresponding author

Correspondence to Hanzi Wang.

About this article

Cite this article

Qu, Y., Wu, S., Liu, H. et al. Evaluation of local features and classifiers in BOW model for image classification. Multimed Tools Appl 70, 605–624 (2014). https://doi.org/10.1007/s11042-012-1107-z

  • DOI: https://doi.org/10.1007/s11042-012-1107-z
