Abstract
The bag-of-words (BOW) model is used in many state-of-the-art image classification methods and is especially suitable for multi-class classification. Many kinds of local features and classifiers can be used within the BOW model. However, it is unclear which kind of local feature is both the most distinctive and the most robust, and which classifier yields the best classification performance. In this paper, we discuss the implementation choices in the BOW model. Further, we evaluate the influence of local features and classifiers on object and texture recognition methods within the framework of the BOW model. To evaluate the implementation choices, we use two popular datasets: the Xerox7 dataset and the UIUCTex dataset. Extensive experiments are carried out to compare the performance of different detectors, descriptors and classifiers in terms of classification accuracy on the object category dataset and the texture dataset. We find that a combined detector, which pairs the MSER detector with the Hessian-Laplacian detector, is effective at finding discriminative regions. We also find that the SIFT descriptor performs better than the other descriptors for image classification, and that the SVM classifier with the Earth Mover's Distance (EMD) kernel is superior to the other classifiers. Moreover, we propose an EMD spatial kernel to encode the spatial information of local features. The EMD spatial kernel is evaluated on the Xerox7 dataset, the 4-class VOC2006 dataset and the 4-class Caltech101 dataset. The experimental results show that the proposed kernel outperforms the EMD kernel, which does not consider spatial information, in image classification.
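As a rough illustration of the representation the BOW model relies on, the sketch below quantizes a set of local descriptors against a visual codebook and returns a normalized histogram of codeword counts. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the function name `bow_histogram`, the toy 2-D descriptors and the three-word codebook are all hypothetical, and the codebook is assumed to have been built beforehand (for example by clustering training descriptors).

```python
from math import dist


def bow_histogram(descriptors, codebook):
    """Return a normalized histogram of nearest-codeword assignments.

    descriptors: list of equal-length numeric tuples (hypothetical local features).
    codebook: list of codeword tuples with the same dimensionality (assumed given).
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Hard assignment: pick the codeword with the smallest Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An image with no detected features maps to an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data for illustration only (not taken from any dataset in the paper).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = bow_histogram(descriptors, codebook)
print(histogram)
```

In a full pipeline the descriptors would come from a detector and descriptor pair such as those compared in the paper, and the resulting histograms (or signatures) would be passed to the classifier.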
Acknowledgments
The authors would like to thank the reviewers for their valuable comments, which greatly helped to improve the quality of the paper. The research work was supported by the Fundamental Research Funds for the Central Universities (2010121067), the National Defense Basic Scientific Research Program of China (grant B1420110155), the National Natural Science Foundation of China (61170179), the Special Research Fund for the Doctoral Program of Higher Education of China (project 20110121110033), and the Xiamen Science & Technology Planning Project Fund of China (3502Z20116005).
Cite this article
Qu, Y., Wu, S., Liu, H. et al. Evaluation of local features and classifiers in BOW model for image classification. Multimed Tools Appl 70, 605–624 (2014). https://doi.org/10.1007/s11042-012-1107-z
DOI: https://doi.org/10.1007/s11042-012-1107-z