Abstract
Bag of Visual Words (BoW) is widely regarded as the standard representation of the visual information in images and is broadly used for retrieval and concept detection in videos. Generating the visual vocabulary in the BoW framework generally includes a quantization step that clusters the image features into a limited number of visual words. Because this quantization is achieved through unsupervised clustering, it takes no advantage of the relationship between features coming from images that belong to similar concept(s), and this widens the semantic gap. We present a new dictionary construction technique that improves the BoW representation by increasing its discriminative power. Our solution is based on a two-step quantization: we start with k-means clustering, followed by a bottom-up supervised clustering that uses the features' label information. Results on the TRECVID 2007 data [8] show improvements with the proposed construction of the BoW.
We also give upper bounds on the improvement over the baseline in the retrieval rate of each concept, obtained using the best supervised merging criterion.
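The two-step quantization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the merging criterion, so the sketch assumes a greedy rule that merges the pair of clusters whose union least increases the size-weighted label entropy. The function names (`kmeans`, `weighted_entropy`, `supervised_merge`) and all parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distances from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def weighted_entropy(counts):
    """Label entropy of one cluster in bits, multiplied by the cluster size."""
    n = counts.sum()
    if n == 0:
        return 0.0
    p = counts[counts > 0] / n
    return float(-n * (p * np.log2(p)).sum())

def supervised_merge(assign, labels, n_classes, n_words):
    """Greedy bottom-up merging of clusters until n_words remain.

    Assumed criterion (not taken from the paper): at each step, merge the
    pair whose union increases the total weighted label entropy the least.
    """
    k = int(assign.max()) + 1
    counts = np.zeros((k, n_classes))
    np.add.at(counts, (assign, labels), 1)      # per-cluster label histogram
    groups = {j: [j] for j in range(k)}         # merged word -> original clusters
    hist = {j: counts[j] for j in range(k)}
    while len(groups) > n_words:
        best = None
        keys = sorted(groups)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                cost = (weighted_entropy(hist[a] + hist[b])
                        - weighted_entropy(hist[a])
                        - weighted_entropy(hist[b]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        groups[a] += groups.pop(b)
        hist[a] = hist[a] + hist.pop(b)
    # Relabel every original cluster with the index of its merged word.
    mapping = np.empty(k, dtype=int)
    for new_id, members in enumerate(groups.values()):
        mapping[members] = new_id
    return mapping[assign]
```

In use, `kmeans` would be run on local descriptors with a deliberately large `k`, and `supervised_merge` would then reduce the vocabulary to the target size using the concept label carried by each descriptor's source image. The pairwise search is quadratic in the number of clusters per step, which is acceptable for a sketch but not for large vocabularies.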
References
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, Philadelphia, PA, USA, pp. 1027–1035 (2007), http://portal.acm.org/citation.cfm?id=1283383.1283494
Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011), http://www.csie.ntu.edu.tw/~cjlin/libsvm
Hao, J., Jie, X.: Improved bags-of-words algorithm for scene recognition. In: 2010 2nd International Conference on Signal Processing Systems (ICSPS), vol. 2, pp. V2-279–V2-282 (2010)
Lin, C., Li, S., Su, S.: Image classification using adapted codebook. In: ITIME 2009, vol. 1, pp. 1307–1312 (2009)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004), http://portal.acm.org/citation.cfm?id=993451.996342
Moosmann, F., Triggs, B., Jurie, F.: Fast discriminative visual codebooks using randomized clustering forests. In: NIPS (2006), http://lear.inrialpes.fr/pubs/2006/MTJ06
Perronnin, F., Dance, C.R., Csurka, G., Bressan, M.: Adapted Vocabularies for Generic Visual Categorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 464–475. Springer, Heidelberg (2006), http://dx.doi.org/10.1007/11744085_36
Smeaton, A.F., Over, P., Kraaij, W.: Evaluation campaigns and TRECVid. In: MIR 2006, pp. 321–330 (2006), http://doi.acm.org/10.1145/1178677.1178722
Wang, L.: Toward a discriminative codebook: Codeword selection across multi-resolution. In: CVPR (2007), http://dx.doi.org/10.1109/CVPR.2007.383374
Winn, J., Criminisi, A., Minka, T.: Object categorization by learned universal visual dictionary. In: ICCV 2005, pp. 1800–1807 (2005)
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Niaz, U.F., Merialdo, B. (2012). Entropy Based Supervised Merging for Visual Categorization. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P., Zemčík, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2012. Lecture Notes in Computer Science, vol 7517. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33140-4_37
DOI: https://doi.org/10.1007/978-3-642-33140-4_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33139-8
Online ISBN: 978-3-642-33140-4
eBook Packages: Computer Science, Computer Science (R0)