A Hybrid Supervised-Unsupervised Vocabulary Generation Algorithm for Visual Concept Recognition

  • Alexander Binder
  • Wojciech Wojcikiewicz
  • Christina Müller
  • Motoaki Kawanabe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6494)


Vocabulary generation is an essential step in the bag-of-words image representation for visual concept recognition, because its quality substantially affects classification performance. In this paper, we propose a hybrid method for visual word generation that combines unsupervised density-based clustering with the discriminative power of fast support vector machines. We pursue three goals: (1) splitting the vocabulary generation algorithm into two stages, one of which is highly parallelizable; (2) reducing computation times for bag-of-words features; and (3) keeping concept recognition performance comparable to that of vanilla k-means clustering. On two recent data sets, Pascal VOC2009 and Image-CLEF2010 PhotoAnnotation, the proposed method either outperforms various baseline algorithms for visual word generation at almost the same computation time, or reduces training/test time at on-par classification performance.
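For orientation, the vanilla k-means baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of generic bag-of-words vocabulary generation, not the hybrid method proposed in the paper; the function names (`kmeans_vocabulary`, `bag_of_words`, `nearest_word`), the fixed iteration count, and the toy 2-D descriptors are assumptions made for this example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def kmeans_vocabulary(descriptors, num_words, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm): the cluster centres are the visual words."""
    rng = random.Random(seed)
    # Initialise the centres with randomly chosen descriptors.
    vocabulary = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                dim = len(members[0])
                vocabulary[k] = [sum(m[i] for m in members) / len(members)
                                 for i in range(dim)]
    return vocabulary


def bag_of_words(image_descriptors, vocabulary):
    """Normalised histogram of visual-word assignments for one image."""
    histogram = [0.0] * len(vocabulary)
    for d in image_descriptors:
        histogram[nearest_word(d, vocabulary)] += 1.0
    total = sum(histogram)
    return [h / total for h in histogram] if total else histogram


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy local descriptors drawn around two well-separated centres.
    descriptors = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(30)]
                   + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(30)])
    vocabulary = kmeans_vocabulary(descriptors, num_words=2)
    print(bag_of_words(descriptors[:30], vocabulary))
```

In this sketch both the clustering and the per-image histogram rely on an exhaustive nearest-centre search, which is the cost that motivates faster vocabulary generation and feature computation schemes.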


Keywords: Support Vector Machine, Visual Word, Area Under Curve, Word Feature, Visual Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Alexander Binder (1)
  • Wojciech Wojcikiewicz (1, 2)
  • Christina Müller (1, 2)
  • Motoaki Kawanabe (1, 2)
  1. Machine Learning Group, Berlin Institute of Technology, Berlin, Germany
  2. Fraunhofer Institute FIRST, Berlin, Germany
