Multimedia Tools and Applications, Volume 74, Issue 2, pp 595–611

Polysemious visual representation based on feature aggregation for large scale image applications

  • Xinghang Song
  • Shuqiang Jiang
  • Shuhui Wang
  • Liang Li
  • Qingming Huang


Multiple image features and multiple semantic concepts in images have intrinsic and complex relations. These relations influence the effectiveness of image semantic analysis methods, especially on large scale problems. In this paper, we propose a framework that generates a polysemious image representation through three levels of feature aggregation. In the codebook level aggregation, a visual dictionary is learned for each feature type, and each image feature can be reconstructed with the corresponding dictionary. In the semantic level aggregation, multiple concept distributions are learned for each feature codebook using an improved local anchor embedding, which yields the polysemious representation for a single feature type. In the multiple feature level aggregation, the final polysemious image representation is obtained by fusing multiple features with a weighted pooling approach. The proposed framework thus achieves multiple feature fusion and multiple semantic description in an integrated way. Experimental evaluations on a large scale image dataset validate the effectiveness of the proposed method.
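The three aggregation levels can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the dictionaries are learned or how the local anchor embedding is improved, so the code below takes the codebooks as given, replaces the improved local anchor embedding with plain projected gradient descent over convex anchor weights, and reads "weighted pooling" as a weighted average. The function names, the per-codeword `concept_table`, and the feature-type names are all hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def anchor_weights(x, anchors, n_iter=200):
    """Codebook level: reconstruct x from the codebook rows.

    Finds weights z (z >= 0, sum(z) == 1) that approximately minimise
    ||x - anchors.T @ z||^2 by projected gradient descent. This is an
    assumed stand-in for the improved local anchor embedding.
    """
    k = anchors.shape[0]
    z = np.full(k, 1.0 / k)
    # Step size 1/L, where L is the squared spectral norm of the codebook.
    step = 1.0 / (np.linalg.norm(anchors, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = anchors @ (anchors.T @ z - x)
        z = project_to_simplex(z - step * grad)
    return z


def single_feature_representation(x, anchors, concept_table):
    """Semantic level: map one feature vector to a concept distribution.

    concept_table[i] is an assumed concept distribution attached to
    codeword i (each row sums to 1).
    """
    return anchor_weights(x, anchors) @ concept_table


def fuse(per_type_reps, type_weights):
    """Multiple feature level: weighted average over feature types."""
    w = np.asarray(type_weights, dtype=float)
    w = w / w.sum()
    return w @ np.vstack(per_type_reps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_concepts, n_codewords = 4, 5
    dims = {"color": 8, "texture": 16}  # hypothetical feature types
    reps = []
    for d in dims.values():
        anchors = rng.normal(size=(n_codewords, d))
        table = rng.dirichlet(np.ones(n_concepts), size=n_codewords)
        reps.append(single_feature_representation(rng.normal(size=d), anchors, table))
    print(fuse(reps, type_weights=[0.7, 0.3]))
```

Because the anchor weights lie on the simplex and every row of `concept_table` is a distribution, each per-type representation, and therefore the fused vector, is itself a distribution over concepts.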


Keywords: Polysemious representation; Feature aggregation; Max pooling; Large scale



This work was supported in part by the National Basic Research Program of China (973 Program): 2012CB316400, in part by the National Natural Science Foundation of China: 61322212, 61025011, 61332016, in part by the Key Technologies R&D Program of China: 2012BAH18B02, and in part by the National Hi-Tech Development Program (863 Program) of China: 2014AA015202.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology (ICT), Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
