Multimedia Tools and Applications, Volume 77, Issue 3, pp 3473–3491

Multimedia automatic annotation by mining label set correlation

  • Feng Tian
  • Xukun Shen
  • Xianmei Liu


Organizing and retrieving multimedia data rely heavily on relevant textual descriptions. Multimedia automatic annotation, which assigns text labels to multimedia samples, has been widely studied. Search-based annotation methods in particular are well suited to annotation tasks on large-scale datasets and have been studied in depth because of their simplicity and scalability. However, classical search-based annotation methods treat each label independently, which ignores the correlation between different labels in the assigned label set. This paper aims to integrate two kinds of information into a joint learning framework: the relevance of the label set to the multimedia content, and the internal correlation among the labels within the set. We evaluate the proposed method on the MIRFLICKR-25000 and NUS-WIDE datasets. Experimental results show that the proposed annotation method achieves excellent performance.
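The contrast between scoring labels independently and scoring them as a set can be illustrated with a small sketch. This is not the paper's method, whose formulation the abstract does not give. It is a toy example that assumes a plain k-nearest-neighbor vote for label relevance, a Jaccard co-occurrence matrix as the correlation measure, and a greedy selection with a hypothetical mixing weight `alpha`. All function names and data below are invented for illustration.

```python
import numpy as np

def neighbor_vote_scores(query, features, labels, k=3):
    """Score every label independently: fraction of the k nearest samples carrying it."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return labels[nearest].sum(axis=0) / k

def cooccurrence(labels):
    """Jaccard co-occurrence between label columns (an assumed correlation measure)."""
    counts = labels.T @ labels              # counts[i, j] = samples tagged with both i and j
    freq = np.diag(counts)
    union = freq[:, None] + freq[None, :] - counts
    corr = np.divide(counts, union, out=np.zeros_like(counts, dtype=float), where=union > 0)
    np.fill_diagonal(corr, 0.0)             # a label gets no bonus from itself
    return corr

def greedy_label_set(relevance, corr, size=2, alpha=0.5):
    """Grow a label set greedily, mixing relevance with correlation to labels already chosen."""
    chosen = [int(np.argmax(relevance))]
    while len(chosen) < size:
        bonus = corr[:, chosen].mean(axis=1)
        score = (1 - alpha) * relevance + alpha * bonus
        score[chosen] = -np.inf             # never pick the same label twice
        chosen.append(int(np.argmax(score)))
    return chosen

# Toy data (hypothetical): five samples with 2-D features and four candidate labels.
names = ["sky", "sea", "beach", "car"]
features = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]], dtype=float)
labels = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
], dtype=float)

query = np.array([0.1, 0.1])
relevance = neighbor_vote_scores(query, features, labels, k=3)
corr = cooccurrence(labels)
picked = greedy_label_set(relevance, corr, size=3, alpha=0.5)
print([names[i] for i in picked])
```

In this toy data the independent vote gives "beach" and "car" the same relevance, so it cannot rank them. The set-level score prefers "beach" because it co-occurs with the labels already selected, which is the kind of signal a per-label method discards.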


Multimedia annotation; Automatic annotation; Label set correlation mining



Special thanks go to the collaborators in the Lab for Media Search of the National University of Singapore for their instructive advice and useful suggestions on this work. This work is supported by the Natural Science Foundation of China (Nos. 61502094, 61402099) and the Natural Science Foundation of Heilongjiang Province of China (Nos. F2016002, F2015020).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Computer and Information Technology, Northeast Petroleum University, DaQing, China
  2. State Key Laboratory of Virtual Reality Technology and Systems, BeiHang University, Beijing, China
