Multimedia Tools and Applications, Volume 74, Issue 2, pp 561–576

Accumulated reconstruction error vector (AREV): a semantic representation for cross-media retrieval

  • Kai Liu
  • Shikui Wei (Email author)
  • Yao Zhao
  • Zhenfeng Zhu
  • Yunchao Wei
  • Changsheng Xu


Cross-media retrieval aims to perform content-based search automatically across media types (e.g., image, video, and text); media representation plays an important role because it provides the heterogeneous similarity measure. This work proposes a novel semantic representation for cross-media data, called the accumulated reconstruction error vector (AREV), which is built in three steps: category-specific dictionary learning, media sample reconstruction, and accumulated reconstruction error concatenation. Instead of directly learning the correlation among heterogeneous items in the same semantic group, AREV projects the original feature description of each media type individually into a shared semantic space, in which each component is semantically consistent across media types because the category information is consistent. Experiments on two commonly used datasets, the Wikipedia dataset and the NUS-WIDE dataset, show good performance in terms of both effectiveness and efficiency.
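The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so several choices here are assumptions. The dictionary learner is replaced by a truncated SVD basis, reconstruction uses ordinary least squares, and "accumulated" is read as summing squared errors over a sample's local descriptors. The function names `learn_dictionary`, `arev`, and `toy_data`, and the synthetic data, are hypothetical.

```python
import numpy as np


def learn_dictionary(descriptors, n_atoms):
    """Hypothetical stand-in for dictionary learning (truncated SVD basis).

    descriptors: array (n_samples, dim) from ONE category and ONE media type.
    Returns a dictionary of shape (dim, n_atoms) with orthonormal columns.
    """
    _, _, vt = np.linalg.svd(descriptors, full_matrices=False)
    return vt[:n_atoms].T


def arev(descriptors, dictionaries):
    """Concatenate one accumulated reconstruction error per category.

    descriptors: array (n_descriptors, dim) describing a single media sample.
    dictionaries: list of (dim, n_atoms) arrays, one per category.
    """
    errors = []
    for dictionary in dictionaries:
        # Least-squares codes for every descriptor of the sample at once.
        codes, *_ = np.linalg.lstsq(dictionary, descriptors.T, rcond=None)
        residual = descriptors.T - dictionary @ codes
        # Assumption: "accumulated" = squared error summed over descriptors.
        errors.append(float(np.sum(residual ** 2)))
    return np.asarray(errors)


def toy_data(rng, dim, axes, n):
    """Synthetic descriptors that live only on the given coordinate axes."""
    data = np.zeros((n, dim))
    data[:, axes] = rng.normal(size=(n, len(axes)))
    return data


rng = np.random.default_rng(0)
# Two categories, two media types with different feature dimensions (6 and 4).
image_dicts = [learn_dictionary(toy_data(rng, 6, [0, 1], 50), 2),
               learn_dictionary(toy_data(rng, 6, [4, 5], 50), 2)]
text_dicts = [learn_dictionary(toy_data(rng, 4, [0, 1], 50), 2),
              learn_dictionary(toy_data(rng, 4, [2, 3], 50), 2)]

# A category-0 image and a category-0 text both map to 2-D vectors.
image_vec = arev(toy_data(rng, 6, [0, 1], 10), image_dicts)
text_vec = arev(toy_data(rng, 4, [0, 1], 10), text_dicts)
print(image_vec, text_vec)
```

In this toy setting the image features are 6-dimensional and the text features 4-dimensional, yet both samples end up as vectors with one component per category, so they can be compared directly. The smallest component marks the category whose dictionary reconstructs the sample best.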


Keywords: Cross-media; Accumulated reconstruction error vector; Retrieval; Consistency; Dictionary learning



This work was supported in part by the 973 Program (No. 2012CB316400), PCSIRT (No. IRT201206), the National Science Foundation of China (No. 61202241, No. 61210006, and No. 61025013), the Fundamental Research Funds for the Central Universities (No. 2013JBM024), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Kai Liu (1, 3)
  • Shikui Wei (1, 3), Email author
  • Yao Zhao (1, 3)
  • Zhenfeng Zhu (1, 3)
  • Yunchao Wei (1, 3)
  • Changsheng Xu (2)

  1. Institute of Information Science, Beijing Jiaotong University, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
  3. Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
