Accumulated reconstruction error vector (AREV): a semantic representation for cross-media retrieval

Liu, Kai; Wei, Shikui; Zhao, Yao; Zhu, Zhenfeng; Wei, Yunchao; Xu, Changsheng

doi:10.1007/s11042-014-1968-4

Published: 12 April 2014

Volume 74, pages 561–576 (2015)

Multimedia Tools and Applications

Kai Liu^1,3,
Shikui Wei^1,3,
Yao Zhao^1,3,
Zhenfeng Zhu^1,3,
Yunchao Wei^1,3 &
Changsheng Xu^2

Abstract

Cross-media retrieval aims to automatically perform content-based search across different media types (e.g., image, video and text), and the media representation plays an important role in providing a similarity measure between heterogeneous items. In this work, a novel semantic representation for cross-media data, called the accumulated reconstruction error vector (AREV), is proposed; its construction consists of category-specific dictionary learning, media sample reconstruction, and concatenation of accumulated reconstruction errors. Instead of directly learning the correlations among heterogeneous items in the same semantic group, AREV individually projects the original feature descriptions of each media type into a shared semantic space. Because each component of this space corresponds to the same category for every media type, each component is semantically consistent across media types. Experiments on two commonly used datasets, the Wikipedia dataset and the NUS-WIDE dataset, show good performance in terms of both effectiveness and efficiency.
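
The three AREV steps named in the abstract can be illustrated with a small sketch. The abstract does not specify the dictionary-learning algorithm, the reconstruction method, or the distance used for ranking, so everything below is an assumption rather than the authors' method: each category's dictionary is simply a given set of atoms (for example, training descriptors of that category) instead of a learned one, reconstruction is plain least-squares projection onto the span of those atoms instead of sparse coding, "accumulated" is read as summing the errors over all local descriptors of one sample, the resulting vector is L1-normalised, and cross-media ranking uses Euclidean distance. The function names (orthonormal_basis, reconstruction_error, arev, rank_cross_media) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(atoms: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt: an orthonormal basis for the span of one category's atoms.
    Assumption: the atoms are given, not learned (a stand-in for dictionary learning)."""
    basis: List[Vector] = []
    for atom in atoms:
        v = list(atom)
        for q in basis:
            proj = _dot(v, q)
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > tol:  # drop atoms that are (numerically) already in the span
            basis.append([vi / norm for vi in v])
    return basis


def reconstruction_error(x: Sequence[float], basis: Sequence[Vector]) -> float:
    """Squared residual of the least-squares reconstruction of x from span(basis)."""
    residual = list(x)
    for q in basis:
        proj = _dot(residual, q)
        residual = [ri - proj * qi for ri, qi in zip(residual, q)]
    return _dot(residual, residual)


def arev(descriptors: Sequence[Sequence[float]], dictionaries: Sequence[List[Vector]]) -> Vector:
    """Sum each category's reconstruction error over all descriptors of one sample,
    concatenate the sums (one component per category) and L1-normalise them
    (the normalisation is an assumption made so media types share a scale)."""
    errors = [sum(reconstruction_error(d, basis) for d in descriptors) for basis in dictionaries]
    total = sum(errors)
    return [e / total for e in errors] if total > 0 else errors


def rank_cross_media(query: Vector, gallery: Dict[str, Vector]) -> List[str]:
    """Rank items of another media type by Euclidean distance between AREVs."""
    return sorted(gallery, key=lambda name: math.dist(query, gallery[name]))


# Toy setup with two categories: images and texts live in different feature spaces
# (3-D vs 2-D), but each media type has one dictionary per category, in the same order.
image_dicts = [orthonormal_basis([[1.0, 0.0, 0.0]]), orthonormal_basis([[0.0, 1.0, 0.0]])]
text_dicts = [orthonormal_basis([[1.0, 0.0]]), orthonormal_basis([[0.0, 1.0]])]

image_query = arev([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]], image_dicts)
text_gallery = {
    "doc_cat0": arev([[1.0, 0.1]], text_dicts),
    "doc_cat1": arev([[0.1, 1.0]], text_dicts),
}
ranking = rank_cross_media(image_query, text_gallery)
print(ranking)  # ['doc_cat0', 'doc_cat1']
```

Because both media types use one dictionary per category in the same category order, component c of every vector measures how poorly category c explains the sample, whether the sample is an image or a text; this is what allows an image query to be compared directly with text items. In this sketch a small component signals membership in that category.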

Similar content being viewed by others

Cross-media retrieval based on semi-supervised regularization and correlation learning

Article 05 May 2018

Hong Zhang, Gang Dai, … Xin Xu

Joint graph regularization based modality-dependent cross-media retrieval

Article 15 June 2017

Jihong Yan, Huaxiang Zhang, … Xiao Dong

A cross-media distance metric learning framework based on multi-view correlation mining and matching

Article 21 April 2015

Hong Zhang, Xingyu Gao, … Xin Xu

Acknowledgments

This work was supported in part by the 973 Program (No. 2012CB316400), PCSIRT (No. IRT201206), the National Science Foundation of China (No. 61202241, No. 61210006, and No. 61025013), the Fundamental Research Funds for the Central Universities (No. 2013JBM024), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).

Author information

Authors and Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
Kai Liu, Shikui Wei, Yao Zhao, Zhenfeng Zhu & Yunchao Wei
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Changsheng Xu
Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, 100044, China
Kai Liu, Shikui Wei, Yao Zhao, Zhenfeng Zhu & Yunchao Wei

Corresponding author

Correspondence to Shikui Wei.

About this article

Cite this article

Liu, K., Wei, S., Zhao, Y. et al. Accumulated reconstruction error vector (AREV): a semantic representation for cross-media retrieval. Multimed Tools Appl 74, 561–576 (2015). https://doi.org/10.1007/s11042-014-1968-4

Issue Date: January 2015
DOI: https://doi.org/10.1007/s11042-014-1968-4
