Abstract
In this paper, we propose a semi-supervised modality-dependent cross-media retrieval approach. The approach uses both labeled and unlabeled samples to learn two pairs of projection matrices. During optimization, it represents the semantic information of unlabeled samples by feature distance, so that the structural information of the data is fully exploited. Unlike supervised modality-dependent cross-media retrieval approaches, which rely on labeled samples and fixed semantic information, the proposed approach exploits the global data distribution and the semantic information of both labeled and unlabeled samples. Experiments on benchmark datasets show its superiority over the compared methods.
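The idea of deriving semantic information for unlabeled samples from feature distance can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract gives no equations, so the rule used here (a softmax over negative Euclidean distances to class centroids) and the helper names `class_centroids` and `soft_semantic_vector` are assumptions chosen only for illustration. The sketch also assumes that every class has at least one labeled sample.

```python
import math


def class_centroids(features, labels, n_classes):
    """Mean feature vector of each class, computed from the labeled samples.

    Assumption: every class index in range(n_classes) occurs in `labels`.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def soft_semantic_vector(x, centroids):
    """Soft class-membership vector for one unlabeled sample.

    Hypothetical rule: softmax over negative Euclidean distances,
    so closer centroids receive larger weights.
    """
    dists = [math.dist(x, c) for c in centroids]
    shift = min(dists)  # subtract the minimum distance for numerical stability
    weights = [math.exp(-(d - shift)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data: four labeled 2-D samples in two classes, one unlabeled sample.
labeled_features = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
labeled_classes = [0, 0, 1, 1]
centroids = class_centroids(labeled_features, labeled_classes, n_classes=2)
unlabeled_semantics = soft_semantic_vector([0.5, 0.5], centroids)
```

In this toy setting the unlabeled sample lies close to the first class centroid, so its soft semantic vector puts most of its weight on class 0. In a semi-supervised objective, a vector of this kind could stand in for the fixed one-hot label that a supervised method would require.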
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (Nos. 61373081, 61572298, 61772322), the Key Research and Development Foundation of Shandong Province (No. 2016GGX101009) and the Natural Science Foundation of Shandong, China (Nos. ZR2016FB12, ZR2014FM012, ZR2015PF006). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.
Cite this article
Dong, X., Sun, J., Duan, P. et al. Semi-supervised modality-dependent cross-media retrieval. Multimed Tools Appl 77, 3579–3595 (2018). https://doi.org/10.1007/s11042-017-5164-1
DOI: https://doi.org/10.1007/s11042-017-5164-1