Semi-supervised modality-dependent cross-media retrieval

Multimedia Tools and Applications

Abstract

In this paper, we propose a modality-dependent cross-media retrieval approach for the semi-supervised setting. The approach uses both labeled and unlabeled samples to learn two pairs of projection matrices. During optimization, it represents the semantic information of unlabeled samples by feature distance, so that the structural information in the data is fully exploited. Unlike supervised modality-dependent cross-media retrieval approaches, which rely on labeled samples and fixed semantic information, the proposed approach exploits the global data distribution and the semantic information of both labeled and unlabeled samples. Experiments on benchmark datasets show that it outperforms the compared methods.
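The idea of deriving semantic information for unlabeled samples from feature distance can be illustrated with a minimal sketch. The abstract does not specify how distances are converted into semantic vectors or how the projection matrices are optimized, so everything below is an assumption made for illustration: soft class vectors are taken as a softmax over negative Euclidean distances to labeled class centroids, a single projection is fitted by ridge regression in place of the paper's two pairs of projection matrices, and the function names `soft_semantics_from_distance` and `fit_projection` are hypothetical.

```python
import numpy as np

def soft_semantics_from_distance(X_labeled, y_labeled, X_unlabeled, n_classes):
    """Assumed heuristic (not taken from the paper): build a soft class vector
    for each unlabeled sample from its distance to labeled class centroids."""
    centroids = np.stack([X_labeled[y_labeled == c].mean(axis=0)
                          for c in range(n_classes)])
    # Pairwise Euclidean distances, shape (n_unlabeled, n_classes).
    dists = np.linalg.norm(
        X_unlabeled[:, None, :] - centroids[None, :, :], axis=2)
    # Softmax over negative distances: a nearer centroid gets a larger weight.
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def fit_projection(X, S, reg=1e-2):
    """Ridge regression stand-in: W minimizing ||XW - S||^2 + reg * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)

# Toy data: three well-separated classes in a 5-dimensional feature space.
rng = np.random.default_rng(0)
n_classes, dim = 3, 5
class_means = 4.0 * np.eye(n_classes, dim)
y_labeled = np.repeat(np.arange(n_classes), 10)
X_labeled = class_means[y_labeled] + 0.1 * rng.standard_normal((30, dim))
y_hidden = np.repeat(np.arange(n_classes), 20)  # used only for checking
X_unlabeled = class_means[y_hidden] + 0.1 * rng.standard_normal((60, dim))

# Labeled samples keep fixed one-hot semantics; unlabeled ones get soft vectors.
S_labeled = np.eye(n_classes)[y_labeled]
S_unlabeled = soft_semantics_from_distance(
    X_labeled, y_labeled, X_unlabeled, n_classes)

# Fit one projection from feature space to semantic space using all samples.
X_all = np.vstack([X_labeled, X_unlabeled])
S_all = np.vstack([S_labeled, S_unlabeled])
W = fit_projection(X_all, S_all)
```

In this toy setting the unlabeled samples contribute to the fitted projection through their distance-derived semantic vectors rather than being discarded, which is the general semi-supervised motivation stated in the abstract. A modality-dependent variant would presumably fit separate projections per modality and per retrieval direction, which this sketch does not attempt.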

Acknowledgements

The work is partially supported by the National Natural Science Foundation of China (Nos. 61373081, 61572298, 61772322), the Key Research and Development Foundation of Shandong Province (No. 2016GGX101009) and the Natural Science Foundation of Shandong China (Nos. ZR2016FB12, ZR2014FM012, ZR2015PF006). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.

Author information

Corresponding authors

Correspondence to Jiande Sun or Huaxiang Zhang.

About this article

Cite this article

Dong, X., Sun, J., Duan, P. et al. Semi-supervised modality-dependent cross-media retrieval. Multimed Tools Appl 77, 3579–3595 (2018). https://doi.org/10.1007/s11042-017-5164-1

  • DOI: https://doi.org/10.1007/s11042-017-5164-1
