Multimedia Tools and Applications, Volume 77, Issue 3, pp 3579–3595

Semi-supervised modality-dependent cross-media retrieval

  • Xiao Dong
  • Jiande Sun
  • Peiyong Duan
  • Lili Meng
  • Yanyan Tan
  • Wenbo Wan
  • Hongchen Wu
  • Bin Zhang
  • Huaxiang Zhang

Abstract

In this paper, we propose a modality-dependent cross-media retrieval approach for the semi-supervised setting. The approach uses both labeled and unlabeled samples to learn two pairs of projection matrices, and in the optimization process it uses feature distance to represent the semantic information of the unlabeled samples, so that the structural information of the data is fully exploited. Unlike supervised modality-dependent cross-media retrieval approaches, which use only labeled samples and fixed semantic information, the proposed approach exploits the global data distribution and the semantic information of both labeled and unlabeled samples. Experiments on benchmark datasets show that it outperforms the compared methods.
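The abstract does not spell out how feature distance is turned into semantic information for unlabeled samples. The following is a minimal sketch of one plausible reading, not the authors' formulation: it assumes class centroids are computed from the labeled samples and that an unlabeled sample receives a soft label from a softmax over its negative Euclidean distances to those centroids. The function names `class_centroids` and `soft_labels` and the toy data are hypothetical, introduced only for illustration.

```python
import math


def class_centroids(features, labels, num_classes):
    """Mean feature vector per class, computed from labeled samples only.

    Assumes every class has at least one labeled sample.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def soft_labels(x, centroids):
    """Soft semantic vector for one unlabeled sample (illustrative assumption).

    Softmax over negative Euclidean distances to the class centroids;
    the smallest distance is subtracted first for numerical stability.
    """
    dists = [math.dist(x, c) for c in centroids]
    nearest = min(dists)
    weights = [math.exp(-(d - nearest)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data: four labeled 2-D samples in two classes, one unlabeled sample.
labeled = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 1, 1]
centroids = class_centroids(labeled, labels, num_classes=2)
print(soft_labels([1.0, 1.0], centroids))
```

In a scheme of this kind, the resulting soft vectors would stand in for the missing labels of unlabeled samples when the projection matrices are learned; how the paper actually couples the two steps is not stated in the abstract.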

Keywords

Cross-media retrieval, Subspace learning, Semantic information, Semi-supervised learning

Notes

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Nos. 61373081, 61572298, 61772322), the Key Research and Development Foundation of Shandong Province (No. 2016GGX101009), and the Natural Science Foundation of Shandong, China (Nos. ZR2016FB12, ZR2014FM012, ZR2015PF006). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Xiao Dong (1)
  • Jiande Sun (1, 2)
  • Peiyong Duan (1)
  • Lili Meng (1, 2)
  • Yanyan Tan (1, 2)
  • Wenbo Wan (1, 2)
  • Hongchen Wu (1, 2)
  • Bin Zhang (1)
  • Huaxiang Zhang (1, 2)

  1. School of Information Science and Engineering, Shandong Normal University, Jinan, China
  2. Institute of Data Science and Technology, Shandong Normal University, Jinan, China
