Multimedia Tools and Applications, Volume 77, Issue 3, pp 3009–3027

Joint graph regularization based modality-dependent cross-media retrieval



Cross-media retrieval returns heterogeneous multimedia data that share the semantics of a query object, and its key problem is how to model the correlations among heterogeneous multimedia data. Many works map data of different modalities into an isomorphic space so that similarities between them can be measured. Inspired by this idea, we propose a joint graph regularization based modality-dependent cross-media retrieval approach (JGRMDCR), which takes into account the one-to-one correspondence between data pairs of different modalities, the inter-modality similarities, and the intra-modality similarities. In addition, the method learns different projection matrices for different retrieval tasks according to the modality of the query object. Experimental results on benchmark datasets show that the proposed approach outperforms other state-of-the-art algorithms.
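The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the projection matrices have already been learned (the joint graph regularized objective is not reproduced here), it assumes cosine similarity as the measure in the common space, and the names `project`, `cosine`, `retrieve`, and the task keys `image_to_text` and `text_to_image` are hypothetical. It only shows how a separate pair of projection matrices might be selected according to the modality of the query.

```python
import math


def project(matrix, vector):
    """Map a feature vector into the common space.

    `matrix` is a list of rows, one row per dimension of the common space.
    """
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, query_modality, gallery, projections):
    """Rank gallery items of the other modality for one query.

    `projections` maps a hypothetical task name to an
    (image_matrix, text_matrix) pair, so each retrieval task
    uses its own pair of projection matrices.
    """
    if query_modality == "image":
        image_matrix, text_matrix = projections["image_to_text"]
        query_matrix, gallery_matrix = image_matrix, text_matrix
    else:
        image_matrix, text_matrix = projections["text_to_image"]
        query_matrix, gallery_matrix = text_matrix, image_matrix

    q = project(query_matrix, query)
    scored = [
        (cosine(q, project(gallery_matrix, item)), index)
        for index, item in enumerate(gallery)
    ]
    # Highest similarity first; the result is a list of gallery indices.
    return [index for _, index in sorted(scored, key=lambda s: -s[0])]
```

With toy matrices, an image query is projected with the image matrix of the `image_to_text` pair and compared against text items projected with the text matrix of that same pair; a text query uses the `text_to_image` pair instead.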


Keywords: Cross-media retrieval, Correlation analysis, Joint graph regularization



This work is partially supported by the National Natural Science Foundation of China (Nos. 61373081, 61572298, 61402268, 61401260, 61601268), the Key Research and Development Foundation of Shandong Province (No. 2016GGX101009), and the Natural Science Foundation of Shandong, China (Nos. BS2014DX006, ZR2014FM012). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Jihong Yan (1)
  • Huaxiang Zhang (1, 2)
  • Jiande Sun (1, 2)
  • Qiang Wang (1, 2)
  • Peilian Guo (1, 2)
  • Lili Meng (1, 2)
  • Wenbo Wan (1, 2)
  • Xiao Dong (1)

  1. School of Information Science and Engineering, Shandong Normal University, Jinan, China
  2. Institute of Data Science and Technology, Shandong Normal University, Jinan, China
