A cross-modal multimedia retrieval method using depth correlation mining in big data environment

  • Dongliang Xia
  • Lu Miao
  • Aiwan Fan


Cross-media retrieval aims to break the constraint of single-modality retrieval, which is limited to one multimedia form, and to enable searching across media forms. Jointly processing data of different multimedia modalities is an urgent problem in cross-media retrieval: the semantic relationships among their latent features must be mined so that cross-modal similarity can be measured. To address this problem, a deep correlation mining method is proposed. It first trains the features of each medium with deep learning, and then fuses the correlations among the trained features to resolve the heterogeneity between them, making the features of different multimedia data directly comparable. On this basis, the Levenberg-Marquardt method is applied to mitigate the tendency of gradient-based deep learning to fall into local minima during training. Experiments on several databases show that the proposed method is effective for cross-media retrieval and achieves better results than other state-of-the-art multimedia retrieval methods.
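The abstract's core optimization idea is to replace plain gradient descent with the Levenberg-Marquardt method, whose damping factor interpolates between gradient descent (large damping) and Gauss-Newton (small damping). The following is a minimal, generic sketch of that update rule on a toy least-squares problem; the function names, the toy exponential model, and the damping schedule are illustrative assumptions, not the paper's actual network-training procedure.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, n_iter=50):
    """Minimal Levenberg-Marquardt loop for a least-squares objective.

    residual : maps parameters x -> residual vector r(x)
    jacobian : maps parameters x -> Jacobian matrix J(x) of r
    lam      : damping factor; large lam behaves like gradient descent,
               small lam like Gauss-Newton.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r = residual(x)
        J = jacobian(x)
        # Damped normal equations: (J^T J + lam I) step = -J^T r
        A = J.T @ J + lam * np.eye(x.size)
        step = np.linalg.solve(A, -J.T @ r)
        if np.sum(residual(x + step) ** 2) < np.sum(r ** 2):
            x = x + step      # accept step, trust the quadratic model more
            lam *= 0.5
        else:
            lam *= 2.0        # reject step, shift toward gradient descent

    return x

# Toy stand-in for training: fit y = a * exp(b * t) to clean data.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * t)

def residual(p):
    a, b = p
    return a * np.exp(b * t) - y

def jacobian(p):
    a, b = p
    return np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])

# Converges toward the true parameters a = 2.0, b = 1.5
p = levenberg_marquardt(residual, jacobian, x0=[1.0, 1.0])
```

The adaptive damping is the point: when a step fails, the update falls back toward a small gradient-descent step instead of overshooting, which is why the method is less prone to getting stuck than a fixed-step gradient scheme.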


Deep correlation mining · Cross-modal · Big data · Multimedia retrieval · Similarity · Levenberg-Marquardt method



This work was supported by the Applied Research Plan of Key Scientific Research Projects in Henan Colleges and Universities (No. 18B520028) and the Henan Science and Technology Plan Project (No. 182102210471).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Software, Pingdingshan University, Pingdingshan, China
