Multimedia Tools and Applications, Volume 77, Issue 23, pp 30269–30290

Deep-MATEM: TEM query image based cross-modal retrieval for material science literature

  • Hailiang Li
  • Qingxiao Guan
  • Haidong Wang
  • Jing Dong
Article

Abstract

With the rapid increase in published material science literature, an effective literature retrieval system is important for researchers to obtain relevant information. In this paper we propose a cross-modal retrieval method for material science literature that uses a transmission electron microscopy (TEM) image as the query, providing a way to retrieve literature directly from the TEM image data generated in material experiments. In this method, terminologies are extracted and topic distributions are inferred from the text of each document using latent Dirichlet allocation (LDA), and we design a multi-task convolutional neural network (CNN) that maps a query TEM image to predictions of the relevant terminologies and the topic distribution. A ranking score is then calculated from the network output for the query image and the text data of each document. Experimental results show that our method achieves better performance than multi-label CCA, Deep Semantic Matching (Deep SM), and Modality-Specific Deep Structure (MSDS).
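
The abstract only names the components of the pipeline, so the PyTorch sketch below is an illustration rather than the authors' implementation: a small CNN with a shared trunk and two heads (sigmoid terminology predictions, softmax topic distribution), scored against each document's precomputed terminology indicator and LDA topic distribution. The backbone, the layer sizes, the vocabulary and topic counts, and the linear score combination (`ranking_score`, `alpha`) are all assumptions introduced here for illustration.

```python
# Minimal sketch of the multi-task retrieval idea described in the abstract.
# The exact backbone, head sizes, and scoring formula are NOT given in this
# excerpt; layer sizes, NUM_TERMS, NUM_TOPICS, and the alpha-weighted score
# below are illustrative assumptions, not the authors' method.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TERMS = 500    # assumed size of the extracted terminology vocabulary
NUM_TOPICS = 50    # assumed number of LDA topics

class MultiTaskTEMNet(nn.Module):
    """CNN with two heads: terminology relevance and topic distribution."""
    def __init__(self, num_terms=NUM_TERMS, num_topics=NUM_TOPICS):
        super().__init__()
        # Small VGG-style backbone (assumption; the paper's may differ).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.shared = nn.Linear(64 * 4 * 4, 256)
        self.term_head = nn.Linear(256, num_terms)    # multi-label logits
        self.topic_head = nn.Linear(256, num_topics)  # topic logits

    def forward(self, x):
        h = F.relu(self.shared(self.backbone(x).flatten(1)))
        term_prob = torch.sigmoid(self.term_head(h))       # P(term | image)
        topic_dist = F.softmax(self.topic_head(h), dim=1)  # topic distribution
        return term_prob, topic_dist

def ranking_score(term_prob, topic_dist, doc_terms, doc_topics, alpha=0.5):
    """Score one document against the query image's predictions.

    doc_terms: binary terminology indicator extracted from the document text;
    doc_topics: LDA topic distribution of the document. The linear blend of
    the two similarities is an assumption for this sketch.
    """
    term_score = (term_prob * doc_terms).sum(dim=1) / doc_terms.sum(dim=1).clamp(min=1)
    topic_score = F.cosine_similarity(topic_dist, doc_topics, dim=1)
    return alpha * term_score + (1 - alpha) * topic_score

if __name__ == "__main__":
    net = MultiTaskTEMNet()
    query = torch.randn(1, 1, 128, 128)  # one grayscale TEM query image
    term_prob, topic_dist = net(query)
    doc_terms = torch.zeros(1, NUM_TERMS); doc_terms[0, :10] = 1.0
    doc_topics = torch.full((1, NUM_TOPICS), 1.0 / NUM_TOPICS)
    print(ranking_score(term_prob, topic_dist, doc_terms, doc_topics))
```

At retrieval time, each document in the corpus would carry a precomputed (terminology indicator, LDA topic distribution) pair, and the candidate list is sorted by this score; how the two heads are trained and weighted is described in the paper itself, not in this excerpt.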

Keywords

Cross-modal document retrieval · Convolutional network · Material science

Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants U1536105, 51474237, U1536120, and U1636201, and by the National Key Research and Development Program of China (No. 2016YFB1001003).

References

  1. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems, pp 1097–1105
  2. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (SURF). Comput Vis Image Underst 110(3):346–359
  3. Blei D, Jordan M (2003) Modeling annotated data. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp 127–134
  4. Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
  5. Callister W, Rethwisch D (2013) Materials science and engineering: an introduction, 9th edn. Wiley, USA
  6. Cao G, Iosifidis A, Chen K, Gabbouj M (2018) Generalized multi-view embedding for visual recognition and cross-modal retrieval. IEEE Trans Cybern 99:1–14
  7. Cheng MM, Zhang Z, Lin WY, Torr P (2014) BING: binarized normed gradients for objectness estimation at 300fps. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3286–3293
  8. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
  9. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision (ECCV), pp 346–361
  10. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
  11. Jiang X, Wu F, Li X, Zhao Z, Lu W, Tang S, Zhuang Y (2015) Deep compositional cross-modal learning to rank via local-global alignment. In: Proceedings of the 23rd ACM international conference on multimedia, pp 69–78
  12. Johnson J, Karpathy A, Fei-Fei L (2016) DenseCap: fully convolutional localization networks for dense captioning. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4565–4574
  13. Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image descriptions. IEEE Trans Pattern Anal Mach Intell 39:664–676
  14. Li K, Qi GJ, Ye J, Hua KA (2017) Linear subspace ranking hashing for cross-modal retrieval. IEEE Trans Pattern Anal Mach Intell 39(9):1825–1838
  15. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
  16. Rasiwasia N, Costa Pereira J, Coviello E, Doyle G, Lanckriet GRG, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: Proceedings of the 18th ACM international conference on multimedia, pp 251–260
  17. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
  18. Qian YL, Dong J, Wang W, Tan T (2015) Learning representations for steganalysis from regularized CNN model with auxiliary tasks. In: International conference on communications, signal processing, and systems (CSPS 2015)
  19. Qian YL, Dong J, Wang W, Tan T (2016) Learning and transferring representations for image steganalysis using convolutional neural network. In: 2016 IEEE international conference on image processing (ICIP), pp 2752–2756
  20. Ranjan V, Rasiwasia N, Jawahar CV (2015) Multi-label cross-modal retrieval. In: 2015 IEEE international conference on computer vision (ICCV), pp 4094–4102
  21. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556
  22. Teh Y, Jordan M, Beal M, Blei D (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1566–1581
  23. Wang Y, Wu F, Song J, Li X, Zhuang Y (2014) Multi-modal mutual topic reinforce modeling for cross-media retrieval. In: Proceedings of the international ACM SIGIR conference on research and development in information retrieval, pp 307–316
  24. Wang J, He Y, Kang C, Xiang S, Pan C (2015) Image-text cross-modal retrieval via modality-specific feature learning. In: Proceedings of the 5th ACM international conference on multimedia retrieval, pp 347–354
  25. Wang D, Gao X, Wang X, He L, Yuan B (2016) Multimodal discriminative binary embedding for large-scale cross-modal retrieval. IEEE Trans Image Process 25:4540–4554
  26. Wang J, Yang Y, Mao J, Huang Z, Huang C, Xu W (2016) CNN-RNN: a unified framework for multi-label image classification. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2285–2294
  27. Wei Y, Xia W, Huang J, Ni B, Dong J, Zhao Y, Yan S (2014) CNN: single-label to multi-label. CoRR arXiv:1406.5726
  28. Wei Y, Zhao Y, Lu C, Wei S, Liu L, Zhu Z, Yan S (2017) Cross-modal retrieval with CNN visual features: a new baseline. IEEE Trans Cybern 47(2):449–460
  29. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(8):207–224
  30. Xu X, Shen F, Yang Y, Shen H, Li X (2017) Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans Image Process 26:2494–2507
  31. You X, Li Q, Tao D, Ou W, Gong M (2014) Local metric learning for exemplar-based object detection. IEEE Trans Circ Syst Video Technol 24(8):1265–1276

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Minerals Processing and Bioengineering, Central South University, Changsha, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  4. Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, Beijing, China
