Cross Modal Retrieval for Different Modalities in Multimedia

Osheen, T. J.; Mathew, Linda Sara

doi:10.1007/978-3-030-37218-7_19

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1108)

Included in the following conference series:

International Conference On Computational Vision and Bio Inspired Computing

Abstract

Multimedia data such as text, images, and video are now widely used, driven by the growth of social media. Obtaining accurate multimedia information rapidly and effectively from a huge number of sources remains a challenging task. Cross-modal retrieval tries to break through the modality boundaries between different media objects and can be regarded as a unified multimedia retrieval approach. In many real-world applications cross-modal retrieval is becoming essential, whether an image is submitted as the query to retrieve the related text documents or text is used as the query to select the matching results. Video retrieval depends on semantics, which include both graphical and notion-based characteristics of the video. By exploiting all these methodologies jointly, the cross-modal content framework of the multimedia data is effectively preserved when the data are mapped into a common subspace. The aim is to group together the text, image, and video components of a multimedia document that show similar features, and to retrieve the most accurate image, text, or video for a given query, based on semantics.
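
The idea of mapping items from different modalities into one subspace and ranking them against a query can be illustrated with a minimal sketch. It is not the chapter's method, which the abstract does not specify. The projection matrices, feature vectors, file names, and the choice of cosine similarity are all hypothetical, picked by hand for illustration; in practice the projections would be learned from paired data.

```python
import math

def project(features, matrix):
    """Map a feature vector into the shared subspace (matrix times vector)."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, query_proj, items, item_proj, top_k=2):
    """Rank items of one modality against a query from another modality."""
    q = project(query_vec, query_proj)
    scored = [(cosine(q, project(vec, item_proj)), name) for name, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical projections: 3-d text features and 2-d image features
# are both mapped into the same 2-d subspace.
TEXT_PROJ = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 1.0]]
IMAGE_PROJ = [[1.0, 0.0],
              [0.0, 1.0]]

# Hypothetical image collection with made-up feature vectors.
IMAGES = [
    ("beach.jpg", [0.9, 0.1]),
    ("city.jpg", [0.1, 0.9]),
    ("forest.jpg", [0.6, 0.4]),
]

# Text query (made-up features) used to retrieve images.
ranked = retrieve([1.0, 0.0, 0.1], TEXT_PROJ, IMAGES, IMAGE_PROJ)
print(ranked)
```

Because both modalities end up in the same space, the same `retrieve` function could serve the reverse direction (image query, text results) by swapping the projection arguments.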

Author information

Authors and Affiliations

Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Ernakulam, 686666, Kerala, India
T. J. Osheen & Linda Sara Mathew

Corresponding author

Correspondence to T. J. Osheen.

Editor information

Editors and Affiliations

Department of CSE, RVS Technical Campus, Coimbatore, India
S. Smys
Faculty of Engineering, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
Faculty of Engineering, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas
School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Abdullah M. Iliyasu

Ethics declarations

✓ All authors declare that there is no conflict of interest.

✓ No humans/animals involved in this research work.

✓ We have used our own data.

About this paper

Cite this paper

Osheen, T.J., Mathew, L.S. (2020). Cross Modal Retrieval for Different Modalities in Multimedia. In: Smys, S., Tavares, J., Balas, V., Iliyasu, A. (eds) Computational Vision and Bio-Inspired Computing. ICCVBIC 2019. Advances in Intelligent Systems and Computing, vol 1108. Springer, Cham. https://doi.org/10.1007/978-3-030-37218-7_19

DOI: https://doi.org/10.1007/978-3-030-37218-7_19
Published: 07 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37217-0
Online ISBN: 978-3-030-37218-7
eBook Packages: Intelligent Technologies and Robotics (R0)
