Boosting Multimodal Semantic Understanding by Local Similarity Adaptation and Global Correlation Propagation

Zhang, Hong; Liu, Xiaoli

doi:10.1007/978-3-642-15702-8_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6297)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

An important trend in multimedia semantic understanding is the use and support of multimodal data, such as images and audio, which are heterogeneous in their low-level features. The main challenge is how to measure the different kinds of correlations among multimodal data. In this paper, we propose a novel approach to boost multimodal semantic understanding from local and global perspectives. First, the cross-media correlation between images and audio clips is estimated with Kernel Canonical Correlation Analysis; second, a multimodal graph is constructed to enable global correlation propagation with adapted intra-media similarity; finally, a cross-media retrieval algorithm is discussed as an application of our approach. A prototype system is developed to demonstrate the feasibility and capability of the approach. Experimental results are encouraging and indicate that the approach is effective.
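To make the idea of global correlation propagation over a multimodal graph more concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not give. It assumes a generic manifold-ranking-style update in which an initial image-audio correlation matrix (for example, one estimated by Kernel Canonical Correlation Analysis) is smoothed over row-normalized intra-media similarity graphs. The function names, the mixing weight `alpha`, and the toy data are all hypothetical.

```python
import numpy as np


def row_normalize(w):
    """Scale each row of a non-negative similarity matrix to sum to 1."""
    sums = w.sum(axis=1, keepdims=True)
    return w / np.where(sums == 0, 1.0, sums)


def propagate(c0, w_img, w_aud, alpha=0.5, tol=1e-10, max_iter=1000):
    """Spread cross-media correlations along intra-media similarity graphs.

    c0    : (n_img, n_aud) initial image-audio correlations
    w_img : (n_img, n_img) image-image similarities
    w_aud : (n_aud, n_aud) audio-audio similarities
    alpha : assumed mixing weight in (0, 1); larger values propagate further
    """
    s_img = row_normalize(w_img)
    s_aud = row_normalize(w_aud)
    c = c0.copy()
    for _ in range(max_iter):
        # Neighbours on both graphs vote; the initial estimate anchors the result.
        c_next = alpha * s_img @ c @ s_aud.T + (1 - alpha) * c0
        if np.abs(c_next - c).max() < tol:
            return c_next
        c = c_next
    return c


# Toy data (hypothetical): 6 images, 4 audio clips, 2 known correlated pairs.
rng = np.random.default_rng(0)
n_img, n_aud = 6, 4
w_img = rng.random((n_img, n_img))
w_img = (w_img + w_img.T) / 2  # symmetric similarities
w_aud = rng.random((n_aud, n_aud))
w_aud = (w_aud + w_aud.T) / 2
c0 = np.zeros((n_img, n_aud))
c0[0, 0] = 1.0
c0[3, 2] = 1.0

c = propagate(c0, w_img, w_aud)
# Rank audio clips for image 1, which had no initial correlation at all.
ranking_for_image_1 = np.argsort(-c[1])
```

In this sketch, a query image with no directly observed audio correlation still receives a ranking because it inherits scores from similar images. Because the normalized graphs average their neighbours and `alpha` is below 1, each iteration shrinks the change by at least a factor of `alpha`, so the loop settles on a fixed point.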

This work is supported by the Scientific Research Project funded by the Education Department of Hubei Province (Q20091101) and the Science Foundation of Wuhan University of Science and Technology (2008TD04).

Author information

Authors and Affiliations

College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, 430081
Hong Zhang & Xiaoli Liu

Editor information

Editors and Affiliations

School of Computer Science, University of Nottingham, Jubilee Campus, NG8 1BB, Nottingham, UK
Guoping Qiu
The Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hong Kong, China
Kin Man Lam
Faculty of System Design, Tokyo Metropolitan University, 6-6, Asahigaoka, 191-0065, Hino-city, Tokyo
Hitoshi Kiya
Shanghai Key Laboratory of Intelligent Information Processing, Department of Computer Science & Engineering, Fudan University, Shanghai, China
Xiang-Yang Xue
Department of Electrical Engineering, University of Southern California, 90089-2564, Los Angeles, CA
C.-C. Jay Kuo
LIACS Media Lab, Leiden University,
Michael S. Lew

About this paper

Cite this paper

Zhang, H., Liu, X. (2010). Boosting Multimodal Semantic Understanding by Local Similarity Adaptation and Global Correlation Propagation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_14

DOI: https://doi.org/10.1007/978-3-642-15702-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)
