An Adversarial Learning and Canonical Correlation Analysis Based Cross-Modal Retrieval Model

Vuong, Thi-Hong; Pham, Thanh-Huyen; Nguyen, Tri-Thanh; Ha, Quang-Thuy

doi:10.1007/978-3-030-14799-0_13

Conference paper
First Online: 07 March 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11431)

Abstract

The key to cross-modal retrieval is finding a maximally correlated subspace among multiple datasets. This paper introduces a novel Adversarial Learning and Canonical Correlation Analysis based Cross-Modal Retrieval (ALCCA-CMR) model. For each modality, the ALCCA phase finds an effective common subspace and computes similarity through a canonical correlation analysis embedding for cross-modal retrieval. We demonstrate an application of the ALCCA-CMR model on a dataset with two modalities. Experimental results on real music data show the efficacy of the proposed method in comparison with existing ones.
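To make the idea of a maximally correlated subspace concrete, the following is a minimal sketch of classical linear CCA followed by cosine-similarity retrieval. It is not the authors' ALCCA-CMR model: the adversarial learning phase is omitted entirely, the data are synthetic, and the names `fit_cca` and `retrieve`, the regularization value, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Classical linear CCA via whitening and SVD (illustrative helper)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance estimates; reg keeps the inverses well conditioned.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    Wx = Sxx_is @ U[:, :k]
    Wy = Syy_is @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]

def retrieve(query, gallery):
    """Rank gallery rows by cosine similarity to each query row."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)

# Synthetic two-modality data sharing a latent factor Z (an assumption,
# standing in for paired features such as audio and lyrics).
rng = np.random.default_rng(0)
n, k = 200, 4
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, 20)) + 0.1 * rng.normal(size=(n, 20))
Y = Z @ rng.normal(size=(k, 12)) + 0.1 * rng.normal(size=(n, 12))

mx, my, Wx, Wy, corr = fit_cca(X, Y, k)
# Project both modalities into the shared subspace, then query Y with X.
ranks = retrieve((X - mx) @ Wx, (Y - my) @ Wy)
top1 = float(np.mean(ranks[:, 0] == np.arange(n)))
print("canonical correlations:", corr)
print("top-1 retrieval rate on the training pairs:", top1)
```

The sketch evaluates retrieval on the same pairs used to fit the projections, so the printed rate is only a sanity check and says nothing about the results reported in the paper.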

Author information

Authors and Affiliations

Vietnam National University, Hanoi (VNU), VNU-University of Engineering and Technology (UET), No. 144, Xuan Thuy, Cau Giay, Hanoi, Vietnam
Thi-Hong Vuong, Thanh-Huyen Pham, Tri-Thanh Nguyen & Quang-Thuy Ha
Ha Long University, Quang Ninh, Vietnam
Thanh-Huyen Pham

Corresponding author

Correspondence to Thi-Hong Vuong.

Editor information

Editors and Affiliations

Ton Duc Thang University, Ho Chi Minh City, Vietnam
Ngoc Thanh Nguyen
Bina Nusantara University, Jakarta, Indonesia
Ford Lumban Gaol
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Vuong, TH., Pham, TH., Nguyen, TT., Ha, QT. (2019). An Adversarial Learning and Canonical Correlation Analysis Based Cross-Modal Retrieval Model. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11431. Springer, Cham. https://doi.org/10.1007/978-3-030-14799-0_13

DOI: https://doi.org/10.1007/978-3-030-14799-0_13
Published: 07 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14798-3
Online ISBN: 978-3-030-14799-0
eBook Packages: Computer Science, Computer Science (R0)
