Multiview Deep Learning

  • Chapter
Multiview Machine Learning

Abstract

This chapter describes multiview deep learning, which uses deep learning methods to handle multiview data or to model its intrinsic structure. We highlight three major categories of multiview deep learning methods, each built on a different idea. The first category focuses on obtaining a shared joint representation from different views by building a hierarchical structure. The second category focuses on constructing structured spaces in which each view keeps its own representation and constraints are imposed between the representations of different views. The third category focuses on explicitly constructing connections or relationships between different views or representations, which allows different views to be translated or mapped to each other.
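
The three categories can be made concrete with a small sketch. The NumPy code below is only an illustration under stated assumptions, not the chapter's method: the two views, the layer sizes, the untrained random weights, the squared-distance constraint, and all function names are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical views of the same 4 samples (e.g. image and text features).
x_a = rng.normal(size=(4, 6))   # view A: 6-dimensional
x_b = rng.normal(size=(4, 3))   # view B: 3-dimensional


def dense(x, w):
    """One fully connected layer with a tanh nonlinearity."""
    return np.tanh(x @ w)


# Randomly initialised, untrained weights; for illustration only.
w_a = rng.normal(size=(6, 5))       # view A encoder
w_b = rng.normal(size=(3, 5))       # view B encoder
w_joint = rng.normal(size=(10, 2))  # shared layer on top of both encoders
w_a2b = rng.normal(size=(6, 3))     # linear map from view A to view B


def joint_representation(x_a, x_b):
    """Category 1: view-specific layers feed one shared layer."""
    h = np.concatenate([dense(x_a, w_a), dense(x_b, w_b)], axis=1)
    return dense(h, w_joint)


def coordination_loss(x_a, x_b):
    """Category 2: separate representations tied together by a constraint.

    Assumption: the constraint is the mean squared distance between the
    representations of paired samples; other constraints are possible.
    """
    z_a, z_b = dense(x_a, w_a), dense(x_b, w_b)
    return float(np.mean(np.sum((z_a - z_b) ** 2, axis=1)))


def translate_a_to_b(x_a):
    """Category 3: an explicit mapping from view A into the space of view B."""
    return x_a @ w_a2b


z = joint_representation(x_a, x_b)   # one shared code per sample
loss = coordination_loss(x_a, x_b)   # penalty to minimise during training
x_b_hat = translate_a_to_b(x_a)      # predicted view B features
print(z.shape, x_b_hat.shape, loss >= 0.0)
```

In the first function both views end up in a single vector per sample, in the second each view keeps its own vector and only the penalty links them, and in the third one view is predicted directly from the other. Training the weights is left out of the sketch.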

Author information

Corresponding author

Correspondence to Shiliang Sun.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Sun, S., Mao, L., Dong, Z., Wu, L. (2019). Multiview Deep Learning. In: Multiview Machine Learning. Springer, Singapore. https://doi.org/10.1007/978-981-13-3029-2_8

  • DOI: https://doi.org/10.1007/978-981-13-3029-2_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-3028-5

  • Online ISBN: 978-981-13-3029-2

  • eBook Packages: Computer Science (R0)
