
Multimodal Data Fusion and Prediction of Emotional Dimensions Using Deep Neural Network

  • Conference paper

In: Computational Intelligence: Theories, Applications and Future Directions - Volume II

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 799)

Abstract

This paper proposes a novel deep neural network model for handling multimodal data. The proposed model seamlessly fuses multimodal inputs and reduces the dimensionality of the input feature space. The architecture employs a modified stacked autoencoder in conjunction with a multilayer perceptron-based regression model. Two variants of the architecture are proposed, and experiments have been performed on a multimodal benchmark dataset (RECOLA) to study the impact of multimodality compared with a single modality. Experiments are also conducted to illustrate the effect of presenting multimodal data in a sequential or concatenated manner. The results obtained are encouraging: the proposed approach is computationally less expensive than existing approaches, and its performance is better than or at par with other techniques.
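To make the described architecture concrete, below is a minimal Keras sketch of the concatenated-fusion variant: audio and video feature vectors are joined, an autoencoder-style encoder compresses the fused features, and an MLP regression head predicts two continuous emotional dimensions (e.g., arousal and valence). The paper cites Keras [14] and TensorFlow [15], but all layer sizes, feature dimensions (AUDIO_DIM, VIDEO_DIM, BOTTLENECK), and training settings here are illustrative assumptions, not the paper's actual configuration; a full stacked autoencoder would be pre-trained layer by layer, which is simplified to a single reconstruction pass.

```python
# Illustrative sketch only: a fused autoencoder + MLP regressor in Keras.
# All dimensions and hyperparameters are assumptions for demonstration.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

AUDIO_DIM, VIDEO_DIM, BOTTLENECK = 88, 49, 32  # assumed feature sizes

# Concatenated fusion: join the two modalities into one feature vector.
audio_in = keras.Input(shape=(AUDIO_DIM,), name="audio")
video_in = keras.Input(shape=(VIDEO_DIM,), name="video")
fused = layers.Concatenate()([audio_in, video_in])

# Encoder: successive dense layers reduce the fused feature space
# to a compact code (the dimensionality-reduction step).
h = layers.Dense(96, activation="relu")(fused)
h = layers.Dense(64, activation="relu")(h)
code = layers.Dense(BOTTLENECK, activation="relu", name="code")(h)

# Decoder, used only for unsupervised reconstruction pre-training.
d = layers.Dense(64, activation="relu")(code)
d = layers.Dense(96, activation="relu")(d)
recon = layers.Dense(AUDIO_DIM + VIDEO_DIM, name="reconstruction")(d)

autoencoder = keras.Model([audio_in, video_in], recon)
autoencoder.compile(optimizer="adadelta", loss="mse")  # ADADELTA, cf. [19]

# MLP regression head on the learned code predicts two continuous
# emotional dimensions; dropout regularizes the head, cf. [17].
r = layers.Dense(16, activation="relu")(code)
r = layers.Dropout(0.5)(r)
emotion = layers.Dense(2, name="emotion")(r)

regressor = keras.Model([audio_in, video_in], emotion)
regressor.compile(optimizer="adadelta", loss="mse")

# Random stand-in data; RECOLA features/annotations would replace these.
x_audio = np.random.rand(256, AUDIO_DIM).astype("float32")
x_video = np.random.rand(256, VIDEO_DIM).astype("float32")
targets = np.random.rand(256, 2).astype("float32")

# Pre-train on reconstruction, then train the regression model;
# the encoder weights are shared between the two models.
autoencoder.fit([x_audio, x_video],
                np.concatenate([x_audio, x_video], axis=1),
                epochs=2, batch_size=32, verbose=0)
regressor.fit([x_audio, x_video], targets,
              epochs=2, batch_size=32, verbose=0)
```

For the sequential variant studied in the paper, the modalities would be presented one after another rather than joined up front; the sketch above covers only the concatenated case.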


References

  1. Fujisaki, W., Goda, N., Motoyoshi, I., Nishida, S., Komatsu, H.: Audiovisual integration in the human perception of materials. J. Vis. 14, 1–20 (2014)


  2. Khorrami, P., Paine, T.L., Brady, K., Dagli, C., Huang, T.S.: How deep neural networks can improve emotion recognition on video data (2016). arXiv:1602.07377

  3. Zhang, S., Zhang, S., Huang, T., Gao, W.: Multimodal deep convolutional neural network for audio-visual emotion recognition. In: Proceedings of the 2016 ACM International Conference on Multimedia Retrieval, ICMR’16, pp. 281–284 (2016)

  4. Wu, Z., Sivadas, S., Tan, Y.K., Bin, M., Goh, R.S.M.: Multi-modal hybrid deep neural network for speech enhancement (2016). arXiv:1606.04750

  5. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning, ICML’11, pp. 689–696 (2011)

  6. Uzan, L., Wolf, L.: I know that voice: identifying the voice actor behind the voice. In: Proceedings of the 2015 International Conference on Biometrics, ICB’15, pp. 46–51 (2015)

  7. Salah, A.: Perceptual information fusion in humans and machines. Cogn. Neurosci. Forum (2007)

  8. Yang, L., Jiang, D., He, L., Pei, E., Oveneke, M.C., Sahli, H.: Decision tree based depression classification from audio video and language information. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, AVEC’16, pp. 89–96 (2016)


  9. He, L., Jiang, D., Yang, L., Pei, E., Wu, P., Sahli, H.: Multimodal affective dimension prediction using deep bidirectional long short-term memory recurrent neural networks. In: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC’15, pp. 73–80 (2015)


  10. Fontaine, J.R.J., Scherer, K.R., Roesch, E.B., Ellsworth, P.C.: The world of emotions is not two-dimensional. Psychol. Sci. 18, 1050–1057 (2007)

  11. Breazeal, C.: Emotion and sociable humanoid robots. Int. J. Hum.-Comput. Stud. 59, 119–155 (2003)

  12. Chao, L., Tao, J., Yang, M., Li, Y., Wen, Z.: Long short term memory recurrent neural network based multimodal dimensional emotion recognition. In: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC’15, pp. 65–72 (2015)

  13. Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.: Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In: Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG’13 (2013)

  14. Keras Documentation. https://keras.io/

  15. TensorFlow. https://www.tensorflow.org/

  16. Wang, W., Ooi, B.C., Yang, X., Zhang, D., Zhuang, Y.: Effective multi-modal retrieval based on stacked auto-encoders. Proc. VLDB Endow. 7, 649–660 (2014)

  17. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)


  18. Ringeval, F., Schuller, B., Valstar, M., Jaiswal, S., Marchi, E., Lalanne, D., Cowie, R., Pantic, M.: AV+EC 2015: the first affect recognition challenge bridging across audio, video, and physiological data. In: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC’15, pp. 3–8 (2015)

  19. Zeiler, M.D.: ADADELTA: an adaptive learning rate method (2012). arXiv:1212.5701

  20. Valstar, M., Pantic, M., Gratch, J., Schuller, B., Ringeval, F., Lalanne, D., Torres Torres, M., Scherer, S., Stratou, G., Cowie, R.: AVEC 2016: depression, mood, and emotion recognition workshop and challenge. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, AVEC’16, pp. 3–10 (2016)

  21. Kächele, M., Thiam, P., Palm, G., Schwenker, F., Schels, M.: Ensemble methods for continuous affect recognition: multi-modality, temporality, and challenges. In: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC’15, pp. 9–16 (2015)


Author information

Corresponding author

Correspondence to Sandeep Paul.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.


Cite this paper

Bhandari, D., Paul, S., Narayan, A. (2019). Multimodal Data Fusion and Prediction of Emotional Dimensions Using Deep Neural Network. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume II. Advances in Intelligent Systems and Computing, vol 799. Springer, Singapore. https://doi.org/10.1007/978-981-13-1135-2_17
