Abstract
This paper proposes a novel deep neural network model for handling multimodal data. The proposed model seamlessly fuses multimodal inputs and reduces the dimensionality of the input feature space. The architecture employs a modified stacked autoencoder in conjunction with a multilayer perceptron-based regression model. Two variants of the architecture are proposed, and experiments are performed on a multimodal benchmark dataset (RECOLA) to study the impact of multimodality compared with a single modality. Experiments are also conducted to illustrate the effect of presenting the multimodal data in a sequential or a concatenated manner. The results obtained are encouraging: the proposed approach is computationally less expensive than existing approaches, while its performance is better than or on par with other techniques.
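To make the fused architecture concrete, below is a minimal Keras sketch of the concatenated variant: audio and video features are merged, passed through an autoencoder-style bottleneck for dimensionality reduction, and regressed onto two continuous affective dimensions with a small MLP head. All feature dimensions, layer sizes, the choice of arousal/valence as targets, and the optimizer are illustrative assumptions rather than the authors' exact configuration, and autoencoder pretraining with a reconstruction loss is omitted for brevity.

```python
# Minimal sketch of multimodal fusion + dimensionality reduction + MLP
# regression, assuming two modalities (audio, video) and two affective
# targets (arousal, valence). All sizes below are hypothetical.
import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

AUDIO_DIM, VIDEO_DIM = 88, 168  # assumed per-frame feature sizes

# Concatenated variant: fuse both modalities before encoding.
audio_in = Input(shape=(AUDIO_DIM,), name="audio")
video_in = Input(shape=(VIDEO_DIM,), name="video")
fused = Concatenate()([audio_in, video_in])

# Stacked, autoencoder-style encoder acting as a bottleneck that
# reduces the fused feature space (reconstruction pretraining omitted).
enc = Dense(128, activation="relu")(fused)
enc = Dense(64, activation="relu")(enc)
code = Dense(32, activation="relu", name="bottleneck")(enc)

# MLP regression head predicting two emotional dimensions.
hid = Dense(16, activation="relu")(code)
out = Dense(2, activation="linear", name="affect")(hid)

model = Model(inputs=[audio_in, video_in], outputs=out)
model.compile(optimizer="adadelta", loss="mse")

# Dummy data, just to show the expected input/output shapes.
Xa = np.random.rand(256, AUDIO_DIM).astype("float32")
Xv = np.random.rand(256, VIDEO_DIM).astype("float32")
y = np.random.rand(256, 2).astype("float32")
model.fit([Xa, Xv], y, epochs=2, batch_size=32, verbose=0)
```

The sequential variant studied in the paper would instead present one modality's features at a time; under the assumptions of this sketch, that would correspond to encoding each input branch in turn rather than encoding their concatenation.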
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhandari, D., Paul, S., Narayan, A. (2019). Multimodal Data Fusion and Prediction of Emotional Dimensions Using Deep Neural Network. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume II. Advances in Intelligent Systems and Computing, vol 799. Springer, Singapore. https://doi.org/10.1007/978-981-13-1135-2_17
DOI: https://doi.org/10.1007/978-981-13-1135-2_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1134-5
Online ISBN: 978-981-13-1135-2
eBook Packages: Intelligent Technologies and Robotics (R0)