Abstract
The computer vision domain comprises algorithms and techniques that give computers the ability to see and perceive. Human emotion recognition using computer vision is a challenging research area. Facial expression alone does not always yield an accurate judgment of emotion and therefore needs to be combined with other modalities such as voice, text, and physiological signals. Several fusion approaches, such as direct, early, and late fusion, have been introduced, but the problem persists. This paper presents a deep neural network (NN) based sequential late fusion approach that identifies emotions from the available modalities, which are integrated into the system sequentially at the decision level. A deep CNN was trained to identify emotions from faces. Short videos were analyzed for emotion recognition: frames were extracted and the facial emotions in them were classified, while the voice channel was processed and transcripts were generated for text-based analysis. The outcome of each channel was compared for accuracy, and a human opinion was recorded manually to check conformance of the results; the recorded opinion matched the emotion classified by the system.
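To make the decision-level integration concrete, the sketch below shows one common way late fusion can be realized: each modality classifier (face CNN, voice, text) produces a class-probability vector, and the vectors are combined only after every channel has made its own decision. The weighted-average fusion rule, the emotion label set, and the `late_fuse` helper are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of sequential decision-level (late) fusion.
# Assumption: each modality model outputs a softmax probability vector
# over the same emotion classes; the fusion rule shown here (weighted
# averaging) is one common choice, not necessarily the paper's.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def late_fuse(channel_probs, weights=None):
    """Combine per-channel class probabilities at the decision level.

    channel_probs: list of probability vectors, one per modality,
                   appended in the order the channels are processed.
    weights:       optional per-channel reliabilities; uniform if omitted.
    """
    probs = np.asarray(channel_probs, dtype=float)        # (n_channels, n_classes)
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, float)
    fused = (w[:, None] * probs).sum(axis=0) / w.sum()    # weighted average
    return EMOTIONS[int(fused.argmax())], fused

# Hypothetical outputs: face CNN on extracted frames, a voice model,
# and a text model run on the generated transcript.
face_p  = np.array([0.05, 0.02, 0.03, 0.75, 0.10, 0.05])
voice_p = np.array([0.10, 0.05, 0.05, 0.60, 0.15, 0.05])
text_p  = np.array([0.20, 0.05, 0.05, 0.50, 0.15, 0.05])

label, fused = late_fuse([face_p, voice_p, text_p])
print(label, fused.round(3))   # -> happiness, fused distribution
```

Because fusion happens after each classifier's decision, a new modality can be appended to `channel_probs` without retraining the others, which is the practical appeal of a sequential late fusion design.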