Abstract
This paper presents a multimodal fusion approach using a kernel-based Extreme Learning Machine (ELM) for video emotion recognition that combines video content and electroencephalogram (EEG) signals. First, several audio-based and visual-based features are extracted from video clips, and EEG features are obtained using Wavelet Packet Decomposition (WPD). Second, video features are selected with Double Input Symmetrical Relevance (DISR), and EEG features are selected with a Decision Tree (DT). Third, the selected video and EEG features are combined at the decision level and classified by a kernel-based ELM. To validate the proposed method, we design and conduct an EEG experiment to collect a dataset consisting of video clips and the EEG signals of subjects watching them. We compare our method on classification accuracy against single-modality methods that use video content only or EEG signals only. The experimental results show that the proposed fusion method achieves better classification performance than video emotion recognition methods that use either video content or EEG signals alone.
Acknowledgements
This research is partially sponsored by Natural Science Foundation of China (Nos. 61175115, 61370113 and 61272320), Beijing Municipal Natural Science Foundation (4152005 and 4152006), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD201304035), Jing-Hua Talents Project of Beijing University of Technology (2014-JH-L06), Ri-Xin Talents Project of Beijing University of Technology (2014-RX-L06), the Research Fund of Beijing Municipal Commission of Education (PXM2015_014204_500221) and the International Communication Ability Development Plan for Young Teachers of Beijing University of Technology (No. 2014-16).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Duan, L., Ge, H., Yang, Z., Chen, J. (2016). Multimodal Fusion Using Kernel-Based ELM for Video Emotion Recognition. In: Cao, J., Mao, K., Wu, J., Lendasse, A. (eds) Proceedings of ELM-2015 Volume 1. Proceedings in Adaptation, Learning and Optimization, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-28397-5_29
Print ISBN: 978-3-319-28396-8
Online ISBN: 978-3-319-28397-5