Abstract
Emotion tagging, which labels stimuli with human-understandable semantic information, is a central theme in affective computing. Previous work indicates that modality fusion can improve performance on such tasks. However, acquiring subjects' responses is costly and time consuming, so the response modality required by fusion methods is absent for a large portion of multimedia content. To address this problem, this paper proposes a novel emotion tagging framework that completes the missing response modality based on the concept of brain encoding. In the framework, an encoding model is first built from the response modality (subjects' recorded responses) and the stimulus modality (features of the stimulus content). The model is then applied to videos whose response modality is absent in order to complete it. Finally, modality fusion is conducted on the stimulus and response modalities, followed by classification. To evaluate the proposed framework, the DEAP dataset is adopted as a benchmark. In the experiments, three kinds of features are employed as stimulus modalities, the response modality and the fused modality are computed under the proposed framework, and affective level identification is conducted as the emotion tagging task. The results demonstrate that the proposed framework outperforms classification using the stimulus modality alone, with accuracy improvements above 5% for all stimulus modalities in both valence and arousal. Moreover, the improvement requires no extra physiological data acquisition, saving both money and time.
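The pipeline the abstract describes (learn an encoding model from stimulus features to response features, predict the missing response modality, then fuse) can be sketched as follows. This is a minimal illustration with synthetic data and a closed-form ridge regressor standing in for the encoding model; the feature dimensions, the regularizer, and the concatenation-based fusion are all assumptions for illustration, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): stimulus features X (e.g., audio/visual
# descriptors) and response features R (e.g., EEG-derived) for videos where
# both modalities were recorded.
n_train, n_missing, d_stim, d_resp = 80, 20, 12, 6
X_train = rng.normal(size=(n_train, d_stim))
W_true = rng.normal(size=(d_stim, d_resp))
R_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, d_resp))

# 1) Encoding model: ridge regression mapping stimulus features to the
#    response modality, fitted on videos where both modalities exist.
lam = 1.0
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d_stim),
                    X_train.T @ R_train)

# 2) Complete the missing response modality for videos that were never
#    shown to subjects.
X_missing = rng.normal(size=(n_missing, d_stim))
R_hat = X_missing @ W

# 3) Feature-level fusion: concatenate stimulus and predicted response
#    features, ready for a downstream classifier (e.g., an SVM).
fused = np.hstack([X_missing, R_hat])
print(fused.shape)  # (20, 18)
```

The fused matrix would then be passed to an ordinary classifier for valence/arousal level identification, which is where the reported accuracy gains come from.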
Acknowledgements
This work was supported in part by the National Science Foundation of China under Grant 61401357 and by the Fundamental Research Funds for the Central Universities under Grant 3102016ZY023.
The authors would like to thank the Multimedia Vision Group at Queen Mary, University of London, for their help during the first author's stay with the group.
Cite this article
Chen, M., Cheng, G. & Guo, L. Identifying affective levels on music video via completing the missing modality. Multimed Tools Appl 77, 3287–3302 (2018). https://doi.org/10.1007/s11042-017-5125-8