Identifying affective levels on music video via completing the missing modality

Chen, Mo; Cheng, Gong; Guo, Lei

doi:10.1007/s11042-017-5125-8


Published: 23 August 2017

Volume 77, pages 3287–3302 (2018)

Multimedia Tools and Applications


Abstract

Emotion tagging, which labels stimuli with human-understandable semantic information, is a topic of interest in affective computing. Previous work indicates that modality fusion can improve performance on this kind of task. However, acquiring subjects’ responses is costly and time consuming, so the response modality that fusion methods require is absent for a large part of multimedia content. To address this problem, this paper proposes a novel emotion tagging framework that completes the missing response modality based on the concept of brain encoding. In the framework, an encoding model is built from the response modality, derived from subjects’ responses, and the stimulus modality, derived from the stimulus content. The model is then applied to videos whose response modality is absent in order to complete it. Finally, the stimulus and response modalities are fused and passed to a classification method. The DEAP dataset is adopted as a benchmark to test the proposed framework. In the experiments, three kinds of features are employed as stimulus modalities, and the response modality and fused modality are computed under the proposed framework. Affective level identification serves as the emotion tagging task. The results demonstrate that the proposed framework achieves higher accuracy than using the stimulus modality alone: the improvement in accuracy exceeds 5% for every kind of stimulus modality, in both valence and arousal. Moreover, this improvement requires no additional physiological data acquisition, saving both money and time.
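The pipeline described in the abstract (fit an encoding model, complete the missing response modality, fuse, classify) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the encoding model is a plain ridge regression, fusion is feature concatenation, the classifier is a nearest-centroid rule, and the data are synthetic. The function names (`fit_encoding_model`, `complete_response`, `fuse`, and the two `nearest_centroid_*` helpers) are hypothetical.

```python
import numpy as np


def fit_encoding_model(stimulus, response, ridge=1.0):
    """Fit a linear map from stimulus features to response features.

    Assumption: ridge regression stands in for the encoding model.
    """
    # Append a constant column so the model can learn an offset.
    X = np.hstack([stimulus, np.ones((stimulus.shape[0], 1))])
    penalty = ridge * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the offset term unpenalised
    return np.linalg.solve(X.T @ X + penalty, X.T @ response)


def complete_response(weights, stimulus):
    """Predict the response modality for videos that lack recordings."""
    X = np.hstack([stimulus, np.ones((stimulus.shape[0], 1))])
    return X @ weights


def fuse(stimulus, response):
    """Assumption: fusion by simple feature concatenation."""
    return np.hstack([stimulus, response])


def nearest_centroid_fit(features, labels):
    """Store one mean feature vector per class (placeholder classifier)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, features):
    """Assign each sample the class of its closest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic stand-in data: 30 videos with recorded responses, 10 without.
rng = np.random.default_rng(0)
stim_train = rng.normal(size=(30, 6))
stim_test = rng.normal(size=(10, 6))
true_map = rng.normal(size=(6, 4))
resp_train = stim_train @ true_map + 0.1 * rng.normal(size=(30, 4))
labels_train = (stim_train[:, 0] > 0).astype(int)  # toy low/high levels

# 1) Build the encoding model from videos that have both modalities.
weights = fit_encoding_model(stim_train, resp_train)
# 2) Complete the missing response modality for the remaining videos.
resp_test_completed = complete_response(weights, stim_test)
# 3) Fuse the modalities and classify.
model = nearest_centroid_fit(fuse(stim_train, resp_train), labels_train)
predicted_levels = nearest_centroid_predict(
    model, fuse(stim_test, resp_test_completed)
)
print(predicted_levels)
```

In this sketch the training videos are fused with their measured responses and the remaining videos with their completed responses; whether the paper trains on measured or completed responses is not stated in the abstract, so that choice is also an assumption.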



Acknowledgements

This work was supported in part by the National Science Foundation of China under Grant 61401357 and the Fundamental Research Funds for the Central Universities under Grant 3102016ZY023.

The authors thank the Multimedia Vision Group at Queen Mary, University of London, for its help while the first author was working with the group.

Author information

Authors and Affiliations

School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China
Mo Chen, Gong Cheng & Lei Guo


Corresponding author

Correspondence to Gong Cheng.


About this article

Cite this article

Chen, M., Cheng, G. & Guo, L. Identifying affective levels on music video via completing the missing modality. Multimed Tools Appl 77, 3287–3302 (2018). https://doi.org/10.1007/s11042-017-5125-8


Received: 15 March 2017
Revised: 30 July 2017
Accepted: 16 August 2017
Published: 23 August 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5125-8
