Abstract
In this paper, we propose a new framework for global affective video content regression based on five complementary audio-visual features. For the audio modality, we select the global audio feature eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, VGG-19 features are extracted with fine-tuned models from both the key frames of the original images and those of the optical flow images, so as to represent the original visual cues in conjunction with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competing teams in EIMT16 as well as the state-of-the-art method. The experimental results show that the fusion of the five features achieves better regression results in both the arousal and valence dimensions, indicating that the selected features are complementary to each other across the audio and visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, mean squared error (MSE) and Pearson correlation coefficient (PCC), in the arousal dimension, and achieves comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vectors used in our framework have a much lower dimensionality of 1752 in total, only about five thousandths of the feature dimensionality of the state-of-the-art method, which greatly reduces the memory requirements and computational burden.
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61801440 and 61631016, and the Fundamental Research Funds for the Central Universities under Grant Nos. 2018XNG1824 and YLSZ180226.
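To make the fusion scheme concrete, the following minimal sketch illustrates how the five per-clip feature vectors could be concatenated into a single 1752-dimensional representation and fed to a linear regressor for one affective dimension. The per-feature dimensions and the scikit-learn SVR regressor are illustrative assumptions, not the exact configuration used in the paper.

```python
# A minimal sketch of the late-fusion regression pipeline, assuming
# hypothetical per-feature dimensions that sum to the 1752 reported in
# the abstract; the paper's actual split and regressor may differ.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical dimensionalities of the five features (sum = 1752).
DIMS = {
    "egemaps": 88,      # global acoustic parameter set
    "soundnet": 512,    # deep audio feature (assumed layer size)
    "vggish": 128,      # deep audio embedding
    "vgg19_rgb": 512,   # VGG-19 on key frames of original images (assumed)
    "vgg19_flow": 512,  # VGG-19 on key frames of optical flow images (assumed)
}

def fuse(features):
    """Concatenate the five per-clip feature vectors into one vector."""
    return np.concatenate([features[name] for name in DIMS])

# Toy data standing in for the extracted features and ground-truth arousal.
rng = np.random.default_rng(0)
n_clips = 100
X = np.stack([
    fuse({name: rng.normal(size=d) for name, d in DIMS.items()})
    for _ in range(n_clips)
])
y_arousal = rng.uniform(-1.0, 1.0, size=n_clips)

# One regressor per affective dimension (valence is trained analogously).
model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
model.fit(X, y_arousal)
predictions = model.predict(X[:5])
```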