
Affective rating ranking based on face images in arousal-valence dimensional space

  • Guo-peng Xu
  • Hai-tang Lu
  • Fei-fei Zhang
  • Qi-rong Mao

Abstract

In dimensional affect recognition, the machine learning methods used to model and predict affect are mostly classification and regression. However, annotations in the dimensional affect space usually take the form of continuous real values, which carry an ordinal property that these methods do not exploit. We therefore propose an affective rating ranking framework for affect recognition based on face images in the valence-arousal dimensional space. Our approach exploits the ordinal information among affective ratings, which are obtained by discretizing the continuous annotations. Specifically, we first train a series of cost-sensitive binary classifiers, each trained on all samples relabeled according to whether their ratings exceed the rank associated with that classifier. The final affective rating is obtained by aggregating the outputs of these binary classifiers. Experiments comparing our method with the baseline and with deep-learning-based classification and regression methods on the benchmark database of the AVEC 2015 Challenge and a selected subset of the SEMAINE database show that our ordinal ranking method is effective in both the arousal and valence dimensions.
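The ranking scheme summarized above follows the familiar reduction of ordinal prediction to a series of "is the rating above rank k?" binary problems. The minimal sketch below illustrates that reduction in Python with scikit-learn; the feature matrix, the discretized rating ranks, and the use of class-weighted logistic regression in place of the paper's cost-sensitive binary classifiers are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_rank_classifiers(X, ratings, ranks):
        """Train one binary classifier per rank threshold.

        X       : (n_samples, n_features) face-image features (assumed given)
        ratings : (n_samples,) discretized affective ratings
        ranks   : sorted sequence of the K possible rating ranks
        """
        classifiers = []
        for k in ranks[:-1]:                     # K-1 thresholds for K ranks
            y_bin = (ratings > k).astype(int)    # relabel: does the rating exceed rank k?
            # class_weight='balanced' is a simple stand-in for cost-sensitive training
            clf = LogisticRegression(max_iter=1000, class_weight="balanced")
            clf.fit(X, y_bin)
            classifiers.append(clf)
        return classifiers

    def predict_rating(classifiers, x, ranks):
        """Aggregate the binary outputs: the predicted rating is the rank whose
        index equals the number of classifiers voting 'above this threshold'."""
        votes = sum(int(clf.predict(x.reshape(1, -1))[0]) for clf in classifiers)
        return ranks[votes]

For a valence dimension discretized into, say, five ranks, predict_rating returns one of those five levels for each test feature vector; the same two functions are run once per affect dimension (arousal and valence).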

Key words

Ordinal ranking; Dimensional affect recognition; Valence; Arousal; Facial image processing

CLC number

TP391 



Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
