Recognizing Emotions Based on Human Actions in Videos

  • Guolong Wang
  • Zheng Qin
  • Kaiping Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10133)


Systems for automatic video analysis are in high demand as the volume of video on the Internet grows rapidly, and understanding the emotions conveyed by videos (e.g., “anger”, “happiness”) has become a hot topic. While existing affective computing models focus mainly on facial expression recognition, few attempts have been made to explore the relationship between emotion and human action. In this paper, we propose a comprehensive emotion classification framework based on spatio-temporal volumes built from human actions. For each extracted action unit, we compute Dense-SIFT descriptors and apply K-means clustering to form histograms. Finally, the histograms are fed to an mRVM classifier to recognize human emotion. Experimental results show that our method performs well on the FABO dataset.
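The pipeline summarized above (per-action-unit descriptors → K-means codebook → histograms → classifier) follows the standard bag-of-visual-words recipe. The sketch below illustrates only the codebook and histogram steps, with random vectors standing in for Dense-SIFT descriptors and a minimal NumPy k-means; the actual descriptor extraction and the mRVM classifier are not reproduced here.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Minimal k-means: returns a codebook of k cluster centers.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    # Quantize descriptors against the codebook and build a
    # normalized visual-word histogram (one per action unit).
    d = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
# Stand-in for 128-D Dense-SIFT descriptors pooled over training videos.
train_desc = rng.normal(size=(500, 128))
codebook = kmeans(train_desc, k=32)
# Histogram for the descriptors of one action unit; these histograms
# would then be passed to the classifier (mRVM in the paper).
hist = bow_histogram(rng.normal(size=(120, 128)), codebook)
```

The histogram length equals the codebook size, so every action unit maps to a fixed-length feature vector regardless of how many descriptors it contains.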


Keywords: Emotion · Action · Spatio-temporal volumes · mRVM



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Software, Tsinghua University, Beijing, China
