Emotion Modelling via Speech Content and Prosody: In Computer Games and Elsewhere

Emotion in Games

Part of the book series: Socio-Affective Computing (SAC, volume 4)

Abstract

The chapter describes a typical modern speech emotion recognition engine of the kind that can be used to enhance the emotional intelligence of computer games or other technical systems. It highlights the acquisition of human affect via the spoken content as well as its prosody and further acoustic features. Features for both of these information streams are briefly discussed, along with chunking of the stream. Decision making is presented both with and without training data. A particular focus is then placed on autonomous learning and adaptation methods, as well as the required calculation of confidence measures. Practical aspects include the encoding of the information, distribution of the processing, and available toolkits. Benchmark performances are given by typical competitive challenges in the field.

Acknowledgements

The author acknowledges the support of the European Union’s Horizon 2020 Framework Programme under grant agreement no. 645378 (ARIA-VALUSPA).

Author information

Corresponding author

Correspondence to Björn Schuller.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Schuller, B. (2016). Emotion Modelling via Speech Content and Prosody: In Computer Games and Elsewhere. In: Karpouzis, K., Yannakakis, G. (eds) Emotion in Games. Socio-Affective Computing, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-41316-7_5
