Recognising Expression in Speech for Human Computer Interaction

  • T. S. Shikler
  • P. Robinson


Human-computer interaction and human-human communication via computer interfaces have become a major part of our lives, yet they still lack the basic means of recognising and responding to the non-verbal cues of attitude, emotion and mental state that we take for granted in human communication; they fail to appreciate users' reactions and intentions. The problem is most acute in speech interfaces, which are used by the general population and especially by people with impaired motor abilities. In these systems speech conveys only commands and data, whereas natural behaviour also uses speech for thinking out loud and for expressing frustration, misunderstanding and discomfort. Most of these functions depend on nuances of expression, and some are apparent only in speech.


Keywords: Facial expression, speech signal, galvanic skin response, speech rate, human-computer interface





Copyright information

© Springer-Verlag London 2004
