Modeling Emotion and Attitude in Speech by Means of Perceptually Based Parameter Values

User Modeling and User-Adapted Interaction

Abstract

This study investigates the perception of emotion and attitude in speech, in particular the ability to identify vocal expressions of emotion and/or attitude in speech material. Systematic perception experiments were carried out to determine optimal values for three acoustic parameters: pitch level, pitch range, and speech rate. Speech was manipulated by varying these parameters around the values found in a selected subset of the speech material which consisted of two sentences spoken by a male speaker expressing seven emotions or attitudes: neutrality, joy, boredom, anger, sadness, fear, and indignation. Listening tests were carried out with this speech material, and optimal values for pitch level, pitch range, and speech rate were derived for generating speech that expresses an emotion or attitude, starting from a neutral utterance. These values were perceptually tested in re-synthesized speech and in synthetic speech generated from LPC-coded diphones.
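
As a rough illustration of the kind of manipulation the abstract describes, the Python sketch below shifts the pitch level, scales the pitch range, and changes the speech rate of a neutral pitch contour. Everything in it is an assumption made for illustration: the function name `modify_prosody`, the choice to express level and range in semitones around the geometric mean of the contour, and the example values. It does not reproduce the parameter definitions or the optimal values reported in the article.

```python
import math


def hz_to_semitones(f_hz, ref_hz):
    """Distance of f_hz from ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st, ref_hz):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (st / 12.0)


def modify_prosody(times_s, f0_hz, level_shift_st, range_scale, rate_factor):
    """Hypothetical helper: derive a modified contour from a neutral one.

    times_s        -- time stamps of the pitch samples, in seconds
    f0_hz          -- voiced pitch samples of the neutral utterance, in Hz
    level_shift_st -- pitch-level shift in semitones (assumed definition)
    range_scale    -- factor applied to excursions around the reference
    rate_factor    -- > 1 speeds the utterance up, < 1 slows it down
    """
    if len(times_s) != len(f0_hz) or not f0_hz:
        raise ValueError("times_s and f0_hz must be non-empty and equal in length")
    if range_scale <= 0 or rate_factor <= 0:
        raise ValueError("range_scale and rate_factor must be positive")

    # Reference pitch: geometric mean of the neutral contour (an assumption).
    ref_hz = 2.0 ** (sum(math.log2(f) for f in f0_hz) / len(f0_hz))

    new_f0 = []
    for f in f0_hz:
        st = hz_to_semitones(f, ref_hz)
        # Scale the excursion (range), then shift the whole contour (level).
        new_f0.append(semitones_to_hz(st * range_scale + level_shift_st, ref_hz))

    # A faster rate compresses the time axis uniformly.
    new_times = [t / rate_factor for t in times_s]
    return new_times, new_f0


if __name__ == "__main__":
    # Made-up neutral contour and made-up settings, for demonstration only.
    times = [0.0, 0.5, 1.0]
    contour = [100.0, 200.0, 100.0]
    print(modify_prosody(times, contour, level_shift_st=2.0,
                         range_scale=1.5, rate_factor=1.2))
```

Working in semitones keeps the range scaling perceptually uniform across the contour; a real system would also need to resynthesize the waveform with the new pitch and durations, which this sketch leaves out.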

Cite this article

Mozziconacci, S.J.L. Modeling Emotion and Attitude in Speech by Means of Perceptually Based Parameter Values. User Modeling and User-Adapted Interaction 11, 297–326 (2001). https://doi.org/10.1023/A:1011800417621
