Abstract
This study focuses on the perception of emotion and attitude in speech, and in particular on listeners' ability to identify vocal expressions of emotion or attitude. Systematic perception experiments were carried out to determine optimal values for three acoustic parameters: pitch level, pitch range, and speech rate. Speech was manipulated by varying these parameters around the values found in a selected subset of the speech material, which consisted of two sentences spoken by a male speaker expressing seven emotions or attitudes: neutrality, joy, boredom, anger, sadness, fear, and indignation. From listening tests with this material, optimal values for pitch level, pitch range, and speech rate were derived for generating speech that expresses an emotion or attitude from a neutral utterance. These values were then tested perceptually in re-synthesized speech and in synthetic speech generated from LPC-coded diphones.
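To make the manipulation concrete, the following is a minimal sketch of how the three parameters could be applied to a neutral utterance. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the choice of the lowest F0 value as the "pitch level", the semitone scaling of the range, and all numeric values are hypothetical.

```python
import math

def hz_to_semitones(f_hz, ref_hz):
    """Distance of f_hz above ref_hz, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)

def semitones_to_hz(st, ref_hz):
    """Frequency lying st semitones above ref_hz."""
    return ref_hz * 2.0 ** (st / 12.0)

def transform_utterance(f0_hz, durations_s, level_hz, range_scale, rate_factor):
    """Re-target a neutral utterance (hypothetical helper).

    f0_hz       : F0 samples of the neutral contour, in Hz
    durations_s : segment durations of the neutral utterance, in seconds
    level_hz    : new pitch level, in Hz
    range_scale : factor applied to pitch excursions, measured in semitones
    rate_factor : speech-rate factor (>1 is faster, so durations shrink)
    """
    # Assumption: the lowest F0 value stands in for the neutral pitch level.
    neutral_level = min(f0_hz)
    new_f0 = []
    for f in f0_hz:
        excursion = hz_to_semitones(f, neutral_level)
        # Scale the excursion, then place it on top of the new pitch level.
        new_f0.append(semitones_to_hz(excursion * range_scale, level_hz))
    new_durations = [d / rate_factor for d in durations_s]
    return new_f0, new_durations

# Placeholder values, not taken from the study.
f0, durs = transform_utterance(
    f0_hz=[100.0, 200.0, 100.0],
    durations_s=[0.2, 0.3],
    level_hz=150.0,
    range_scale=0.5,
    rate_factor=2.0,
)
```

Working in semitones keeps the range scaling independent of the pitch level, so the two parameters can be varied separately, as in a listening experiment that crosses them.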
Cite this article
Mozziconacci, S.J.L. Modeling Emotion and Attitude in Speech by Means of Perceptually Based Parameter Values. User Modeling and User-Adapted Interaction 11, 297–326 (2001). https://doi.org/10.1023/A:1011800417621