Abstract
Recent studies on speech and emotion have revealed potential applications and future prospects for human-machine interaction. Progress in this research area relies heavily on our understanding of the mechanisms involved in the emotional encoding and decoding processes of speech communication. This chapter presents the general aims and specific issues of the book and reviews the state of the art in emotion research, including concepts, theoretical frameworks, and studies of face-to-face emotional communication from the perspectives of encoding and decoding.
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, A. (2015). Introduction. In: Encoding and Decoding of Emotional Speech. Prosody, Phonology and Phonetics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47691-8_1
DOI: https://doi.org/10.1007/978-3-662-47691-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47690-1
Online ISBN: 978-3-662-47691-8
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)