Expressive/Affective Speech Synthesis

  • Chapter
Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks ((SHB))

Abstract

The focus of speech synthesis research has recently shifted from read speech towards more conversational styles of speech, in order to reproduce those situations where a speech synthesizer is used as part of a dialogue. When a speech synthesizer is used to represent the voice of a cognisant agent, whether human or simulated, there is a need for more than just the intelligible portrayal of linguistic information; there is also a need for the expression of affect. This chapter reviews some recent advances in the synthesis of expressive speech and shows how the technology can be adapted to include the display of affect in conversational speech.

The chapter discusses how the presence of an interactive and active partner in a conversation can greatly affect the styles of human speech and presents a model of the cognitive processes that result in these differences, which concern not just the acoustic prosody and phonation quality of an utterance, but also its lexical selection and phrasing. It proposes a measure of the ratio of paralinguistic to linguistic content in an utterance as a means of quantifying the expressivity of a speaking style, and closes with a description of a phrase-level concatenative speech synthesis system that is currently in development.

Abbreviations

SSML: speech synthesis markup language

ToBI: tone and break indices

Author information

Corresponding author

Correspondence to Nick Campbell, Ph.D.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Campbell, N. (2008). Expressive/Affective Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_25

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
