Expressive/Affective Speech Synthesis

Campbell, Nick

doi:10.1007/978-3-540-49127-9_25

Nick Campbell Ph.D

Part of the book series: Springer Handbooks (SHB)

Abstract

The focus of speech synthesis research has recently shifted from read speech towards more conversational styles of speech, in order to reproduce those situations where a speech synthesizer is used as part of a dialogue. When a speech synthesizer is used to represent the voice of a cognisant agent, whether human or simulated, there is a need for more than just the intelligible portrayal of linguistic information; there is also a need for the expression of affect. This chapter reviews some recent advances in the synthesis of expressive speech and shows how the technology can be adapted to include the display of affect in conversational speech.

The chapter discusses how the presence of an interactive and active partner in a conversation can greatly affect the styles of human speech and presents a model of the cognitive processes that result in these differences, which concern not just the acoustic prosody and phonation quality of an utterance, but also its lexical selection and phrasing. It proposes a measure of the ratio of paralinguistic to linguistic content in an utterance as a means of quantifying the expressivity of a speaking style, and closes with a description of a phrase-level concatenative speech synthesis system that is currently in development.

Abbreviations

SSML: speech synthesis markup language
ToBI: tone and break indices

Author information

Authors and Affiliations

Acoustics & Speech Research Project, Spoken Language Communication Group, Knowledge Creating Communication Research Centre, 2-2-2 Hikaridai, 619-0288, Keihanna Science City, Japan
Nick Campbell Ph.D

Corresponding author

Correspondence to Nick Campbell Ph.D.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Campbell, N. (2008). Expressive/Affective Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_25

DOI: https://doi.org/10.1007/978-3-540-49127-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
