Abstract
The focus of speech synthesis research has recently shifted from read speech towards more conversational styles of speech, in order to reproduce those situations where a speech synthesizer is used as part of a dialogue. When a speech synthesizer is used to represent the voice of a cognisant agent, whether human or simulated, there is a need for more than just the intelligible portrayal of linguistic information; there is also a need for the expression of affect. This chapter reviews some recent advances in the synthesis of expressive speech and shows how the technology can be adapted to include the display of affect in conversational speech.
The chapter discusses how the presence of an interactive and active partner in a conversation can greatly affect the styles of human speech and presents a model of the cognitive processes that result in these differences, which concern not just the acoustic prosody and phonation quality of an utterance, but also its lexical selection and phrasing. It proposes a measure of the ratio of paralinguistic to linguistic content in an utterance as a means of quantifying the expressivity of a speaking style, and closes with a description of a phrase-level concatenative speech synthesis system that is currently in development.
Abbreviations
- SSML: speech synthesis markup language
- ToBI: tone and break indices
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Campbell, N. (2008). Expressive/Affective Speech Synthesis. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_25
DOI: https://doi.org/10.1007/978-3-540-49127-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering (R0)