Abstract
Speech synthesis is not necessarily synonymous with text-to-speech. This paper describes a prototype talking machine that produces synthesised speech from a combination of speaker, language, speaking-style, and content information, using icon-based input. The paper addresses the problems of specifying the text-content and output realisation of a conversational utterance from a combination of conceptual icons, in conjunction with language and speaker information. It concludes that in order to specify the speech content (i.e., both text details and speaking-style) adequately, selection options for speaker-commitment and speaker-listener relations will be required. The paper closes with a description of a constraint-based method for selection of affect-marked speech samples for concatenative speech synthesis.
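The abstract only names, and does not detail, the constraint-based selection method; as a loose illustration of the general idea of filtering affect-marked samples for concatenative synthesis, the sketch below uses hypothetical names and fields (`Sample`, `join_cost`, the constraint keys) that are not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One recorded speech unit in a hypothetical corpus."""
    text: str
    speaker: str
    language: str
    affect: str        # e.g. "happy", "neutral"
    join_cost: float   # illustrative concatenation cost (lower is better)

def select_samples(corpus, constraints):
    """Return samples satisfying every hard constraint, cheapest join first."""
    matches = [s for s in corpus
               if all(getattr(s, key) == value
                      for key, value in constraints.items())]
    return sorted(matches, key=lambda s: s.join_cost)

corpus = [
    Sample("hello", "A", "en", "happy",   0.20),
    Sample("hello", "A", "en", "neutral", 0.10),
    Sample("hello", "B", "en", "happy",   0.05),
]

# Constrain by speaker and affect; candidates are ranked by join cost.
best = select_samples(corpus, {"speaker": "A", "affect": "happy"})
```

In a real unit-selection system the constraints would also cover speaking style and speaker-listener relation, and the cost would combine target and join components rather than a single number; this sketch shows only the filter-then-rank structure.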
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Campbell, N. (2004). Specifying Affect and Emotion for Expressive Speech Synthesis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_47
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5