Abstract
In speech synthesis, it is already possible to generate understandable speech with discourse-neutral prosody for simple written texts. At ATR-ITL, however, we are researching speech synthesis techniques for use in a speech translation environment. Dialogues in such an environment involve much richer forms of prosodic variation than are required for reading texts. For our translations to sound natural, our synthesis system must offer a wide range of prosodic variability, which can be described at an appropriate level of abstraction. This paper describes a multi-level intonation system which generates a fundamental frequency (F0) contour based on input labelled with high-level discourse information, including speech act type and focussing information, as well as part of speech and syntactic constituent structure. The system is rule driven, but the rules (and parameters) are derived from naturally spoken dialogues. Two experiments using this model are described, testing its accuracy. First, results are given for a system that predicts ToBI intonation labels from discourse information using a CART decision tree. Second, a detailed investigation of the intonational variation of the word “okay” in different discourse contexts is presented.
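As a rough illustration of the first experiment's idea, the sketch below grows a small CART-style classification tree (binary splits chosen by Gini impurity, no pruning) that maps discourse features to an accent label. The feature set (speech act, focus flag, part of speech), the training rows, and the function names are all invented for illustration; they are not the chapter's features, data, or implementation.

```python
from collections import Counter

# Toy, invented rows (NOT data from the chapter):
# (speech_act, in_focus, part_of_speech) -> ToBI-style accent label
ROWS = [
    (("inform", True, "noun"), "H*"),
    (("inform", False, "noun"), "none"),
    (("inform", True, "verb"), "H*"),
    (("question", True, "noun"), "L*"),
    (("question", False, "verb"), "none"),
    (("question", True, "verb"), "L*"),
    (("inform", False, "det"), "none"),
    (("question", False, "det"), "none"),
]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Return the (feature_index, value) test that most reduces weighted
    Gini impurity, or None if no test improves on the unsplit node."""
    best, best_score = None, gini([y for _, y in rows])
    for i in range(len(rows[0][0])):
        for v in sorted({x[i] for x, _ in rows}, key=str):
            left = [y for x, y in rows if x[i] == v]
            right = [y for x, y in rows if x[i] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (i, v), score
    return best

def grow(rows):
    """Recursively grow a tree; a leaf is the majority label at that node."""
    split = best_split(rows)
    if split is None:
        return Counter(y for _, y in rows).most_common(1)[0][0]
    i, v = split
    yes = [r for r in rows if r[0][i] == v]
    no = [r for r in rows if r[0][i] != v]
    return (i, v, grow(yes), grow(no))

def predict(tree, x):
    """Follow equality tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        i, v, yes, no = tree
        tree = yes if x[i] == v else no
    return tree

TREE = grow(ROWS)
print(predict(TREE, ("inform", True, "noun")))
```

On this toy data the tree first separates focussed from unfocussed words and then splits the focussed ones by speech act. A realistic system would need labelled dialogue data, a held-out test set, and some form of pruning.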
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Black, A.W. (1997). Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_9
DOI: https://doi.org/10.1007/978-1-4612-2258-3_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive