Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech

Black, Alan W.

doi:10.1007/978-1-4612-2258-3_9

Abstract

In the area of speech synthesis it is already possible to generate understandable speech with discourse-neutral prosody for simple written texts. However, at ATR-ITL we are researching speech synthesis techniques for use in a speech translation environment. Dialogues in such an environment involve much richer forms of prosodic variation than are required for the reading of texts. For our translations to sound natural, our synthesis system must offer a wide range of prosodic variability, which can be described at an appropriate level of abstraction. This paper describes a multi-level intonation system which generates a fundamental frequency (F₀) contour based on input labelled with high-level discourse information, including speech act type and focussing information, as well as part of speech and syntactic constituent structure. The system is rule-driven, but the rules (and parameters) are derived from naturally spoken dialogues. Two experiments using this model are described, testing its accuracy. First, results are given for a system that predicts ToBI intonation labels from discourse information using a CART decision tree. Second, a detailed investigation of the intonational variation of the word “okay” in different discourse contexts is presented.
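
As a rough illustration of the first experiment's general idea (predicting an intonation label from discourse features with a CART-style tree), the following is a minimal sketch and not the chapter's actual system. The feature names (`speech_act`, `focus`), the four training rows, and the helper functions are all hypothetical and invented for this example; only the general technique (greedy binary splits chosen by Gini impurity) follows standard CART practice.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, value) equality test with the lowest weighted
    Gini impurity, or None if no test improves on the unsplit node."""
    best, best_score = None, gini(labels)
    for feat in rows[0]:
        for val in sorted({r[feat] for r in rows}):
            yes = [l for r, l in zip(rows, labels) if r[feat] == val]
            no = [l for r, l in zip(rows, labels) if r[feat] != val]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (feat, val), score
    return best

def grow(rows, labels):
    """Grow a binary tree greedily; a leaf is the majority label."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feat, val = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[feat] == val]
    no = [(r, l) for r, l in zip(rows, labels) if r[feat] != val]
    return (feat, val, grow(*zip(*yes)), grow(*zip(*no)))

def predict(tree, row):
    """Follow the equality tests down to a leaf label."""
    while isinstance(tree, tuple):
        feat, val, yes, no = tree
        tree = yes if row[feat] == val else no
    return tree

# Invented toy data: NOT taken from the chapter or from any corpus.
ROWS = [
    {"speech_act": "inform", "focus": "yes"},
    {"speech_act": "inform", "focus": "no"},
    {"speech_act": "question", "focus": "yes"},
    {"speech_act": "question", "focus": "no"},
]
LABELS = ["H*", "none", "L*", "none"]  # ToBI-style accent labels, assigned arbitrarily

tree = grow(ROWS, LABELS)
print(predict(tree, {"speech_act": "question", "focus": "yes"}))
```

A real system would train on labelled dialogue data with many more features and would prune the tree or hold out test data; the sketch only shows how discrete discourse features can drive a label decision.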

Authors

Alan W. Black

Editor information

Editors and Affiliations

ATR Interpreting Telecommunications Research Labs, 2-2, Hikaridai, Seika-cho, Soraku-gun, 619-02, Kyoto, Japan
Yoshinori Sagisaka, Nick Campbell & Norio Higuchi

About this chapter

Cite this chapter

Black, A.W. (1997). Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_9

DOI: https://doi.org/10.1007/978-1-4612-2258-3_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive
