Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech

  • Alan W. Black


In the area of speech synthesis it is already possible to generate understandable speech with discourse-neutral prosody for simple written texts. However, at ATR-ITL we are researching speech synthesis techniques for use in a speech translation environment. Dialogues in such a setting involve much richer forms of prosodic variation than are required for the reading of texts. For our translations to sound natural, our synthesis system must offer a wide range of prosodic variability that can be described at an appropriate level of abstraction. This paper describes a multi-level intonation system which generates a fundamental frequency (F0) contour based on input labelled with high-level discourse information, including speech act type and focussing information, as well as part of speech and syntactic constituent structure. The system is rule-driven, but its rules (and parameters) are derived from naturally spoken dialogues. Two experiments using this model are described, testing its accuracy. First, results are given for a system that predicts ToBI intonation labels from discourse information using a CART decision tree. Second, a detailed investigation of the intonational variation of the word “okay” in different discourse contexts is presented.


Keywords: Speech Synthesis; Pitch Accent; Prosodic Phrase; Speech Synthesis System; Accented Word
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag New York, Inc. 1997
