Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech

Computing Prosody

Abstract

In speech synthesis it is already possible to generate understandable speech with discourse-neutral prosody for simple written texts. At ATR-ITL, however, we are researching speech synthesis techniques for use in a speech translation environment. Dialogues in such an environment involve much richer forms of prosodic variation than are required for reading texts. For our translations to sound natural, our synthesis system must offer a wide range of prosodic variability, described at an appropriate level of abstraction. This paper describes a multi-level intonation system which generates a fundamental frequency (F0) contour from input labelled with high-level discourse information, including speech act type and focussing information, as well as part of speech and syntactic constituent structure. The system is rule driven, but its rules and parameters are derived from naturally spoken dialogues. Two experiments testing the model's accuracy are described. First, results are given for a system that predicts ToBI intonation labels from discourse information using a CART decision tree. Second, a detailed investigation of the intonational variation of the word “okay” in different discourse contexts is presented.
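
The abstract does not give the feature set, the training data, or the tree-building details, so the following is only a rough illustration of the general idea of a CART-style classifier over discourse features. It is a minimal sketch that greedily picks binary tests by Gini impurity. The feature names (`speech_act`, `focus`, `pos`), the six toy examples, and the accent labels attached to them are all invented for illustration and are not taken from the chapter.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the test 'row[feat] == val' with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for feat in rows[0]:
        for val in sorted({r[feat] for r in rows}):
            yes = [l for r, l in zip(rows, labels) if r[feat] == val]
            no = [l for r, l in zip(rows, labels) if r[feat] != val]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score:  # only accept strict improvements
                best, best_score = (feat, val), score
    return best

def grow(rows, labels):
    """Grow a tree: a leaf is a label, a node is (feat, val, yes_tree, no_tree)."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority label
    feat, val = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[feat] == val]
    no = [(r, l) for r, l in zip(rows, labels) if r[feat] != val]
    return (feat, val, grow(*zip(*yes)), grow(*zip(*no)))

def predict(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        feat, val, yes, no = tree
        tree = yes if row[feat] == val else no
    return tree

# Toy, invented training examples (NOT data from the chapter).
rows = [
    {"speech_act": "inform",   "focus": True,  "pos": "noun"},
    {"speech_act": "inform",   "focus": False, "pos": "noun"},
    {"speech_act": "question", "focus": True,  "pos": "noun"},
    {"speech_act": "question", "focus": False, "pos": "verb"},
    {"speech_act": "inform",   "focus": True,  "pos": "verb"},
    {"speech_act": "question", "focus": True,  "pos": "verb"},
]
labels = ["H*", "none", "L*", "none", "H*", "L*"]

tree = grow(rows, labels)
prediction = predict(tree, {"speech_act": "question", "focus": True, "pos": "noun"})
```

On this toy set the tree first tests focus and then speech act. A full CART implementation as described by Breiman et al. would also handle continuous features and prune the grown tree, which this sketch omits.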

Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Black, A.W. (1997). Predicting the Intonation of Discourse Segments from Examples in Dialogue Speech. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_9

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive
