All-Prosodic Speech Synthesis

  • Arthur Dirksen
  • John S. Coleman


We present a speech synthesis architecture, IPOX, which allows the integration of various aspects of prosodic structure at different structural levels. This is achieved by using a hierarchical, metrical representation of the input string in analysis as well as phonetic interpretation. The output of the latter step consists of parameters for the Klatt synthesizer. The architecture is based primarily on YorkTalk [Col92, Col94, Loc92], but differs in that it uses a rule compiler [Dir93], which allows a clean separation of linguistic statements and computational execution as well as a more concise statement of various kinds of generalizations.
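As a rough illustration of what top-down phonetic interpretation of a hierarchical, metrical representation might look like, the sketch below walks a small strong/weak tree and assigns each leaf a time span and an F0 target. This is not IPOX or YorkTalk code: the `Node` class, the `interpret` function, the 2:1 strong/weak duration weighting, and the 10 Hz F0 drop on weak constituents are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A constituent in a hypothetical metrical tree."""
    label: str
    strong: bool = True  # metrical strength relative to sister nodes
    children: List["Node"] = field(default_factory=list)


# (label, start_ms, end_ms, f0_hz) for one leaf constituent
Event = Tuple[str, float, float, float]


def interpret(node: Node, start: float, duration: float,
              f0: float, out: List[Event]) -> None:
    """Divide the parent's time span among its daughters, top-down."""
    if not node.children:
        out.append((node.label, start, start + duration, f0))
        return
    # Assumed weighting: a strong daughter gets twice the share of a weak one.
    weights = [2 if child.strong else 1 for child in node.children]
    total = sum(weights)
    t = start
    for child, w in zip(node.children, weights):
        d = duration * w / total
        # Assumed: weak constituents are interpreted with a lower F0 target.
        child_f0 = f0 if child.strong else f0 - 10.0
        interpret(child, t, d, child_f0, out)
        t += d


word = Node("word", children=[Node("syl1", strong=True),
                              Node("syl2", strong=False)])
events: List[Event] = []
interpret(word, 0.0, 300.0, 120.0, events)
for event in events:
    print(event)
```

In a real system the leaf events would be expanded into full parameter tracks for the synthesizer; here they only show how structural position can drive timing and pitch.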


Speech Synthesis; Prosodic Structure; Temporal Interpretation; Phrase Structure Rule; Connected Speech
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. [AHK87]
    J. Allen, M. S. Hunnicutt, and D. Klatt. From Text to Speech: The MITalk System. Cambridge University Press, Cambridge, 1987.
  2. [Col92]
    J. S. Coleman. “Synthesis-by-rule” without segments or rewrite rules. In Talking Machines: Theories, Models, and Designs, G. Bailly, C. Benoit, and T. R. Sawallis, eds. Elsevier, Amsterdam, 211–224, 1992.
  3. [Col94]
    J. S. Coleman. Polysyllabic words in the YorkTalk synthesis system. In Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III, P. A. Keating, ed. Cambridge University Press, Cambridge, 293–324, 1994.
  4. [Col95]
    J. S. Coleman. Synthesis of connected speech. To appear in Work in Progress No. 7. Speech Research Laboratory, University of Reading, 1–12, 1995.
  5. [Dir93]
    A. Dirksen. Phonological parsing. In Computational Linguistics in the Netherlands: Papers from the Third CLIN meeting, W. Sijtsma and O. Zweekhorst, eds. Tilburg University, Netherlands, 27–38, 1993.
  6. [DQ93]
    A. Dirksen and H. Quené. Prosodic analysis: the next generation. In Analysis and Synthesis of Speech: Strategic Research Towards High-Quality Text-to-Speech Generation, V. J. van Heuven and L. C. W. Pols, eds. Mouton de Gruyter, Berlin, 131–144, 1993.
  7. [Loc92]
    J. K. Local. Modelling assimilation in nonsegmental, rule-free synthesis. In Papers in Laboratory Phonology II: Gesture, Segment, Prosody, G. J. Docherty and D. R. Ladd, eds. Cambridge University Press, Cambridge, 190–223, 1992.
  8. [OGC93]
    J. P. Olive, A. Greenwood, and J. Coleman. Acoustics of American English Speech: A Dynamic Approach. Springer-Verlag, New York, 1993.
  9. [Pie81]
    J. B. Pierrehumbert. Synthesizing intonation. J. Acoust. Soc. Amer. 70(4):985–995, 1981.
  10. [PB88]
    J. B. Pierrehumbert and M. E. Beckman. Japanese Tone Structure. MIT Press, Cambridge, MA, 1988.

Copyright information

© Springer Science+Business Media New York 1997
