We present a speech synthesis architecture, IPOX, which allows the integration of various aspects of prosodic structure at different structural levels. This is achieved by using a hierarchical, metrical representation of the input string in analysis as well as phonetic interpretation. The output of the latter step consists of parameters for the Klatt synthesizer. The architecture is based primarily on YorkTalk [Col92, Col94, Loc92], but differs in that it uses a rule compiler [Dir93], which allows a clean separation of linguistic statements and computational execution as well as a more concise statement of various kinds of generalizations.
Keywords: Speech Synthesis, Prosodic Structure, Temporal Interpretation, Phrase Structure Rule, Connected Speech
- [AHK87] J. Allen, M. S. Hunnicutt, and D. Klatt. From Text to Speech: The MITalk System. Cambridge University Press, Cambridge, 1987.
- [Col92] J. S. Coleman. “Synthesis-by-rule” without segments or rewrite rules. In Talking Machines: Theories, Models, and Designs, G. Bailly, C. Benoit, and T. R. Sawallis, eds. Elsevier, Amsterdam, 211–224, 1992.
- [Col95] J. S. Coleman. Synthesis of connected speech. To appear in Work in Progress No. 7. Speech Research Laboratory, University of Reading, 1–12, 1995.
- [Dir93] A. Dirksen. Phonological parsing. In Computational Linguistics in the Netherlands: Papers from the Third CLIN meeting, W. Sijtsma and O. Zweekhorst, eds. Tilburg University, Netherlands, 27–38, 1993.
- [OGC93] J. P. Olive, A. Greenwood, and J. Coleman. Acoustics of American English Speech: A Dynamic Approach. Springer-Verlag, New York, 1993.
- [PB88] J. B. Pierrehumbert and M. E. Beckman. Japanese Tone Structure. MIT Press, Cambridge, MA, 1988.