Digital Formant Synthesis
The attempt to build a talking machine has a long history and can even be traced back to a time before the beginning of the Christian era (Linggard, 1985). The first complete talking machine is due to von Kempelen (1791) and is described in a book of over 400 pages that also reports on the twenty or so years of experimentation that were needed to build the device (interesting historical accounts of the development of speech synthesis are given in Dudley & Tarnóczy, 1950; Flanagan, 1972; Linggard, 1985; see also Klatt, 1987; and Flanagan & Rabiner, 1973). It was not until the 20th century that speech synthesis became a widespread research endeavour. One reason is that the invention of the telephone created an increasing need to reduce the data in speech transmission without significantly degrading its quality. This was one of the principal motivations behind the first electronic speech synthesis system capable of synthesising whole utterances, which was demonstrated publicly at the New York World’s Fair in 1939 and in San Francisco in 1940 (Dudley, 1939; Dudley et al., 1939). Another reason is that mechanical devices that model the vocal tract accurately enough to produce intelligible speech are very difficult to construct, and the advent of electronic instrumentation at the beginning of the 20th century provided a way of synthesising speech without having to copy the action of the vocal organs in detail. Some landmarks in the development of speech synthesis systems in the 1950s include the Pattern Playback system of the Haskins Laboratories (Cooper, Liberman, & Borst, 1951), the Parametric Artificial Talker (PAT) by Lawrence (1953) and the Orator Verbis Electris (OVE) system developed by Fant (1953).
In more recent times, major advances in text-to-speech synthesis have been made both in the MITalk text-to-speech system, developed over a number of years by Dennis Klatt at MIT (Allen, Hunnicutt, & Klatt, 1987; Klatt, 1980, 1982, 1987; Klatt & Klatt, 1990), which can synthesise intelligible and natural English speech in different voices and from an unrestricted vocabulary, and in the KTH synthesis-by-rule system developed at Stockholm (Carlson & Granström, 1975, 1976; Carlson, Granström, & Hunnicutt, 1982).[1]
Keywords: Nasal Consonant, Vocal Tract, Cascade Model, Speech Synthesis, Synthetic Speech
- 1. There are many other kinds of speech synthesis systems available. The most important of these are discussed in the review article by Klatt (1987), which also includes recordings of them. Much material is also currently available on the WWW at the comp.speech web page (http://vii.speech.cs.cmu.edu:80/comp.speech).
- 2. These publications provide some of the background to the Haskins Laboratories articulatory synthesis system; an excellent demonstration of this (and also of the Pattern Playback system) can be found on their WWW site: http://www.haskins.yale.edu.
- 3. This causes a zero to be introduced at frequencies of zero Hertz and the Nyquist frequency.