A Neural Network Cluster for the Control of a Speech Synthesizer

Michael S. Scordilis
Part of the Microprocessor-Based and Intelligent Systems Engineering book series (ISCA, volume 9)


The quality of text-to-speech conversion performed by machines is still unacceptably low and does not compare well with natural speech. As a result, synthetic speech has not found wide acceptance and has been restricted mainly to applications for the handicapped. The problems associated with automatic speech synthesis are related, to a large extent, to the methods used to control the mathematical models of the human vocal tract and its properties as they change over time during discourse. In formant synthesis, which is the most effective synthesis method, rules are applied to relate the incoming phonemic information to values of the synthesizer control vectors. Such rules are usually developed by analyzing a representative set of utterances and are then adjusted through listening tests. The tedious nature of the parameter extraction process and the lack of unambiguous relationships between acoustic events and spectral information have hindered effective control of the models. In this paper, artificial neural networks are employed to address the latter problem. For this purpose, a set of 56 common words composed of larynx-produced phonemes was analyzed and used to train a network cluster. The system was able to produce intelligible speech for certain phonemic combinations. The generalization capabilities of the developed neural network will be tested more fully with an enriched training corpus.
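To make "synthesizer control vectors" more concrete: in a cascade formant synthesizer, each control vector supplies, among other values, the frequency and bandwidth of every formant resonator for the current frame. The abstract does not say which synthesizer was used, so the following is only a sketch under that assumption. It implements the second-order digital resonator used in Klatt-style cascade synthesizers, `y[n] = A*x[n] + B*y[n-1] + C*y[n-2]`, and drives a chain of such resonators with an impulse train. The function names, sampling rate, and formant values are hypothetical and illustrative, not taken from the paper.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a second-order digital resonator.

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2]; A = 1 - B - C gives unit gain at 0 Hz.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz):
    """Pass the signal through resonators in series, one per (frequency, bandwidth) pair."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


def impulse_train(f0_hz, n_samples, fs_hz):
    """Idealized voiced excitation: one unit impulse per fundamental period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


if __name__ == "__main__":
    fs = 10000.0
    source = impulse_train(100.0, 5000, fs)  # 0.5 s of 100 Hz impulses
    # Illustrative (frequency, bandwidth) pairs in Hz for an open vowel; not from the paper.
    vowel = cascade(source, [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)], fs)
    print(len(vowel), max(abs(s) for s in vowel))
```

Because each resonator has unit gain at 0 Hz, the cascade shapes the spectrum around the formant peaks without changing the overall DC level. A control stage, whether rule-based or learned, would update the (frequency, bandwidth) pairs frame by frame as the phonemic input changes.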

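The abstract does not describe the network architecture, the input coding, or which control parameters the cluster predicts. As a rough illustration only, the sketch below trains a single small multilayer perceptron by backpropagation to map a one-hot phoneme code to two normalized control values (the first and second formant frequencies). The phoneme set, target frequencies, layer sizes, learning rate, and all names are hypothetical; the sketch does not attempt to reproduce the paper's network cluster or its 56-word training corpus.

```python
import math
import random

# Hypothetical phoneme inventory and illustrative (F1, F2) targets in Hz.
PHONEMES = ["a", "i", "u"]
TARGETS_HZ = {"a": (730.0, 1090.0), "i": (270.0, 2290.0), "u": (300.0, 870.0)}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer of sigmoid units, trained by backpropagation on squared error."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra trailing entry that acts as the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # inputs plus constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus constant bias input
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr):
        """One gradient-descent update on one pattern; returns its error before the update."""
        xb, hb, y = self.forward(x)
        # Output deltas: derivative of 0.5*(y - t)^2 through the sigmoid.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas: propagate output deltas back through w2 (bias entry skipped).
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def encode(phoneme):
    """One-hot code for a phoneme from the hypothetical inventory."""
    return [1.0 if p == phoneme else 0.0 for p in PHONEMES]


def normalize(f1_hz, f2_hz):
    """Scale formant targets into the (0, 1) output range of a sigmoid unit."""
    return [f1_hz / 1000.0, f2_hz / 3000.0]


def train(epochs=5000, lr=0.5, n_hidden=4):
    net = TinyMLP(len(PHONEMES), n_hidden, 2)
    data = [(encode(p), normalize(*TARGETS_HZ[p])) for p in PHONEMES]
    losses = []
    for _ in range(epochs):
        losses.append(sum(net.train_step(x, t, lr) for x, t in data))
    return net, losses


def predict_hz(net, phoneme):
    """Map a phoneme to predicted (F1, F2) in Hz by undoing the normalization."""
    _, _, y = net.forward(encode(phoneme))
    return y[0] * 1000.0, y[1] * 3000.0


if __name__ == "__main__":
    net, losses = train()
    print(f"error: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
    for p in PHONEMES:
        f1, f2 = predict_hz(net, p)
        print(p, round(f1), round(f2))
```

Dividing F1 by 1000 Hz and F2 by 3000 Hz keeps the targets inside the (0, 1) range of the sigmoid output units. A real corpus would supply many analysis frames per phoneme and many more control parameters per frame, which is where generalization, rather than memorization of a few patterns, becomes the real test.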

Copyright information

© Springer Science+Business Media Dordrecht 1991

Authors and Affiliations

Michael S. Scordilis, Department of Electrical and Electronic Engineering, University of Melbourne, Australia
