Abstract
The quality of text-to-speech conversion performed by machines is still unacceptably low and does not compare well with natural speech. As a result, synthetic speech has not found wide acceptance and has been restricted mainly to useful applications for the handicapped. The problems associated with automatic speech synthesis are related, to a large extent, to the methods of controlling the mathematical models of the human vocal tract and its properties as they change with time during discourse. In formant synthesis, which is the most effective synthesis method, rules are applied to relate the incoming phonemic information to values of the synthesizer control vectors. Such rules are usually developed through the analysis of a representative set of utterances and adjusted with listening tests. The tedious nature of the parameter extraction process and the lack of unambiguous relationships between acoustic events and spectral information have hindered the effective control of the models. In this paper, artificial neural networks are employed to assist with the latter concern. For this purpose, a set of 56 common words composed of larynx-produced phonemes was analyzed and used to train a network cluster. The system was able to produce intelligible speech for certain phonemic combinations. The generalization capabilities of the developed neural network will be tested more fully with an enriched training corpus.
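The idea of replacing hand-written rules with a trained mapping from phonemic information to synthesizer control values can be illustrated with a minimal sketch. It is not the network cluster described in the chapter: it assumes a single small feed-forward network trained by generic back-propagation, one-hot phoneme codes as input, and only three control values (F1, F2, F3) as output. The formant targets are approximate textbook-style values for three vowels, and the names `TinyMLP`, `TARGETS`, and `SCALE` are hypothetical.

```python
import math
import random

# Illustrative, approximate vowel formant targets in Hz (F1, F2, F3).
# Assumption: textbook-style values picked for this sketch, not chapter data.
TARGETS = {
    "i": (270.0, 2290.0, 3010.0),
    "a": (730.0, 1090.0, 2440.0),
    "u": (300.0, 870.0, 2240.0),
}
PHONEMES = sorted(TARGETS)
SCALE = 4000.0  # hypothetical normalisation so every target lies in (0, 1)


def one_hot(phoneme):
    """Encode a phoneme symbol as a one-hot input vector."""
    return [1.0 if p == phoneme else 0.0 for p in PHONEMES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, sigmoid units, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The last weight in each row multiplies a constant bias input of 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        xb, hb, y = self.forward(x)
        # Output deltas for squared error with sigmoid outputs.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas, computed before any weight is changed (bias excluded).
        d_hid = [
            hb[j] * (1.0 - hb[j])
            * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

    def predict_hz(self, phoneme):
        """Return the predicted (F1, F2, F3) control values in Hz."""
        return [v * SCALE for v in self.forward(one_hot(phoneme))[2]]


def mean_abs_error_hz(net):
    errs = [abs(p - t)
            for ph in PHONEMES
            for p, t in zip(net.predict_hz(ph), TARGETS[ph])]
    return sum(errs) / len(errs)


net = TinyMLP(n_in=len(PHONEMES), n_hidden=4, n_out=3)
error_before = mean_abs_error_hz(net)
for _ in range(5000):
    for ph in PHONEMES:
        net.train_step(one_hot(ph), [t / SCALE for t in TARGETS[ph]])
error_after = mean_abs_error_hz(net)
```

Calling `net.predict_hz("a")` after training returns the learned control values for that vowel, and comparing `error_before` with `error_after` shows how far training moved the outputs toward the targets. A real control network would presumably also need phonemic context and time-varying targets, which this sketch leaves out.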
Copyright information
© 1991 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Scordilis, M.S. (1991). A Neural Network Cluster for the Control of a Speech Synthesizer. In: Tzafestas, S.G. (eds) Engineering Systems with Intelligence. Microprocessor-Based and Intelligent Systems Engineering, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-2560-4_27
DOI: https://doi.org/10.1007/978-94-011-2560-4_27
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-5130-9
Online ISBN: 978-94-011-2560-4
eBook Packages: Springer Book Archive