A Neural Network Cluster for the Control of a Speech Synthesizer
The quality of text-to-speech conversion performed by machines is still unacceptably low and does not compare well with natural speech. As a result, synthetic speech has not found wide acceptance and has been restricted mainly to useful applications for the handicapped. The problems associated with automatic speech synthesis are related, to a large extent, to the methods of controlling the mathematical models of the human vocal tract and its properties as they change with time during discourse. In formant synthesis, which is the most effective synthesis method, rules are applied to relate the incoming phonemic information to values of the synthesizer control vectors. Such rules are usually developed through the analysis of a representative set of utterances and adjusted with listening tests. The tedious nature of the parameter extraction process and the lack of unambiguous relationships between acoustic events and spectral information have hindered the effective control of the models. In this paper, artificial neural networks are employed to assist with the latter concern. For this purpose, a set of 56 common words comprising larynx-produced phonemes was analyzed and used to train a network cluster. The system was able to produce intelligible speech for certain phonemic combinations. The generalization capabilities of the developed neural network will be tested fully with an enriched training corpus.
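The abstract does not specify the network architecture, the phoneme encoding, or the layout of the synthesizer control vector. As a rough illustration of the general idea only (learning a mapping from phonemic input to formant control values instead of hand-writing rules), the following minimal sketch trains a single small backpropagation network, not a cluster. Everything in it is an assumption made for illustration: the one-hot phoneme coding, the three-vowel data set, the approximate textbook formant frequencies, the layer sizes, and the names `TinyMLP`, `train`, and `predict_formants`.

```python
import math
import random

# Hypothetical training data: three vowels mapped to approximate textbook
# formant frequencies (F1, F2, F3) in Hz. Illustrative values only; they
# are NOT taken from the paper or its 56-word corpus.
PHONEMES = ["i", "a", "u"]
FORMANTS_HZ = {
    "i": (270.0, 2290.0, 3010.0),
    "a": (730.0, 1090.0, 2440.0),
    "u": (300.0, 870.0, 2240.0),
}
SCALE_HZ = 3500.0  # assumed scale factor; brings targets into roughly [0, 1]


def one_hot(symbol):
    """Encode a phoneme symbol as a one-hot input vector (assumed coding)."""
    return [1.0 if p == symbol else 0.0 for p in PHONEMES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One sigmoid hidden layer and linear outputs, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr):
        """One gradient step on a single example; returns its squared error."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        err = [yk - tk for yk, tk in zip(y, target)]
        # Propagate the output error back to the hidden layer
        # before the output weights are modified.
        delta_h = [
            h[j] * (1.0 - h[j]) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
            for j in range(len(h))
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * err[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * delta_h[j] * xb[i]
        return 0.5 * sum(e * e for e in err)


def train(net, epochs=3000, lr=0.1):
    """Train on the toy data; returns the summed error for each epoch."""
    data = [(one_hot(p), [f / SCALE_HZ for f in FORMANTS_HZ[p]]) for p in PHONEMES]
    return [sum(net.train_step(x, t, lr) for x, t in data) for _ in range(epochs)]


def predict_formants(net, symbol):
    """Return the network's (F1, F2, F3) estimate in Hz for one phoneme."""
    _, y = net.forward(one_hot(symbol))
    return [v * SCALE_HZ for v in y]


net = TinyMLP(n_in=3, n_hidden=4, n_out=3)
losses = train(net)
```

Calling `predict_formants(net, "a")` after training returns three values in Hz that can be compared with the target row for that vowel. A practical controller would also have to produce time-varying trajectories and handle coarticulation between neighbouring phonemes, which this static toy mapping does not attempt.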
Keywords: Listening Test, Speech Synthesis, Training Corpus, Intelligible Speech, Synthetic Speech