Abstract
The goal of this paper is to introduce appropriate motion models for the visual articulatory movements that are relevant to the process of speechreading and, on this basis, to design a facial animation program with an open input-text vocabulary for use as a training aid for speechreading.
Since the experimental work of Menzerath and de Lacerda (1931), it has been known that the movements of the speech organs are structurally interrelated within a spoken context. The sound signals and the related visual articulatory movements are created in the course of a fully overlapping coarticulation.
The paper illustrates the interrelation of audio and visual speech features. It can be shown that, even in fluent speech, an interactive determination of moments of optimum articulation is possible for most phonemes. The facial expressions and lip contours, as well as the set points within the phoneme boundaries, depend on the context. Several approaches to describing these dependencies are explained and discussed.
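As one illustration of how a "moment of optimum articulation" could be operationalized, the sketch below picks, within given phoneme boundaries, the video frame whose lip-feature trajectory is most stable. This is a minimal assumption-laden sketch, not the paper's method: the function name, the feature layout (one lip-contour feature vector per frame), and the minimum-velocity criterion are all illustrative.

```python
import numpy as np

def optimum_articulation_frame(lip_features, start, end):
    """Pick the frame inside a phoneme segment [start, end) whose visual
    articulation is most stable, approximated here as the frame where the
    lip-feature trajectory changes least from one frame to the next.

    lip_features: array of shape (n_frames, n_features), e.g. mouth width,
    mouth height, and lip protrusion measured per video frame (assumed).
    """
    segment = np.asarray(lip_features[start:end], dtype=float)
    if len(segment) < 2:
        return start  # segment too short to measure any movement
    # Frame-to-frame change of the feature vector (articulatory velocity).
    velocity = np.linalg.norm(np.diff(segment, axis=0), axis=1)
    # The most stable frame approximates the articulatory target position.
    return start + int(np.argmin(velocity))
```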
A facial animation system is described which uses a codebook of specific key pictures corresponding to the moments of optimum articulatory positions in fluent speech. Face movements are generated by selecting video pictures from the codebook and then computing interim pictures with interpolation algorithms. Movements of the tongue are artificially introduced into the opening of the mouth.
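The codebook-and-interpolation pipeline can be sketched as follows. The abstract does not specify the interpolation algorithm, so the linear cross-fade between key pictures below is an assumption; the function name `animate`, the dictionary-based codebook, and the `steps` parameter are likewise hypothetical.

```python
import numpy as np

def animate(phoneme_sequence, codebook, steps=5):
    """Generate a frame sequence from a codebook of key pictures.

    codebook: dict mapping each phoneme (or viseme class) to its key
    picture, stored as a float image array; all pictures share one shape.
    Interim pictures between consecutive key pictures are produced here
    by linear cross-fading (an assumed stand-in for the paper's
    interpolation algorithms).
    """
    keys = [np.asarray(codebook[p], dtype=float) for p in phoneme_sequence]
    if not keys:
        return []
    frames = [keys[0]]
    for a, b in zip(keys, keys[1:]):
        # Insert `steps` interim pictures between each pair of key pictures.
        for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
            frames.append((1.0 - t) * a + t * b)
    return frames
```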
In order to evaluate different motion models, it is necessary to present the corresponding facial animation to people who can speechread, since too many parameters of the process of human visual perception are still largely unknown. For this reason, the evaluation methods presented here are based on visemes as the smallest perceptible visual units of the articulation process. The interaction of these units is described qualitatively. Results of this experimental research extend the knowledge about articulation and coarticulation and are used to improve the facial animation system.
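A viseme-based evaluation rests on a many-to-one grouping of phonemes into visual classes, since phonemes that look alike on the face cannot be distinguished by a speechreader. The grouping below is purely illustrative (the class names and membership are assumptions, not the paper's inventory) and shows how confusions could be scored at the viseme level.

```python
# Hypothetical many-to-one phoneme-to-viseme mapping: phonemes that look
# alike on the face fall into the same visual class.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar",
    "o": "rounded", "u": "rounded",
}

def same_viseme(phone_a, phone_b):
    """True if two phonemes are visually indistinguishable under this
    illustrative grouping, i.e. if they share a viseme class; a viseme-level
    evaluation would not count such a confusion as a perceptual error."""
    return PHONEME_TO_VISEME.get(phone_a) == PHONEME_TO_VISEME.get(phone_b)
```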
© 1996 Springer-Verlag Berlin Heidelberg
Cite this chapter
Bothe, HH. (1996). Relations of Audio and Visual Speech Signals in a Physical Feature Space: Implications for the Hearing-impaired. In: Stork, D.G., Hennecke, M.E. (eds) Speechreading by Humans and Machines. NATO ASI Series, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-13015-5_34
Print ISBN: 978-3-642-08252-8
Online ISBN: 978-3-662-13015-5