Abstract
The goal of this paper is to introduce appropriate motion models for the visual articulatory movements that are relevant to the process of speechreading and, on this basis, to design a facial animation program with an open input-text vocabulary for use as a training aid for speechreading.
Since the experimental work of Menzerath and de Lacerda (1931), it has been known that the movements of the speech organs are structurally interrelated within a spoken context. The sound signals and the related visual articulatory movements are created in the course of a fully overlapping coarticulation.
The paper illustrates the interrelation of audio and visual speech features. It can be shown that, even in fluent speech, an interactive determination of moments of optimum articulation is possible for most phonemes. The facial expressions and lip contours, as well as the set points within the phoneme boundaries, depend on the context. Several approaches to describing these dependencies are explained and discussed.
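As one illustration of how a "moment of optimum articulation" could be operationalized, the sketch below picks, within given phoneme boundaries, the video frame whose lip-feature trajectory is most stable. This is a minimal assumption-laden sketch, not the paper's method: the function name, the feature layout (one lip-contour feature vector per frame), and the minimum-velocity criterion are all illustrative.

```python
import numpy as np

def optimum_articulation_frame(lip_features, start, end):
    """Pick the frame inside a phoneme segment [start, end) whose visual
    articulation is most stable, approximated here as the frame where the
    lip-feature trajectory changes least from one frame to the next.

    lip_features: array of shape (n_frames, n_features), e.g. mouth width,
    mouth height, and lip protrusion measured per video frame (assumed).
    """
    segment = np.asarray(lip_features[start:end], dtype=float)
    if len(segment) < 2:
        return start  # segment too short to measure any movement
    # Frame-to-frame change of the feature vector (articulatory velocity).
    velocity = np.linalg.norm(np.diff(segment, axis=0), axis=1)
    # The most stable frame approximates the articulatory target position.
    return start + int(np.argmin(velocity))
```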
A facial animation system is described which uses a codebook of specific key pictures corresponding to the moments of optimum articulatory positions in fluent speech. Face movements are generated by selecting video pictures from the codebook and then computing interim pictures with interpolation algorithms. Movements of the tongue are artificially introduced into the opening of the mouth.
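The codebook-and-interpolation pipeline can be sketched as follows. The abstract does not specify the interpolation algorithm, so the linear cross-fade between key pictures below is an assumption; the function name `animate`, the dictionary-based codebook, and the `steps` parameter are likewise hypothetical.

```python
import numpy as np

def animate(phoneme_sequence, codebook, steps=5):
    """Generate a frame sequence from a codebook of key pictures.

    codebook: dict mapping each phoneme (or viseme class) to its key
    picture, stored as a float image array; all pictures share one shape.
    Interim pictures between consecutive key pictures are produced here
    by linear cross-fading (an assumed stand-in for the paper's
    interpolation algorithms).
    """
    keys = [np.asarray(codebook[p], dtype=float) for p in phoneme_sequence]
    if not keys:
        return []
    frames = [keys[0]]
    for a, b in zip(keys, keys[1:]):
        # Insert `steps` interim pictures between each pair of key pictures.
        for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
            frames.append((1.0 - t) * a + t * b)
    return frames
```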
In order to evaluate different motion models, it is necessary to present the corresponding facial animation to people who can speechread, since too many parameters of the process of human visual perception are still largely unknown. For this reason, the evaluation methods presented here are based on visemes as the smallest perceptible visual units of the articulation process. The interaction of these units is described qualitatively. Results of this experimental research extend the knowledge about articulation and coarticulation and are used to improve the facial animation system.
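A viseme-based evaluation rests on a many-to-one grouping of phonemes into visual classes, since phonemes that look alike on the face cannot be distinguished by a speechreader. The grouping below is purely illustrative (the class names and membership are assumptions, not the paper's inventory) and shows how confusions could be scored at the viseme level.

```python
# Hypothetical many-to-one phoneme-to-viseme mapping: phonemes that look
# alike on the face fall into the same visual class.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar",
    "o": "rounded", "u": "rounded",
}

def same_viseme(phone_a, phone_b):
    """True if two phonemes are visually indistinguishable under this
    illustrative grouping, i.e. if they share a viseme class; a viseme-level
    evaluation would not count such a confusion as a perceptual error."""
    return PHONEME_TO_VISEME.get(phone_a) == PHONEME_TO_VISEME.get(phone_b)
```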
© 1996 Springer-Verlag Berlin Heidelberg
Cite this chapter
Bothe, HH. (1996). Relations of Audio and Visual Speech Signals in a Physical Feature Space: Implications for the Hearing-impaired. In: Stork, D.G., Hennecke, M.E. (eds) Speechreading by Humans and Machines. NATO ASI Series, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-13015-5_34
Print ISBN: 978-3-642-08252-8
Online ISBN: 978-3-662-13015-5