Acoustic-labial speaker verification

Jourlin, Pierre; Luettin, Juergen; Genoud, Dominique; Wassner, Hubert

doi:10.1007/BFb0016011

Pierre Jourlin¹,², Juergen Luettin¹, Dominique Genoud¹ & Hubert Wassner¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Included in the following conference series:

International Conference on Audio- and Video-Based Biometric Person Authentication

Abstract

This paper describes a multimodal approach for speaker verification. The system consists of two classifiers, one using visual features and the other using acoustic features. A lip tracker extracts visual information from the speaking face, providing shape and intensity features. We describe an approach for normalizing the scores of the different modalities and mapping them onto a common confidence interval. We also describe a novel method for integrating the scores of multiple classifiers. Verification experiments are reported for the individual modalities and for the combined classifier. The integrated system outperformed each sub-system and reduced the false acceptance rate from 2.3% (acoustic sub-system alone) to 0.5%.
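The abstract does not give the normalization or integration formulas, so the following is only a minimal sketch of one generic score-level fusion scheme, not the authors' method. It assumes each sub-system's raw score is z-normalized with impostor-score statistics and mapped onto the unit interval with a logistic function, that the normalized acoustic and labial scores are combined by a fixed weighted sum, and that a claim is accepted when the fused score reaches a threshold. The function names (`normalize_score`, `fuse_scores`, `error_rates`), the weights and the statistics are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def normalize_score(raw: float, impostor_mean: float, impostor_std: float) -> float:
    """Map a raw sub-system score onto the unit interval.

    Hypothetical scheme: z-normalize with impostor-score statistics, then
    apply a logistic function so that scores from different modalities
    share a common range.
    """
    if impostor_std <= 0:
        raise ValueError("impostor_std must be positive")
    z = (raw - impostor_mean) / impostor_std
    # Numerically stable logistic: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of normalized scores, one (assumed) weight per sub-system."""
    if len(scores) != len(weights):
        raise ValueError("one weight per sub-system is required")
    return sum(w * s for w, s in zip(weights, scores))


def error_rates(client: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR), accepting a claim when its fused score >= threshold."""
    if not client or not impostor:
        raise ValueError("need at least one client and one impostor trial")
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in client) / len(client)       # clients rejected
    return far, frr


if __name__ == "__main__":
    # Hypothetical impostor statistics for the acoustic and labial sub-systems.
    acoustic = normalize_score(2.1, impostor_mean=0.0, impostor_std=1.0)
    labial = normalize_score(0.8, impostor_mean=0.2, impostor_std=0.5)
    fused = fuse_scores([acoustic, labial], weights=[0.7, 0.3])
    print(f"fused score: {fused:.3f}")
```

Because both normalized scores lie in the same interval, a single weight vector and threshold can be tuned on held-out trials, and the false acceptance rate is then simply the fraction of impostor trials whose fused score clears the threshold.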

Author information

Authors and Affiliations

IDIAP, rue du Simplon 4, CP 592, CH-1920, Martigny, Switzerland
Pierre Jourlin, Juergen Luettin, Dominique Genoud & Hubert Wassner
LIA, 339 chemin des Meinajariès, BP 1228, 84911, Avignon Cedex 9, France
Pierre Jourlin

Editor information

Josef Bigün, Gérard Chollet, Gunilla Borgefors

About this paper

Cite this paper

Jourlin, P., Luettin, J., Genoud, D., Wassner, H. (1997). Acoustic-labial speaker verification. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016011

DOI: https://doi.org/10.1007/BFb0016011
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive