Abstract
This paper describes a multimodal approach to speaker verification. The system consists of two classifiers, one using visual features and the other using acoustic features. A lip tracker extracts visual information from the speaking face, providing shape and intensity features. We describe an approach for normalizing the different modalities and mapping them onto a common confidence interval. We also describe a novel method for integrating the scores of multiple classifiers. Verification experiments are reported for the individual modalities and for the combined classifier. The integrated system outperformed each sub-system and reduced the false acceptance rate of the acoustic sub-system from 2.3% to 0.5%.
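The abstract does not specify how scores are normalized or integrated, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a logistic mapping of each raw classifier score onto the interval (0, 1) and a simple weighted sum for fusion. All function names, parameters, and default values are hypothetical.

```python
import math


def to_confidence(score, threshold, scale):
    """Map a raw classifier score onto (0, 1) with a logistic function.

    Assumption: a score equal to the classifier's own threshold maps to 0.5,
    and `scale` (> 0) controls how quickly confidence saturates.
    """
    return 1.0 / (1.0 + math.exp(-(score - threshold) / scale))


def fuse(acoustic_conf, labial_conf, acoustic_weight=0.5):
    """Combine two confidences with a weighted sum (a stand-in fusion rule)."""
    return acoustic_weight * acoustic_conf + (1.0 - acoustic_weight) * labial_conf


def accept(fused_conf, decision_threshold=0.5):
    """Accept the identity claim when the fused confidence reaches the threshold."""
    return fused_conf >= decision_threshold


def false_acceptance_rate(impostor_confs, decision_threshold=0.5):
    """Fraction of impostor trials that would be wrongly accepted."""
    accepted = sum(1 for c in impostor_confs if c >= decision_threshold)
    return accepted / len(impostor_confs)
```

Because both modalities share one interval after mapping, a single decision threshold can be applied to the fused value, and the false acceptance rate can be measured the same way for each sub-system and for the combination.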
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jourlin, P., Luettin, J., Genoud, D., Wassner, H. (1997). Acoustic-labial speaker verification. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016011
DOI: https://doi.org/10.1007/BFb0016011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive