Acoustic-labial speaker verification

  • Audio-video Features and Fusion
  • Conference paper
Audio- and Video-based Biometric Person Authentication (AVBPA 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1206))

Abstract

This paper describes a multimodal approach to speaker verification. The system consists of two classifiers, one using visual features and the other using acoustic features. A lip tracker extracts visual information from the speaking face, providing shape and intensity features. We describe an approach for normalizing the different modalities and mapping them onto a common confidence interval. We also describe a novel method for integrating the scores of multiple classifiers. Verification experiments are reported for the individual modalities and for the combined classifier. The integrated system outperformed each sub-system, reducing the false acceptance rate from 2.3% for the acoustic sub-system alone to 0.5%.

Editor information

Josef Bigün Gérard Chollet Gunilla Borgefors

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jourlin, P., Luettin, J., Genoud, D., Wassner, H. (1997). Acoustic-labial speaker verification. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016011

  • DOI: https://doi.org/10.1007/BFb0016011

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62660-2

  • Online ISBN: 978-3-540-68425-1

  • eBook Packages: Springer Book Archive
