Research on Audio-Visual Asynchronous Correlation for Speaker Identification Based on DBN

  • Conference paper
Future Intelligent Information Systems

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 86)

Abstract

In noisy or otherwise adverse conditions, consistently high speaker identification accuracy is difficult to attain from the speech signal alone; the visual component, which can complement the audio information, is therefore of particular interest. In this paper, we capture the asynchronous correlation between the audio and visual modalities rather than assuming tight synchrony. The apparent asynchrony between the two modalities is modeled with a Dynamic Bayesian Network (DBN) using asynchronous articulatory features, in three ways: (1) there are three hidden state variables, each representing one articulatory feature; (2) the degree of asynchrony among the articulatory features is controlled by a probability distribution; (3) the audio and video observations depend on all three hidden state variables. A multi-level hybrid fusion scheme is then explored that combines model-level and decision-level fusion. Experimental results on an audio-visual bimodal corpus demonstrate the effectiveness of the method.
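
The two ideas named in the abstract, a probability distribution that controls the degree of asynchrony among three articulatory feature states and a decision-level combination of scores, can be illustrated with a minimal Python sketch. This is not the chapter's model: the table `ASYNC_PROB`, the pairwise-product weighting, the function names, and the fusion weight are all hypothetical choices made only for illustration, and the weighted sum of log-likelihoods is just one common form of decision-level fusion.

```python
from itertools import product

# Hypothetical distribution over the degree of asynchrony, i.e. the absolute
# difference between two feature-state indices. The values are illustrative
# and are not taken from the paper.
ASYNC_PROB = {0: 0.7, 1: 0.2, 2: 0.1}


def async_weight(states):
    """Unnormalized weight of one joint configuration of feature states.

    Each pair of feature-state indices contributes the probability of its
    degree of asynchrony; degrees missing from ASYNC_PROB are disallowed.
    """
    weight = 1.0
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            weight *= ASYNC_PROB.get(abs(states[i] - states[j]), 0.0)
    return weight


def joint_state_prior(num_states):
    """Normalized prior over all configurations of three feature states."""
    configs = list(product(range(num_states), repeat=3))
    weights = [async_weight(c) for c in configs]
    total = sum(weights)
    return {c: w / total for c, w in zip(configs, weights)}


def fuse_scores(model_ll, audio_ll, weight):
    """Assumed decision-level fusion: weighted sum of per-speaker
    log-likelihoods from two classifiers (weight is hypothetical)."""
    return {spk: weight * model_ll[spk] + (1.0 - weight) * audio_ll[spk]
            for spk in model_ll}


def identify(fused):
    """Return the speaker label with the highest fused score."""
    return max(fused, key=fused.get)
```

Under these assumptions, fully synchronous configurations such as `(0, 0, 0)` receive the largest prior mass, while configurations whose feature states drift further apart than the table allows receive zero.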

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Y. (2011). Research on Audio-Visual Asynchronous Correlation for Speaker Identification Based on DBN. In: Zeng, D. (eds) Future Intelligent Information Systems. Lecture Notes in Electrical Engineering, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19706-2_3

  • DOI: https://doi.org/10.1007/978-3-642-19706-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19705-5

  • Online ISBN: 978-3-642-19706-2

  • eBook Packages: Engineering, Engineering (R0)
