Research on Audio-Visual Asynchronous Correlation for Speaker Identification Based on DBN

Chen, Yanxiang

doi:10.1007/978-3-642-19706-2_3

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 86)

Abstract

In noisy or otherwise adverse conditions, consistently high speaker identification accuracy is difficult to attain from the speech signal alone; the visual component, which can complement the audio information, is therefore of particular interest. In this paper, we capture the asynchronous correlation between the audio and visual modalities rather than assuming tight synchrony. The apparent asynchrony between the two modalities is modeled with a Dynamic Bayesian Network (DBN) using asynchronous articulatory features, in three ways: (1) there are three hidden state variables, each representing one articulatory feature; (2) the degree of asynchrony among the articulatory features is controlled by a probability distribution; (3) the audio and video observations depend on all three hidden state variables. A multi-level hybrid fusion scheme is then explored to combine model-level and decision-level fusion. Experimental results on an audio-visual bimodal corpus show the effectiveness of the method.
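
The two ideas in the abstract, a probability distribution that controls asynchrony among the three articulatory-feature state variables and a decision-level fusion step, can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The abstract does not give the form of the asynchrony distribution, the fusion rule, or the scores being fused, so the probability table `ASYNC_PRIOR`, the pairwise index-difference measure, the weighted log-likelihood sum, and all function and speaker names below are assumptions chosen for illustration.

```python
import math
from itertools import combinations

# ASSUMPTION: an illustrative prior over the degree of asynchrony, measured
# as the absolute difference between the state indices of two streams.
ASYNC_PRIOR = {0: 0.7, 1: 0.2, 2: 0.1}


def async_log_prob(state_indices, prior=ASYNC_PRIOR):
    """Log-probability of the asynchrony among the feature streams.

    Each pair of streams contributes the prior probability of its absolute
    index difference; a difference missing from the table has probability 0.
    """
    total = 0.0
    for a, b in combinations(state_indices, 2):
        p = prior.get(abs(a - b), 0.0)
        if p == 0.0:
            return float("-inf")  # this configuration is disallowed
        total += math.log(p)
    return total


def fuse_scores(audio_ll, av_model_ll, weight=0.5):
    """ASSUMPTION: decision-level fusion as a weighted sum of per-speaker
    log-likelihoods from an audio-only model and an audio-visual model."""
    return {
        spk: weight * audio_ll[spk] + (1.0 - weight) * av_model_ll[spk]
        for spk in audio_ll
    }


def identify(audio_ll, av_model_ll, weight=0.5):
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(audio_ll, av_model_ll, weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Hypothetical scores for two hypothetical speakers.
    audio = {"alice": -10.0, "bob": -12.0}
    audio_visual = {"alice": -15.0, "bob": -9.0}
    print(async_log_prob((3, 3, 3)))   # fully synchronous streams
    print(async_log_prob((3, 4, 5)))   # streams drift by up to two states
    print(identify(audio, audio_visual, weight=0.5))
```

In this sketch a larger spread between the state indices is penalized more heavily and a spread beyond the table is ruled out, which is one simple way a probability distribution can bound asynchrony. The `weight` parameter trades off the two score sources and would normally be tuned on held-out data.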

Author information

Authors and Affiliations

School of Computer Science & Information, Hefei University of Technology, Hefei, Anhui, 230009, China
Yanxiang Chen

Editor information

Editors and Affiliations

Shenzhen University, Nanhai Ave 3688, 518060, Guangdong, Shenzhen, P.R. China
Dehuai Zeng

About this paper

Cite this paper

Chen, Y. (2011). Research on Audio-Visual Asynchronous Correlation for Speaker Identification Based on DBN. In: Zeng, D. (ed.) Future Intelligent Information Systems. Lecture Notes in Electrical Engineering, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19706-2_3

DOI: https://doi.org/10.1007/978-3-642-19706-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19705-5
Online ISBN: 978-3-642-19706-2
eBook Packages: Engineering, Engineering (R0)