Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space

Ben Kheder, Waad; Matrouf, Driss; Bousquet, Pierre-Michel; Bonastre, Jean-François; Ajili, Moez

doi:10.1007/978-3-319-11397-5_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

In the last few years, the use of i-vectors with a generative back-end has become the standard in speaker recognition. An i-vector is a compact representation of a speaker utterance extracted from a low-dimensional total variability subspace. Although current speaker recognition systems achieve very good results in clean training and test conditions, their performance degrades considerably in noisy environments. Compensating for the effect of noise is currently a research subject of major importance. As far as we know, there has been no serious attempt to treat the noise problem directly in the i-vector space without relying on data distributions computed in a prior domain. This paper proposes full-covariance Gaussian modeling of the clean i-vector and noise distributions in the i-vector space, then introduces a technique that estimates a clean i-vector from its noisy version and the noise density function using a maximum a posteriori (MAP) approach. Using NIST data, we show that the baseline system performance can be improved by up to 60%. A noise-adding tool is used to simulate a real-world noisy environment at different signal-to-noise ratio (SNR) levels.
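
The abstract does not give the estimator itself, so the following is only a minimal sketch of what a MAP estimate looks like under the stated modeling choices, not the authors' implementation. It assumes the noisy i-vector is y = x + n, with the clean i-vector x and the noise n independent and Gaussian with full covariances. Under those assumptions the MAP estimate coincides with the posterior mean, x_hat = mu_x + Sigma_x (Sigma_x + Sigma_n)^-1 (y - mu_x - mu_n). The function names and the toy parameters are hypothetical, and pure Python is used only to keep the example self-contained.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def map_clean_ivector(y, mu_x, sigma_x, mu_n, sigma_n):
    """MAP estimate of clean x given noisy y = x + n (assumed model).

    x ~ N(mu_x, sigma_x) and n ~ N(mu_n, sigma_n) are assumed independent,
    so the MAP estimate equals the Gaussian posterior mean.
    """
    d = len(y)
    residual = [y[i] - mu_x[i] - mu_n[i] for i in range(d)]
    total_cov = [[sigma_x[i][j] + sigma_n[i][j] for j in range(d)]
                 for i in range(d)]
    z = solve(total_cov, residual)  # (Sigma_x + Sigma_n)^-1 (y - mu_x - mu_n)
    return [mu_x[i] + sum(sigma_x[i][j] * z[j] for j in range(d))
            for i in range(d)]


# Toy 2-dimensional example with made-up parameters (not from the paper).
x_hat = map_clean_ivector(
    y=[3.0, 5.0],
    mu_x=[0.0, 0.0], sigma_x=[[2.0, 0.0], [0.0, 2.0]],
    mu_n=[1.0, 1.0], sigma_n=[[2.0, 0.0], [0.0, 6.0]],
)
print(x_hat)  # [1.0, 1.0]
```

In the toy example the second dimension has the larger noise variance, so its noisy observation is pulled more strongly toward the clean prior mean. Real i-vectors typically have several hundred dimensions, where a linear algebra library would replace the hand-written solver.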

Author information

Authors and Affiliations

LIA, University of Avignon, Avignon, France
Waad Ben Kheder, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre & Moez Ajili

Corresponding author

Correspondence to Waad Ben Kheder.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Ben Kheder, W., Matrouf, D., Bousquet, PM., Bonastre, JF., Ajili, M. (2014). Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_7

DOI: https://doi.org/10.1007/978-3-319-11397-5_7
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)
