Abstract
In the last few years, the use of i-vectors along with a generative back-end has become the new standard in speaker recognition. An i-vector is a compact representation of a speaker utterance extracted from a low-dimensional total variability subspace. Although current speaker recognition systems achieve very good results in clean training and test conditions, performance degrades considerably in noisy environments. Compensating for the effect of noise is currently a research subject of major importance. As far as we know, there has been no serious attempt to treat the noise problem directly in the i-vector space without relying on data distributions computed on a prior domain. This paper proposes a full-covariance Gaussian modeling of the clean i-vector and noise distributions in the i-vector space, then introduces a technique to estimate a clean i-vector given its noisy version and the noise density function using a MAP approach. Based on NIST data, we show that it is possible to improve the baseline system performance by up to 60%. A noise adding tool is used to simulate a real-world noisy environment at different signal-to-noise ratio levels.
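The abstract does not give the estimator itself, so the following is only a minimal sketch of the textbook case it describes. It assumes the noisy i-vector is the sum of a clean i-vector and an independent noise term (y = x + n), each with a full-covariance Gaussian distribution. Under those assumptions the MAP estimate of x has a closed form, obtained by summing the two precision matrices. The function name `map_clean_ivector` and all parameter names are illustrative, not taken from the paper.

```python
import numpy as np


def map_clean_ivector(y, mu_x, sigma_x, mu_n, sigma_n):
    """MAP estimate of a clean i-vector x from a noisy one y.

    Assumed model (not stated in the abstract): y = x + n, with
    x ~ N(mu_x, sigma_x) and n ~ N(mu_n, sigma_n) independent.
    """
    prec_x = np.linalg.inv(sigma_x)  # precision of the clean prior
    prec_n = np.linalg.inv(sigma_n)  # precision of the noise model
    # Setting the gradient of log p(x | y) to zero gives the linear system
    # (prec_x + prec_n) x = prec_n (y - mu_n) + prec_x mu_x.
    lhs = prec_x + prec_n
    rhs = prec_n @ (y - mu_n) + prec_x @ mu_x
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    # Toy 2-dimensional example; real i-vectors have hundreds of dimensions.
    y = np.array([2.0, 4.0])
    zero = np.zeros(2)
    eye = np.eye(2)
    print(map_clean_ivector(y, zero, eye, zero, eye))
```

With equal covariances and zero means, the estimate lies halfway between the prior mean and the noisy observation. As the noise covariance shrinks, the estimate approaches y minus the noise mean.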
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ben Kheder, W., Matrouf, D., Bousquet, PM., Bonastre, JF., Ajili, M. (2014). Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_7
DOI: https://doi.org/10.1007/978-3-319-11397-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)