Abstract
This paper presents a novel scheme for considering frame-level speaker relevancy during i-vector extraction for speaker recognition. In the proposed system, frame-level pointwise mutual information is used to directly modify the Baum-Welch statistics in order to extract a robust i-vector. Furthermore, a method is proposed for computing the frame-level speaker relevancy using a deep neural network (DNN) analogous to the DNNs used in robust automatic speech recognition (ASR). The results show that the modified i-vectors obtained using the proposed methods outperformed the conventional i-vectors.
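The core idea above, reweighting the Baum-Welch statistics by a per-frame speaker-relevancy score before i-vector extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper derives the weights from pointwise mutual information (and from a DNN), whereas here the `relevancy` vector is simply taken as a given input, and the function name and shapes are illustrative assumptions.

```python
import numpy as np

def weighted_baum_welch_stats(posteriors, features, relevancy):
    """Compute relevancy-weighted Baum-Welch statistics.

    posteriors: (T, C) frame-level GMM/DNN component posteriors
    features:   (T, D) acoustic feature frames
    relevancy:  (T,)   frame-level speaker-relevancy weights
                       (e.g. PMI-based, as in the paper)
    Returns the zeroth-order stats N (C,) and first-order stats F (C, D).
    """
    # Scale each frame's posterior by its relevancy weight, so frames
    # judged less speaker-relevant contribute less to the statistics.
    weighted = posteriors * relevancy[:, None]   # (T, C)
    N = weighted.sum(axis=0)                     # zeroth-order statistics
    F = weighted.T @ features                    # first-order statistics
    return N, F
```

With all weights equal to one this reduces to the standard Baum-Welch statistics; setting a frame's weight toward zero removes its contribution, which is the mechanism the proposed scheme exploits.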
Acknowledgments
This research was supported by the Projects for Research and Development of Police Science and Technology under the Center for Research and Development of Police Science and Technology and the Korean National Police Agency, funded by the Ministry of Science, ICT and Future Planning (PA-J000001-2017-101), and by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (NRF-2015R1A2A1A15054343).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kang, W.H., Cho, W.I., Jang, S.Y., Lee, H.S., Kim, N.S. (2018). I-Vector Extraction Using Speaker Relevancy for Short Duration Speaker Recognition. In: Kim, K., Kim, H., Baek, N. (eds) IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, vol 449. Springer, Singapore. https://doi.org/10.1007/978-981-10-6451-7_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6450-0
Online ISBN: 978-981-10-6451-7