I-Vector Extraction Using Speaker Relevancy for Short Duration Speaker Recognition

  • Conference paper
IT Convergence and Security 2017

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 449)


Abstract

This paper presents a novel scheme that considers frame-level speaker relevancy during i-vector extraction for speaker recognition. In the proposed system, frame-level point-wise mutual information is used to directly modify the Baum-Welch statistics so that a robust i-vector can be extracted. Furthermore, a method is proposed for computing the frame-level speaker relevancy using a deep neural network (DNN) analogous to the DNNs employed in robust automatic speech recognition (ASR). The results show that the modified i-vectors obtained with the proposed methods outperform the conventional i-vectors.
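The core idea of weighting the Baum-Welch statistics by a per-frame speaker-relevancy score can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `weighted_baum_welch_stats` and the assumption that the relevancy scores arrive as a precomputed per-frame vector (e.g. derived from point-wise mutual information or a DNN output) are hypothetical.

```python
import numpy as np

def weighted_baum_welch_stats(features, posteriors, relevancy):
    """Accumulate zeroth- and first-order Baum-Welch statistics,
    scaling each frame's contribution by a speaker-relevancy weight.

    features:   (T, D) acoustic feature frames
    posteriors: (T, C) per-frame component posteriors (from a UBM or DNN)
    relevancy:  (T,)   per-frame speaker-relevancy weights
    Returns N (C,) zeroth-order and F (C, D) first-order statistics.
    """
    # Scale each frame's posterior vector by its relevancy weight,
    # so irrelevant frames contribute less to the statistics.
    w = relevancy[:, None] * posteriors   # (T, C)
    N = w.sum(axis=0)                     # zeroth-order statistics
    F = w.T @ features                    # first-order statistics
    return N, F
```

With all relevancy weights equal to one, this reduces to the standard unweighted statistics used in conventional i-vector extraction; down-weighting low-relevancy frames is what distinguishes the modified statistics.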



Acknowledgments

This research was supported by the Projects for Research and Development of Police Science and Technology, under the Center for Research and Development of Police Science and Technology and the Korean National Police Agency, funded by the Ministry of Science, ICT and Future Planning (PA-J000001-2017-101), and by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (NRF-2015R1A2A1A15054343).

Author information

Corresponding author

Correspondence to Nam Soo Kim.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kang, W.H., Cho, W.I., Jang, S.Y., Lee, H.S., Kim, N.S. (2018). I-Vector Extraction Using Speaker Relevancy for Short Duration Speaker Recognition. In: Kim, K., Kim, H., Baek, N. (eds) IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, vol 449. Springer, Singapore. https://doi.org/10.1007/978-981-10-6451-7_10

  • DOI: https://doi.org/10.1007/978-981-10-6451-7_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6450-0

  • Online ISBN: 978-981-10-6451-7

  • eBook Packages: Engineering, Engineering (R0)
