An Improvement of the Degradation of Speaker Recognition in Continuous Cold Speech for Home Assistant

  • Conference paper
Cyberspace Safety and Security (CSS 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11982)

Abstract

Home assistants with speech user interfaces have become popular in recent years because of their convenience. With speaker recognition (SR) technology, such applications can offer personalized services (e.g., playing music, making to-do lists) to different family members. However, SR accuracy may decline sharply when a family member has a cold, owing to restrictions on hardware and response time. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected to determine the speaker's identity according to the detection result. To continuously track SR performance on mobile phone usage data, we constructed a new mobile phone-based speech dataset (PBSD) containing voice recordings, phone model, and the user's state of physical wellness. We also analyzed the relationship between SR accuracy and users' state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted experiments on the SR accuracy of 10 speakers in both cold-suffering and healthy states. The results show that the cold detection-based model updating strategy effectively improves SR accuracy, especially under cold-suffering conditions.
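
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each speaker has one healthy and one cold model, that a single diagonal Gaussian stands in for a full GMM-UBM speaker model, and that the cold detector is a weighted sum of one time-domain and one frequency-domain score already scaled to [0, 1]. The names `DiagGaussian`, `is_cold_speech`, and `identify`, as well as the weights, threshold, and toy data, are hypothetical. The model updating part of the strategy is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per frame (e.g. cepstral features)


@dataclass
class DiagGaussian:
    """Single diagonal Gaussian; a simplified stand-in for a GMM speaker model."""
    mean: List[float]
    var: List[float]

    def avg_log_likelihood(self, frames: Sequence[Frame]) -> float:
        # Average per-frame log-likelihood under the diagonal Gaussian.
        total = 0.0
        for frame in frames:
            for x, m, v in zip(frame, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


def is_cold_speech(time_feat: float, freq_feat: float,
                   w_time: float = 0.5, w_freq: float = 0.5,
                   threshold: float = 0.5) -> bool:
    # Hypothetical detector: combine a time-domain and a frequency-domain
    # score (both assumed to lie in [0, 1]) and compare with a threshold.
    return w_time * time_feat + w_freq * freq_feat > threshold


def identify(frames: Sequence[Frame], time_feat: float, freq_feat: float,
             healthy_models: Dict[str, DiagGaussian],
             cold_models: Dict[str, DiagGaussian]) -> str:
    # Pick the model bank from the detection result, then return the
    # speaker whose model gives the highest average log-likelihood.
    bank = cold_models if is_cold_speech(time_feat, freq_feat) else healthy_models
    return max(bank, key=lambda name: bank[name].avg_log_likelihood(frames))


# Toy data (invented for illustration only).
healthy = {"alice": DiagGaussian([0.0, 0.0], [1.0, 1.0]),
           "bob": DiagGaussian([3.0, 3.0], [1.0, 1.0])}
cold = {"alice": DiagGaussian([1.5, 1.5], [1.0, 1.0]),
        "bob": DiagGaussian([4.5, 4.5], [1.0, 1.0])}
frames = [[1.8, 1.9], [2.0, 1.7]]  # close to alice's cold model

with_detection = identify(frames, 0.8, 0.7, healthy, cold)     # cold bank used
without_detection = identify(frames, 0.1, 0.2, healthy, cold)  # healthy bank used
print(with_detection, without_detection)
```

In this toy setup the same frames are attributed to `alice` when the cold bank is selected and to `bob` when the healthy bank is used, which shows why the choice of model bank matters when a speaker's voice shifts during a cold.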

Acknowledgment

This work is partially supported by the National Key Research and Development Program of China (2016YFB0502201) and the National Natural Science Foundation of China (General Program), Grant No. 61971316.

Corresponding author

Correspondence to Yuhong Yang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ai, H., Wang, Y., Yang, Y., Zhang, Q. (2019). An Improvement of the Degradation of Speaker Recognition in Continuous Cold Speech for Home Assistant. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11982. Springer, Cham. https://doi.org/10.1007/978-3-030-37337-5_29

  • DOI: https://doi.org/10.1007/978-3-030-37337-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37336-8

  • Online ISBN: 978-3-030-37337-5

  • eBook Packages: Computer Science, Computer Science (R0)
