An Improvement of the Degradation of Speaker Recognition in Continuous Cold Speech for Home Assistant

Ai, Haojun; Wang, Yifeng; Yang, Yuhong; Zhang, Quanxin

doi:10.1007/978-3-030-37337-5_29


Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11982)

Included in the following conference series:

International Symposium on Cyberspace Safety and Security

Abstract

Home assistants with speech user interfaces have become popular in recent years because of their convenience. Speaker recognition (SR) technology in this application makes personalized services (e.g., playing music, making to-do lists) for different family members a reality. However, because of hardware and response-time constraints, SR accuracy may decline sharply when a family member has a cold. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected to determine the speaker’s identity according to the detection result. To continuously track SR performance on mobile phone usage data, we constructed a new mobile phone-based speech dataset (PBSD) that contains voice, phone model, and the user’s state of physical wellness. We also analyzed the relationship between SR accuracy and the user’s state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted SR accuracy experiments with 10 speakers in both cold-suffering and healthy states. The results demonstrate that the cold detection-based model updating strategy effectively improves SR accuracy, especially under cold-suffering conditions.
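The model-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speaker keeps two diagonal-covariance Gaussian mixture models, one for healthy speech and one for cold speech, and that a cold detector is supplied as a callable. The names `identify_speaker`, `is_cold_speech`, and `TOY_MODELS`, and the one-dimensional toy features, are hypothetical. The paper's feature extraction, UBM adaptation, and model updating are not shown.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# One diagonal-covariance Gaussian component: (weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]
Frame = Sequence[float]


def frame_log_likelihood(gmm: GMM, frame: Frame) -> float:
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over the components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def average_log_likelihood(gmm: GMM, frames: Sequence[Frame]) -> float:
    """Mean per-frame log-likelihood of an utterance."""
    return sum(frame_log_likelihood(gmm, f) for f in frames) / len(frames)


def identify_speaker(
    frames: Sequence[Frame],
    models: Dict[str, Dict[str, GMM]],
    is_cold_speech: Callable[[Sequence[Frame]], bool],
) -> Tuple[str, str]:
    """Pick the model set by detected state, then return (best speaker, state)."""
    state = "cold" if is_cold_speech(frames) else "healthy"
    scores = {
        speaker: average_log_likelihood(speaker_models[state], frames)
        for speaker, speaker_models in models.items()
    }
    return max(scores, key=lambda speaker: scores[speaker]), state


# Toy single-component, one-dimensional models (illustrative values only).
TOY_MODELS: Dict[str, Dict[str, GMM]] = {
    "alice": {"healthy": [(1.0, [0.0], [1.0])], "cold": [(1.0, [1.0], [1.0])]},
    "bob": {"healthy": [(1.0, [5.0], [1.0])], "cold": [(1.0, [6.0], [1.0])]},
}

if __name__ == "__main__":
    utterance = [[0.9], [1.1]]
    # The detector here is a stand-in that always reports cold speech.
    print(identify_speaker(utterance, TOY_MODELS, lambda frames: True))
```

Keeping the detector as a separate callable means the scoring code does not change when the detection features change. A real system would replace the stand-in detector with one built on the time-domain and frequency-domain features the abstract mentions.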

Acknowledgment

This work was partially supported by the National Key Research and Development Program of China (2016YFB0502201) and the National Natural Science Foundation of China (General Program), Grant No. 61971316.

Author information

Authors and Affiliations

School of Cyber Science and Engineering, Wuhan University, Hubei, China
Haojun Ai & Yifeng Wang
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, China
Haojun Ai
National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Hubei, China
Yuhong Yang
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, People’s Republic of China
Quanxin Zhang

Corresponding author

Correspondence to Yuhong Yang.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Beihang University, Beijing, China
Xiao Zhang
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Ai, H., Wang, Y., Yang, Y., Zhang, Q. (2019). An Improvement of the Degradation of Speaker Recognition in Continuous Cold Speech for Home Assistant. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11982. Springer, Cham. https://doi.org/10.1007/978-3-030-37337-5_29

DOI: https://doi.org/10.1007/978-3-030-37337-5_29
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37336-8
Online ISBN: 978-3-030-37337-5
eBook Packages: Computer Science, Computer Science (R0)
