Abstract
Home assistants with speech user interfaces have become popular in recent years because of their convenience. Speaker recognition (SR) technology in this application makes personalized services (e.g., playing music, making to-do lists) for different family members a reality. However, because of restrictions on hardware and response time, SR accuracy may decline sharply when a family member has a cold. In this paper, we propose a dual model updating strategy based on cold detection to maintain the voice models of all speakers. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected, according to the detection result, to determine the speaker's identity. To continuously track SR performance on data from mobile phone usage, we constructed a new mobile phone-based speech dataset (PBSD) that contains voice, phone model, and the user's state of physical wellness. We also analyzed the relationship between SR accuracy and users' state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted experiments on the SR accuracy of 10 speakers in both cold-suffering and healthy states. The results demonstrate that the cold detection-based model updating strategy effectively improves SR accuracy, especially when the speaker has a cold.
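The model-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: each speaker keeps one "healthy" and one "cold" model, a single diagonal Gaussian stands in for the GMM-UBM speaker model, and the cold detector's output is passed in as a plain boolean. The names `DiagGaussian`, `DualModel`, and `identify` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Frame = Sequence[float]


@dataclass
class DiagGaussian:
    """Single diagonal Gaussian; a simplified stand-in for a speaker GMM."""
    mean: List[float]
    var: List[float]

    def avg_log_likelihood(self, frames: Sequence[Frame]) -> float:
        # Average per-frame log-likelihood under the diagonal Gaussian.
        total = 0.0
        for frame in frames:
            for x, m, v in zip(frame, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


@dataclass
class DualModel:
    """Hypothetical pair of models kept for one speaker."""
    healthy: DiagGaussian
    cold: DiagGaussian


def identify(frames: Sequence[Frame],
             models: Dict[str, DualModel],
             cold_detected: bool) -> str:
    """Score the utterance against the model bank chosen by the detector."""
    def score(name: str) -> float:
        pair = models[name]
        chosen = pair.cold if cold_detected else pair.healthy
        return chosen.avg_log_likelihood(frames)

    return max(models, key=score)


# Toy 2-D features (assumed values, not data from the paper).
models = {
    "alice": DualModel(DiagGaussian([0.0, 0.0], [1.0, 1.0]),
                       DiagGaussian([2.0, 2.0], [1.0, 1.0])),
    "bob": DualModel(DiagGaussian([2.0, 2.0], [1.0, 1.0]),
                     DiagGaussian([4.0, 4.0], [1.0, 1.0])),
}
utterance = [[2.1, 1.9], [1.9, 2.1]]

print(identify(utterance, models, cold_detected=True))
print(identify(utterance, models, cold_detected=False))
```

In this toy setup the same utterance is attributed to a different speaker depending on the detector's decision, which shows why scoring cold speech against healthy models can cause misidentification. A real system would replace the boolean with a detector built on time-domain and frequency-domain features and the single Gaussians with adapted GMMs.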
Acknowledgment
This work is partially supported by The National Key Research and Development Program of China (2016YFB0502201) and the National Natural Science Foundation of China (General Program), Grant No. 61971316.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ai, H., Wang, Y., Yang, Y., Zhang, Q. (2019). An Improvement of the Degradation of Speaker Recognition in Continuous Cold Speech for Home Assistant. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11982. Springer, Cham. https://doi.org/10.1007/978-3-030-37337-5_29
DOI: https://doi.org/10.1007/978-3-030-37337-5_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37336-8
Online ISBN: 978-3-030-37337-5
eBook Packages: Computer Science (R0)