Abstract
Automatic speaker verification systems have achieved strong performance and have been widely adopted in security applications. An important requirement for a verification system is resilience to spoofing attacks such as impersonation, replay, speech synthesis, and voice conversion. Among these attacks, speech synthesis poses a high risk to verification systems. In this paper, a novel detection method for computer-generated speech, especially HMM-based synthetic speech, is proposed. It is found that the wavelet coefficients at specific positions show an obvious difference between synthetic and natural speech. Logarithmic spectrum features are extracted from the wavelet coefficients, and a support vector machine is used as the classifier to evaluate the performance of the proposed algorithm. Experimental results on the SAS corpus show that the proposed algorithm achieves high detection accuracy and a low equal error rate.
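The processing chain summarized in the abstract (wavelet decomposition, logarithmic spectrum features, and scoring by equal error rate) can be illustrated with a minimal sketch. The abstract does not state the wavelet basis, the decomposition level, the coefficient positions, or the exact feature definition, so everything concrete here is an assumption made for illustration: a single-level Haar transform, a naive DFT, and the hypothetical helper names `haar_dwt`, `log_spectrum`, and `equal_error_rate`. The SVM stage is omitted; any classifier that outputs a per-utterance score could feed the EER function.

```python
import cmath
import math


def haar_dwt(signal):
    """Single-level Haar DWT (assumed basis): returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def log_spectrum(coeffs, eps=1e-10):
    """Log magnitude spectrum of a coefficient sequence via a naive DFT.

    Only the non-negative frequency bins are returned; eps avoids log(0).
    """
    n = len(coeffs)
    feats = []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(coeffs))
        feats.append(math.log(abs(acc) + eps))
    return feats


def equal_error_rate(natural_scores, synthetic_scores):
    """Approximate EER, assuming a higher score means 'more natural'.

    Scans every observed score as a threshold and returns the mean of the
    false acceptance and false rejection rates where they are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(natural_scores) | set(synthetic_scores)):
        far = sum(s >= thr for s in synthetic_scores) / len(synthetic_scores)
        frr = sum(s < thr for s in natural_scores) / len(natural_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    frame = [math.sin(0.3 * t) for t in range(64)]  # toy stand-in for a speech frame
    _, detail = haar_dwt(frame)
    features = log_spectrum(detail)
    print(len(features), "log-spectrum features from", len(detail), "detail coefficients")
```

In practice a library transform and FFT would replace the hand-written loops, and the features of labeled natural and synthetic utterances would be passed to a trained classifier before computing the equal error rate.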
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (grant nos. 61300055, U1736215, 61672302), Zhejiang Natural Science Foundation (grant nos. LY17F020010, LZ15F020002), Ningbo Natural Science Foundation (grant no. 2017A610123), Ningbo University Fund (grant no. XKXL1503) and K.C. Wong Magna Fund in Ningbo University.
Additional information
The article is published in the original.
About this article
Cite this article
Yan, D., Xiang, L., Wang, Z. et al. Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum. Aut. Control Comp. Sci. 53, 72–79 (2019). https://doi.org/10.3103/S014641161901005X
DOI: https://doi.org/10.3103/S014641161901005X