Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum
Automatic speaker verification systems achieve strong performance and are widely adopted in security applications. An important requirement for such a system is resilience to spoofing attacks such as impersonation, replay, speech synthesis, and voice conversion. Among these attacks, speech synthesis poses a high risk to verification systems. This paper proposes a novel method for detecting computer-generated speech, in particular HMM-synthesized speech. The wavelet coefficients at specific positions are found to differ clearly between synthetic and natural speech. Logarithmic spectrum features are extracted from these wavelet coefficients, and a support vector machine is used as the classifier to evaluate the proposed algorithm. Experimental results on the SAS corpus show that the proposed algorithm achieves high detection accuracy and a low equal error rate.
Keywords: speech synthesis, spoofing attack, wavelet transform, classification
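The pipeline summarized in the abstract (wavelet coefficients, then logarithmic spectrum features, then scoring by equal error rate) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the decomposition level, the naive DFT used for the spectrum, and all function names. The SVM classifier is left out to keep the sketch dependency-free; the feature vectors would be passed to any SVM implementation, and its decision scores to `equal_error_rate`.

```python
import cmath
import math


def haar_dwt(signal):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    if len(signal) % 2:
        signal = signal[:-1]  # drop the last sample so pairs line up
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def detail_at_level(signal, level):
    """Detail coefficients after `level` (>= 1) rounds of decomposition."""
    approx, detail = list(signal), []
    for _ in range(level):
        approx, detail = haar_dwt(approx)
    return detail


def log_spectrum(coeffs, eps=1e-10):
    """Log magnitude spectrum of the coefficients, computed with a naive DFT."""
    n = len(coeffs)
    feats = []
    for k in range(n // 2 + 1):
        x = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, c in enumerate(coeffs))
        feats.append(math.log(abs(x) + eps))  # eps avoids log(0)
    return feats


def equal_error_rate(natural_scores, synthetic_scores):
    """Approximate EER; a higher score means 'more likely natural'."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(natural_scores) | set(synthetic_scores)):
        # False rejection: natural speech scored below the threshold.
        frr = sum(s < thr for s in natural_scores) / len(natural_scores)
        # False acceptance: synthetic speech scored at or above the threshold.
        far = sum(s >= thr for s in synthetic_scores) / len(synthetic_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For a frame of samples `frame`, a feature vector under these assumptions would be `log_spectrum(detail_at_level(frame, 2))`. Which wavelet, level, and coefficient positions actually separate synthetic from natural speech is the paper's finding and is not reproduced here.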
This work was supported by the National Natural Science Foundation of China (grant nos. 61300055, U1736215, 61672302), Zhejiang Natural Science Foundation (grant nos. LY17F020010, LZ15F020002), Ningbo Natural Science Foundation (grant no. 2017A610123), Ningbo University Fund (grant no. XKXL1503) and K.C. Wong Magna Fund in Ningbo University.