Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum

Automatic Control and Computer Sciences

Abstract

Automatic speaker verification systems have achieved high performance and have been widely adopted in many security applications. One important requirement for a verification system is resilience to spoofing attacks such as impersonation, replay, speech synthesis, and voice conversion. Among these attacks, speech synthesis poses a high risk to verification systems. In this paper, a novel method is proposed for detecting computer-generated speech, especially speech synthesized with hidden Markov models (HMMs). The wavelet coefficients at specific positions are found to differ markedly between synthetic and natural speech. Logarithmic spectrum features are extracted from these wavelet coefficients, and a support vector machine (SVM) is used as the classifier to evaluate the proposed algorithm. Experimental results on the SAS corpus show that the proposed algorithm achieves high detection accuracy and a low equal error rate (EER).
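The abstract names three stages: a wavelet decomposition of the speech signal, logarithmic spectrum features computed from selected wavelet coefficients, and an SVM classifier evaluated by equal error rate. The Python sketch below illustrates the feature stage and the EER metric under stated assumptions only. The Haar wavelet, the decomposition depth `levels`, the subband index `band`, the Hann-windowed frames of `n_fft` samples, and averaging the log-magnitude spectrum over frames are all hypothetical placeholders chosen for illustration; this abstract does not give the authors' wavelet family, coefficient positions, or frame settings. The helper names `haar_dwt`, `wavelet_log_spectrum`, and `equal_error_rate` are likewise hypothetical.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT (assumed wavelet, for illustration).

    Returns [cA_L, cD_L, cD_(L-1), ..., cD_1]: the final approximation
    followed by the detail coefficients from coarsest to finest.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if a.size % 2:  # pad odd-length input by repeating the last sample
            a = np.append(a, a[-1])
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass (detail)
        a = (even + odd) / np.sqrt(2.0)  # low-pass (approximation)
    return [a] + details[::-1]


def wavelet_log_spectrum(x, levels=3, band=0, n_fft=256, hop=128, eps=1e-10):
    """Mean log-magnitude spectrum of one wavelet subband.

    All defaults are hypothetical. `band` indexes the list returned by
    haar_dwt (0 = final approximation). Returns n_fft // 2 + 1 values.
    """
    c = haar_dwt(x, levels)[band]
    if c.size < n_fft:  # zero-pad short subbands to one full frame
        c = np.pad(c, (0, n_fft - c.size))
    n_frames = 1 + (c.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([c[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).mean(axis=0)  # average over frames


def equal_error_rate(scores, labels):
    """EER for a detector whose higher scores mean 'natural speech'.

    labels: 1 = natural (genuine), 0 = synthetic (spoof).
    Sweeps every observed score as a threshold and returns the mean of the
    false-acceptance and false-rejection rates where they are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    spoof, genuine = scores[labels == 0], scores[labels == 1]
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(spoof >= t)   # synthetic speech accepted as natural
        frr = np.mean(genuine < t)  # natural speech rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

In a full pipeline, the feature vectors would be passed to an SVM, for example scikit-learn's `sklearn.svm.SVC` (built on LIBSVM), and its decision values would then be scored with `equal_error_rate`; that training step is not shown here.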

Figs. 1–6.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (grant nos. 61300055, U1736215, 61672302), Zhejiang Natural Science Foundation (grant nos. LY17F020010, LZ15F020002), Ningbo Natural Science Foundation (grant no. 2017A610123), Ningbo University Fund (grant no. XKXL1503) and K.C. Wong Magna Fund in Ningbo University.

Author information

Correspondence to Diqun Yan.

Additional information

The article is published in the original.

About this article

Cite this article

Yan, D., Xiang, L., Wang, Z. et al. Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum. Aut. Control Comp. Sci. 53, 72–79 (2019). https://doi.org/10.3103/S014641161901005X

  • DOI: https://doi.org/10.3103/S014641161901005X
