Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum

Diqun Yan, Li Xiang, Zhifeng Wang, Rangding Wang

doi:10.3103/S014641161901005X

Published: 16 April 2019

Volume 53, pages 72–79 (2019)

Automatic Control and Computer Sciences

Diqun Yan¹,², Li Xiang¹, Zhifeng Wang¹ & Rangding Wang¹

Abstract

Automatic speaker verification systems have achieved strong performance and have been widely adopted in many security applications. An important requirement for such a system is resilience to spoofing attacks, such as impersonation, replay, speech synthesis, and voice conversion. Among these attacks, speech synthesis poses a high risk to verification systems. In this paper, a novel detection method for computer-generated speech, especially HMM-synthesized speech, is proposed. It is found that the wavelet coefficients at specific positions differ noticeably between synthetic and natural speech. Logarithmic spectrum features are extracted from these wavelet coefficients, and a support vector machine is used as the classifier to evaluate the performance of the proposed algorithm. Experimental results on the SAS corpus show that the proposed algorithm achieves high detection accuracy and a low equal error rate.
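
The pipeline outlined in the abstract (wavelet decomposition, logarithmic spectrum features, and evaluation by equal error rate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the wavelet family, the decomposition depth, or which coefficient positions are used, so the Haar wavelet, the two-level depth, the choice of the deepest detail band, and every function name below are assumptions made only for illustration. The support vector machine stage is omitted; `equal_error_rate` simply accepts the scores that any classifier would produce.

```python
import cmath
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform (assumed wavelet).

    Returns (approximation, detail). An odd-length input loses its last sample.
    """
    n = len(signal) - len(signal) % 2
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail


def log_spectrum(coeffs, eps=1e-10):
    """Log-magnitude spectrum of a sequence, computed with a naive O(n^2) DFT."""
    n = len(coeffs)
    feats = []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, c in enumerate(coeffs))
        feats.append(math.log(abs(acc) + eps))  # eps avoids log(0)
    return feats


def wavelet_log_spectrum(frame, levels=2):
    """Decompose `frame` `levels` times, then take the log spectrum of the
    deepest detail band (band choice is an assumption, not from the paper)."""
    approx, detail = list(frame), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
    return log_spectrum(detail)


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER: sweep observed scores as thresholds and return the
    mean of FAR and FRR at the threshold where the two rates are closest."""
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(genuine_scores) | set(spoof_scores)):
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        frr = sum(g < thr for g in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

In such a sketch, `wavelet_log_spectrum` would be applied to each speech frame and the resulting vectors passed to a classifier. Higher scores are taken to mean "natural speech", so a perfectly separated score set yields an equal error rate of zero.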

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (grant nos. 61300055, U1736215, 61672302), the Zhejiang Natural Science Foundation (grant nos. LY17F020010, LZ15F020002), the Ningbo Natural Science Foundation (grant no. 2017A610123), the Ningbo University Fund (grant no. XKXL1503), and the K.C. Wong Magna Fund in Ningbo University.

Author information

Authors and Affiliations

College of Information Science and Engineering, Ningbo University, 315211, Ningbo, China
Diqun Yan, Li Xiang, Zhifeng Wang & Rangding Wang
Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, 518060, Shenzhen, China
Diqun Yan

Corresponding author

Correspondence to Diqun Yan.

Additional information

The article is published in the original.

About this article

Cite this article

Diqun Yan, Xiang, L., Wang, Z. et al. Detection of HMM Synthesized Speech by Wavelet Logarithmic Spectrum. Aut. Control Comp. Sci. 53, 72–79 (2019). https://doi.org/10.3103/S014641161901005X

Received: 27 February 2018
Revised: 17 May 2018
Accepted: 17 May 2018
Published: 16 April 2019
Issue Date: January 2019
DOI: https://doi.org/10.3103/S014641161901005X
