End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics

Alam, Jahangir; Fathan, Abderrahim; Kang, Woo Hyun

doi:10.1007/978-3-030-87802-3_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Technological progress and the proliferation of sophisticated software have made it easier than ever to spoof a person’s voice and audio in general. Like other biometrics, speaker verification is vulnerable to spoofing attacks. Detecting these attacks using the artifacts present in the recordings is a major challenge. The current trend in spoofing detection is to employ deep learning architectures that perform end-to-end detection, using a pooling layer to aggregate frame-level information into utterance-level embeddings. To do so, normally only the first-order, or the first- and second-order, statistics are pooled across the temporal dimension. In this paper, we investigate the influence of higher order statistics, such as third and fourth order moments, on spoofing detection performance. A Time Delay Neural Network (TDNN) architecture is used on top of linear frequency cepstral coefficients to carry out spoofing detection experiments on the ASVspoof2019 challenge logical access and physical access corpora. Experimental results, in terms of equal error rate (EER) and minimum tandem detection cost function (min-tDCF), show that including higher order statistics helps improve the performance of spoofing detection systems.
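
The pooling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption: the function name `higher_order_stats_pooling` is hypothetical, the third and fourth order terms are computed as standardized moments (skewness and kurtosis), and the small constant `eps` is added only for numerical stability. The sketch takes frame-level features of shape (frames x dimensions) and concatenates the per-dimension mean, standard deviation, skewness, and kurtosis into one utterance-level vector.

```python
import math
from typing import List, Sequence


def higher_order_stats_pooling(
    frames: Sequence[Sequence[float]], eps: float = 1e-8
) -> List[float]:
    """Pool T frames of D-dimensional features into one vector of length 4 * D.

    Output layout: [means | standard deviations | skewness | kurtosis].
    Illustrative only; not the paper's confirmed formulation.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])

    means, stds, skews, kurts = [], [], [], []
    for d in range(dim):
        # Collect dimension d across the temporal axis.
        column = [frame[d] for frame in frames]
        mean = sum(column) / num_frames
        centered = [x - mean for x in column]

        # Second central moment (population variance), stabilized by eps.
        variance = sum(c * c for c in centered) / num_frames
        std = math.sqrt(variance + eps)

        # Third and fourth standardized moments.
        skew = sum(c ** 3 for c in centered) / num_frames / std ** 3
        kurt = sum(c ** 4 for c in centered) / num_frames / std ** 4

        means.append(mean)
        stds.append(std)
        skews.append(skew)
        kurts.append(kurt)

    return means + stds + skews + kurts


# Four frames of 2-dimensional features pooled into an 8-dimensional vector.
example_frames = [[0.0, 1.0], [2.0, 1.0], [0.0, 1.0], [2.0, 1.0]]
embedding = higher_order_stats_pooling(example_frames)
```

Pooling only the first block of the output corresponds to first-order statistics, the first two blocks to first- and second-order statistics, and all four blocks to the higher order variant the abstract investigates. In a trained network this operation would sit between the frame-level TDNN layers and the utterance-level layers.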

Author information

Authors and Affiliations

Computer Research Institute of Montreal (CRIM), Montreal, Québec, H3N 1M3, Canada
Jahangir Alam, Abderrahim Fathan & Woo Hyun Kang

Corresponding author

Correspondence to Jahangir Alam.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

Cite this paper

Alam, J., Fathan, A., Kang, W.H. (2021). End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_2

DOI: https://doi.org/10.1007/978-3-030-87802-3_2
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
