End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

Technological progress and the proliferation of sophisticated software have made it easier than ever to spoof a person’s voice, and audio in general. Like other biometrics, speaker verification is vulnerable to spoofing attacks. Detecting these attacks using the artifacts present in the recordings is a major challenge. The current trend in spoofing detection is to employ deep learning architectures that perform end-to-end detection using a pooling layer, which aggregates frame-level information into utterance-level embeddings. Typically, only the first-order, or first- and second-order, statistics are pooled across the temporal dimension. In this paper, we investigate the influence of higher order statistics, such as third and fourth order moments, on spoofing detection performance. A Time Delay Neural Network (TDNN) architecture is used on top of linear frequency cepstral coefficients to carry out spoofing detection experiments on the ASVspoof2019 challenge logical access and physical access corpora. Experimental results, in terms of equal error rate (EER) and minimum tandem detection cost function (min-tDCF), show that including higher order statistics helps improve the performance of spoofing detection systems.
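
To make the pooling step concrete, the following is a minimal sketch of statistics pooling extended with third and fourth order moments. The abstract does not give the exact formulation used in the paper, so the choices here are assumptions: the higher order moments are taken to be the standardized ones (skewness and kurtosis), the function name `higher_order_stats_pooling` and the `eps` stabilizer are hypothetical, and plain Python lists stand in for the frame-level TDNN outputs.

```python
import math
from typing import List, Sequence


def higher_order_stats_pooling(
    frames: Sequence[Sequence[float]], eps: float = 1e-8
) -> List[float]:
    """Pool T frame-level vectors of size D into one utterance-level vector.

    Returns the concatenation [mean | std | skewness | kurtosis], each of
    length D. Assumption: standardized third and fourth moments are used;
    the paper's exact definition is not stated in the abstract.
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("need at least one frame")
    dim = len(frames[0])

    means, stds, skews, kurts = [], [], [], []
    for d in range(dim):
        # Collect feature dimension d across the temporal axis.
        column = [frame[d] for frame in frames]
        mean = sum(column) / num_frames
        centered = [x - mean for x in column]
        variance = sum(c ** 2 for c in centered) / num_frames
        # eps keeps the division below finite for constant features.
        std = math.sqrt(variance + eps)
        skew = sum((c / std) ** 3 for c in centered) / num_frames
        kurt = sum((c / std) ** 4 for c in centered) / num_frames
        means.append(mean)
        stds.append(std)
        skews.append(skew)
        kurts.append(kurt)

    # Utterance-level embedding of length 4 * D.
    return means + stds + skews + kurts


# Usage: four frames with two feature dimensions each.
example_frames = [[0.0, 1.0], [2.0, 1.0], [0.0, 1.0], [2.0, 1.0]]
pooled = higher_order_stats_pooling(example_frames)
```

Pooling only the mean, or the mean and standard deviation, would return the first one or two blocks of this vector; the extra skewness and kurtosis blocks are what "higher order statistics" adds in this sketch, at the cost of doubling the embedding size fed to the following layers.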

Author information

Corresponding author

Correspondence to Jahangir Alam.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Alam, J., Fathan, A., Kang, W.H. (2021). End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_2

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
