Abstract
Technological progress and the proliferation of sophisticated software have made it easier than ever to spoof a person’s voice, and audio in general. Like other biometrics, speaker verification is vulnerable to spoofing attacks. Detecting these attacks using the artifacts present in the recordings is a major challenge. The current trend in spoofing detection is to employ deep learning architectures that perform end-to-end detection, using a pooling layer to aggregate frame-level information into utterance-level embeddings. To do so, normally only the first-order, or the first- and second-order, statistics are pooled across the temporal dimension. In this paper, we investigate the influence of higher-order statistics, such as third- and fourth-order moments, on spoofing detection performance. A Time Delay Neural Network (TDNN) architecture is used on top of linear frequency cepstral coefficients to carry out spoofing detection experiments on the ASVspoof2019 challenge logical access and physical access corpora. Experimental results, in terms of equal error rate (EER) and minimum tandem detection cost function (min-tDCF), show that including higher-order statistics helps improve the performance of spoofing detection systems.
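The pooling step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `higher_order_stats_pooling`, the `eps` stabilizer, and the choice of standardized moments (skewness and kurtosis) for the third- and fourth-order terms are assumptions made for illustration only.

```python
import numpy as np


def higher_order_stats_pooling(frames, eps=1e-8):
    """Pool frame-level features (T, D) into one utterance-level vector (4 * D,).

    Hypothetical helper: concatenates the mean, standard deviation,
    skewness and kurtosis of each feature dimension over time.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)                          # first-order statistic
    centered = frames - mean
    std = np.sqrt((centered ** 2).mean(axis=0) + eps)   # second-order statistic
    z = centered / std                                  # standardized frames
    skewness = (z ** 3).mean(axis=0)                    # third standardized moment
    kurtosis = (z ** 4).mean(axis=0)                    # fourth standardized moment
    return np.concatenate([mean, std, skewness, kurtosis])


# Example: 200 frames of 8-dimensional frame-level activations (random stand-in data).
rng = np.random.default_rng(0)
frame_features = rng.standard_normal((200, 8))
embedding = higher_order_stats_pooling(frame_features)
print(embedding.shape)  # (32,)
```

Pooling only the mean, or the mean and standard deviation, corresponds to keeping the first one or two blocks of this vector; the higher-order variant appends the remaining two, so the utterance-level representation grows from D or 2D to 4D dimensions.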
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Alam, J., Fathan, A., Kang, W.H. (2021). End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_2
DOI: https://doi.org/10.1007/978-3-030-87802-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)