Abstract
Despite an increasing interest in speaker recognition technologies, a significant obstacle still hinders their wide deployment: their high vulnerability to spoofing, or presentation attacks. These attacks can be easy to perform. For instance, if an attacker has access to a speech sample of a target user, he/she can replay it to the recognition system using a loudspeaker or a smartphone during the authentication process. The ease of executing presentation attacks, together with the fact that no technical knowledge of the biometric system is required, makes these attacks especially threatening in practical applications. Therefore, recent research has focused on collecting databases of such attacks and on developing presentation attack detection (PAD) systems. In this chapter, we present an overview of the latest databases and of techniques for detecting presentation attacks. We consider several prominent databases that contain bona fide and attack data, including ASVspoof 2015, ASVspoof 2017, AVspoof, voicePA, and BioCPqD-PA (the only proprietary database of these). Using these databases, we focus on the performance of PAD systems in the cross-database scenario and in the presence of “unknown” attacks (attacks not available during training), as these scenarios are closer to practice, where pretrained systems need to detect attacks in unforeseen conditions. We first present and discuss the performance of PAD systems based on handcrafted features and traditional Gaussian mixture model (GMM) classifiers. We then examine whether score fusion techniques can improve the performance of PAD systems. We also present some of the latest results of using neural networks for presentation attack detection. The experiments show that PAD systems struggle to generalize across databases and are mostly unable to detect unknown attacks, with systems based on neural networks performing better than systems based on handcrafted features.
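To make the GMM-based PAD approach and the score fusion step more concrete, here is a minimal sketch, not the chapter's implementation, of a typical two-class GMM detector followed by score-level fusion. It assumes frame-level feature vectors (for example, MFCC or CQCC frames) have already been extracted, and it uses scikit-learn's `GaussianMixture` and `LogisticRegression`. The function names (`train_pad_gmms`, `pad_score`, `train_fusion`, `fused_score`), the diagonal covariances, the number of mixture components, and the choice of logistic regression as the fusion method are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def train_pad_gmms(bona_fide_frames, attack_frames, n_components=8, seed=0):
    """Fit one diagonal-covariance GMM per class (hypothetical helper).

    Both inputs are arrays of shape (n_frames, n_features), pooled over
    the training utterances of each class.
    """
    gmm_bona = GaussianMixture(n_components=n_components, covariance_type="diag",
                               random_state=seed).fit(bona_fide_frames)
    gmm_attack = GaussianMixture(n_components=n_components, covariance_type="diag",
                                 random_state=seed).fit(attack_frames)
    return gmm_bona, gmm_attack


def pad_score(utterance_frames, gmm_bona, gmm_attack):
    """Average per-frame log-likelihood ratio for one utterance.

    Positive scores favour bona fide speech, negative scores favour an attack.
    """
    # GaussianMixture.score returns the mean per-sample (per-frame) log-likelihood.
    return gmm_bona.score(utterance_frames) - gmm_attack.score(utterance_frames)


def train_fusion(dev_scores, dev_labels):
    """Learn a linear fusion of several PAD systems on a development set.

    dev_scores has shape (n_utterances, n_systems), one column per PAD system;
    dev_labels uses 1 for bona fide and 0 for attack (assumed convention).
    """
    return LogisticRegression().fit(dev_scores, dev_labels)


def fused_score(fuser, scores):
    """Fused score: the linear decision value of the trained logistic regression."""
    return fuser.decision_function(scores)
```

In a cross-database evaluation, the GMMs and the fusion weights would be trained on one database and the resulting scores thresholded and evaluated on another, which is the setting in which the abstract reports that such systems struggle to generalize.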
Notes
- 2. Precomputed CQCC features were provided by the authors.
- 6. Source code: https://gitlab.idiap.ch/bob/bob.hobpad2.chapter16.
- 9. Not all subjects recorded five sessions, due to scheduling difficulties.
Acknowledgements
This work has been supported by the European H2020-ICT project TeSLA (grant agreement no. 688520), the project on Secure Access Control over Wide Area Networks (SWAN) funded by the Research Council of Norway (grant no. IKTPLUSS 248030/O70), and by the Swiss Center for Biometrics Research and Testing.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Korshunov, P., Marcel, S. (2019). A Cross-Database Study of Voice Presentation Attack Detection. In: Marcel, S., Nixon, M., Fierrez, J., Evans, N. (eds) Handbook of Biometric Anti-Spoofing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-92627-8_16
DOI: https://doi.org/10.1007/978-3-319-92627-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92626-1
Online ISBN: 978-3-319-92627-8
eBook Packages: Computer Science, Computer Science (R0)