
A Cross-Database Study of Voice Presentation Attack Detection

Chapter in: Handbook of Biometric Anti-Spoofing

Abstract

Despite increasing interest in speaker recognition technologies, a significant obstacle still hinders their wide deployment: their high vulnerability to spoofing, or presentation attacks. These attacks can be easy to perform. For instance, if an attacker has access to a speech sample from a target user, he/she can replay it to the recognition system during authentication using a loudspeaker or a smartphone. Because presentation attacks are easy to execute and require no technical knowledge of the biometric system, they are especially threatening in practical applications. Therefore, recent research focuses on collecting databases with such attacks and on developing presentation attack detection (PAD) systems. In this chapter, we present an overview of the latest databases and of techniques for detecting presentation attacks. We consider several prominent databases that contain bona fide and attack data, including ASVspoof 2015, ASVspoof 2017, AVspoof, voicePA, and BioCPqD-PA (the only proprietary database among them). Using these databases, we focus on the performance of PAD systems in the cross-database scenario or in the presence of “unknown” attacks (not available during training), as these scenarios are closer to practice, where pretrained systems need to detect attacks in unforeseen conditions. We first present and discuss the performance of PAD systems based on handcrafted features and traditional Gaussian mixture model (GMM) classifiers. We then examine whether score fusion techniques can improve the performance of PAD systems. We also present some of the latest results of using neural networks for presentation attack detection. The experiments show that PAD systems struggle to generalize across databases and are mostly unable to detect unknown attacks, although systems based on neural networks perform better than systems based on handcrafted features.
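To make the cross-database scenario concrete, the sketch below shows one common evaluation protocol, assumed here rather than taken from the chapter: the decision threshold is fixed at the equal error rate (EER) point on the development set of the database the PAD system was trained on, and the same threshold is then applied unchanged to another database, where the half total error rate (HTER) is reported. It also includes the simplest form of score-level fusion (averaging per-sample scores from several PAD systems), which presumes the scores are already on comparable scales; the chapter may use other fusion schemes. All function names are hypothetical, the error terms follow ISO/IEC 30107-3 naming (APCER/BPCER), and the scores are toy numbers, not results from any of the databases above.

```python
"""Minimal sketch of cross-database PAD evaluation (hypothetical helpers).

Assumption: a higher score means "more likely bona fide".
"""
from typing import List, Tuple


def error_rates(bona_fide: List[float], attacks: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (APCER, BPCER) at the given threshold.

    APCER: fraction of attack samples accepted (score >= threshold).
    BPCER: fraction of bona fide samples rejected (score < threshold).
    """
    apcer = sum(s >= threshold for s in attacks) / len(attacks)
    bpcer = sum(s < threshold for s in bona_fide) / len(bona_fide)
    return apcer, bpcer


def eer_threshold(bona_fide: List[float], attacks: List[float]) -> float:
    """Pick the observed score where APCER and BPCER are closest (EER point)."""
    def gap(t: float) -> float:
        apcer, bpcer = error_rates(bona_fide, attacks, t)
        return abs(apcer - bpcer)
    return min(sorted(set(bona_fide) | set(attacks)), key=gap)


def hter(bona_fide: List[float], attacks: List[float], threshold: float) -> float:
    """Half total error rate: mean of APCER and BPCER at a fixed threshold."""
    apcer, bpcer = error_rates(bona_fide, attacks, threshold)
    return (apcer + bpcer) / 2


def fuse_mean(score_lists: List[List[float]]) -> List[float]:
    """Mean score fusion: average each sample's scores across PAD systems.

    Every inner list must hold one system's scores in the same sample order.
    """
    return [sum(col) / len(col) for col in zip(*score_lists)]


if __name__ == "__main__":
    # Toy scores: development set of the "source" (training) database.
    src_bona, src_att = [2.0, 1.5, 1.0, 0.5], [-1.0, -0.5, 0.0, 0.6]
    # Toy scores: test set of a different "target" database.
    tgt_bona, tgt_att = [1.2, 0.4, 0.3, 0.9], [0.8, 0.7, -0.2, 0.1]

    t = eer_threshold(src_bona, src_att)  # threshold chosen on source only
    print("threshold:", t)
    print("HTER on source dev:", hter(src_bona, src_att, t))
    print("HTER on target (cross-database):", hter(tgt_bona, tgt_att, t))
```

In this toy setup the threshold chosen on the source data gives an HTER of 0.25 there but 0.5 on the target data, which illustrates the kind of generalization gap the abstract describes; with real systems the numbers depend entirely on the features, classifiers, and databases involved.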


Notes

  1. http://datashare.is.ed.ac.uk/handle/10283/853.

  2. Precomputed CQCC features were provided by the authors.

  3. https://www.idiap.ch/dataset/avspoof.

  4. https://www.idiap.ch/dataset/voicepa.

  5. https://www.beat-eu.org/platform/.

  6. Source code: https://gitlab.idiap.ch/bob/bob.hobpad2.chapter16.

  7. https://datashare.is.ed.ac.uk/handle/10283/3017.

  8. http://festvox.org/.

  9. Not all subjects recorded five sessions, due to scheduling difficulties.

  10. https://www.tensorflow.org/.

  11. http://www.tesla-project.eu.


Acknowledgements

This work has been supported by the European H2020-ICT project TeSLA (grant agreement no. 688520), the project on Secure Access Control over Wide Area Networks (SWAN) funded by the Research Council of Norway (grant no. IKTPLUSS 248030/O70), and by the Swiss Center for Biometrics Research and Testing.

Author information

Correspondence to Pavel Korshunov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Korshunov, P., Marcel, S. (2019). A Cross-Database Study of Voice Presentation Attack Detection. In: Marcel, S., Nixon, M., Fierrez, J., Evans, N. (eds) Handbook of Biometric Anti-Spoofing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-92627-8_16


  • DOI: https://doi.org/10.1007/978-3-319-92627-8_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92626-1

  • Online ISBN: 978-3-319-92627-8

  • eBook Packages: Computer Science, Computer Science (R0)
