Playback Attack Detection: The Search for the Ultimate Set of Antispoof Features
Automatic speaker verification systems are vulnerable to several kinds of spoofing attacks. Some of them can be quite simple: for example, playing back an eavesdropped recording requires neither specialized equipment nor expertise, yet it may still pose a serious threat to a biometric identification module built into an e-banking application. In this paper we follow a recent approach and convert recordings to images, assuming that an original voice can be distinguished from its played-back version through the analysis of local texture patterns. We propose improvements to the state-of-the-art solution, but also show its severe limitations. This in turn leads to a fundamental question: is it possible to find a single set of features that is characteristic of all playback recordings? We look for the answer by performing a series of optimization experiments, but in general the problem remains open.
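The idea of treating a recording as an image and describing it by local texture patterns can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not specify the texture descriptor or the spectrogram settings, so the basic 3x3 local binary pattern (LBP), the Hann window, the frame length, the hop size, and the function names `spectrogram` and `lbp_histogram` are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(signal, frame_len=16, hop=8):
    """Log-magnitude spectrogram via a naive DFT.

    Returns a 2D list: rows are frequency bins, columns are frames.
    frame_len and hop are illustrative values, not taken from the paper.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    n_bins = frame_len // 2 + 1
    image = [[0.0] * len(frames) for _ in range(n_bins)]
    for t, frame in enumerate(frames):
        # Apply a Hann window to reduce spectral leakage.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        for k in range(n_bins):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(windowed))
            image[k][t] = math.log(abs(coeff) + 1e-10)
    return image


# Clockwise 8-neighbour offsets (row, column), starting at the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: a synthetic sine wave stands in for a real recording.
signal = [math.sin(2 * math.pi * 0.1 * n) for n in range(128)]
features = lbp_histogram(spectrogram(signal))
```

The resulting 256-element vector could be passed to any classifier trained to separate genuine from played-back speech. Whether one such feature set generalizes across all playback conditions is exactly the question the paper leaves open.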
Keywords: Playback detection, Antispoof algorithms, Biometrics
The author would like to thank Tomasz Szwelnik and Jacek Kawalec from Voicelab for fruitful discussions, for sharing their expertise, and for granting access to the VL-Bio database of playback attacks, which was created at the company’s laboratories.