Speaker Recognition Based on Multilevel Speech Signal Analysis on Polish Corpus
This article presents a new approach to text-independent speaker verification. It proposes combining spectral features with so-called high-level features (prosodic, articulatory, and lexical) in order to increase verification accuracy. The experiments were performed on a Polish-language corpus called PUEPS, which contains semi-spontaneous telephone conversations (acted emergency telephone notifications) recorded in laboratory conditions. Because Polish is an under-resourced language and the PUEPS corpus is relatively small, an approach different from those known from the well-known NIST evaluations is needed. The authors propose using fast scoring instead of more complex classifiers, and the AdaBoost algorithm for feature combination. Combining the features reduced the equal error rate (EER) under various signal-to-noise ratio (SNR) conditions.
Keywords: speaker recognition, high-level features, kernel combination, boosting
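As a rough illustration of two terms used in the abstract, the sketch below shows one common reading of "fast scoring" (cosine similarity between two fixed-length speaker vectors) and a simple threshold sweep that approximates the equal error rate. This is an assumption made for illustration, not the authors' implementation: the function names, the vector representation, and the toy scores are all hypothetical, and the AdaBoost-based feature combination is not shown.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two fixed-length speaker vectors.

    Assumed here as a stand-in for "fast scoring"; the vectors are
    hypothetical and must be non-zero and of equal length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The function
    returns the mean of the false acceptance rate (FAR) and the false
    rejection rate (FRR) at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up for illustration only).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, impostors))
```

A lower EER means that the target and impostor score distributions overlap less, which is the sense in which the abstract reports an improvement from combining features.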