Abstract
It has been widely reported that speaker-individuality information in the voice is not equally distributed across the speech spectrum, and that this is attributable to the occurrence of different phoneme events (e.g. [115, 156, 170]). Based on this finding, a variety of methods have been developed to conveniently extract the most useful information from the speech signal for further modelling; most of them, however, are limited to clean microphone speech or to narrowband (NB) telephone speech. For wideband (WB)-transmitted speech, the usefulness of the frequency range beyond the NB cut-off frequencies has not yet been determined. Moreover, the commonly adopted MFCC features might not allow speaker verification to take full advantage of the WB signal, since they were developed for speech recognition and from signals band-limited to 5 kHz [51]. This work reveals some of the causes of the benefit of WB over NB, considering clean and degraded speech. It attempts to provide some guidance on speaker verification system configuration by identifying speaker-discriminative information in frequency bands beyond NB and encouraging its use. First, a sub-band analysis employing transmitted speech segments is presented, and the effects of channel degradations on different frequency sub-bands are determined. Next, speaker verification performance on speech signals of 0–4, 4–8, and 0–8 kHz, and on transmitted speech, is compared, employing different sets of cepstral features extracted using linearly- and mel-spaced filterbanks (LFCCs and MFCCs, respectively). Finally, effective phoneme classes in WB are determined and identified as an important contribution to the superiority of WB over NB.
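The contrast between linearly- and mel-spaced filterbanks can be illustrated with a minimal sketch that counts how many filter centres fall in the 4–8 kHz band under each spacing. The mel formula used here (2595 log10(1 + f/700)) is one common variant, and the filter count of 32 is an assumption for illustration only; neither is taken from the chapter, whose exact filterbank configuration is not stated in the abstract.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale variant (assumed; not confirmed by the chapter).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def linear_centres(f_lo, f_hi, n_filters):
    # Centre frequencies equally spaced in Hz (as for LFCC-style filterbanks).
    step = (f_hi - f_lo) / (n_filters + 1)
    return [f_lo + step * (i + 1) for i in range(n_filters)]

def mel_centres(f_lo, f_hi, n_filters):
    # Centre frequencies equally spaced on the mel scale (as for MFCC-style filterbanks).
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]

def count_in_band(centres, lo, hi):
    # Number of filter centres with lo <= centre < hi.
    return sum(1 for c in centres if lo <= c < hi)

N_FILTERS = 32  # hypothetical value, chosen only for illustration
lin = linear_centres(0.0, 8000.0, N_FILTERS)
mel = mel_centres(0.0, 8000.0, N_FILTERS)

lin_upper = count_in_band(lin, 4000.0, 8000.0)
mel_upper = count_in_band(mel, 4000.0, 8000.0)
print("filters centred in 4-8 kHz: linear =", lin_upper, ", mel =", mel_upper)
```

Under these assumptions, the mel spacing places fewer filter centres in the upper band than the linear spacing does, which is one way to see why the choice of filterbank may matter when the information of interest lies beyond the NB range.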
Notes
1. Available from https://sites.google.com/site/nikobrummer/focal, last accessed 19th August 2014.
2. An example of these files can be downloaded from https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.phn, last accessed 28th August 2014.
3. Accessible at https://catalog.ldc.upenn.edu/docs/LDC93S1/PHONCODE.TXT, last accessed 28th August 2014.
4. https://sites.google.com/site/dgromeroweb/software/, last accessed 15th July 2014.
5. This result appears counter-intuitive. The lack of phonemes was expected to affect performance more severely with WB-transmitted speech. The author has triple-checked that the software operates in the same manner, and with correct input files, as in the experiments with clean and NB-transmitted data. Further research would be needed to find a satisfactory explanation.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this chapter
Cite this chapter
Fernández Gallardo, L. (2016). Detecting Speaker-Discriminative Spectral Content in Wideband for Automatic Speaker Recognition. In: Human and Automatic Speaker Recognition over Telecommunication Channels. T-Labs Series in Telecommunication Services. Springer, Singapore. https://doi.org/10.1007/978-981-287-727-7_6
DOI: https://doi.org/10.1007/978-981-287-727-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-726-0
Online ISBN: 978-981-287-727-7
eBook Packages: Engineering, Engineering (R0)