An HMM-Based Subband Processing Approach to Speaker Identification

Higgins, J. E.; Damper, R. I.

doi:10.1007/3-540-45344-X_24


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2091)

Included in the following conference series:

International Conference on Audio- and Video-Based Biometric Person Authentication


Abstract

This paper contributes to the growing literature confirming the effectiveness of subband processing for speaker recognition. Specifically, we investigate speaker identification from noisy test speech modelled using linear prediction and hidden Markov models (HMMs). After filtering the wideband signal into subbands, the output time trajectory of each is represented by 12 pseudo-cepstral coefficients which are used to train and test individual HMMs. During recognition, the HMM outputs are combined to produce an overall score for each test utterance. We find that, for particular numbers of filters, subband processing outperforms traditional wideband techniques.
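The recombination step can be illustrated with a minimal sketch. The abstract does not say which fusion rule is used, so this example assumes that each subband HMM returns a log-likelihood per enrolled speaker and that the overall score is their unweighted sum (equivalent to multiplying likelihoods under an assumed independence between subbands). The function names and the numeric scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def fuse_subband_scores(log_likelihoods: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine per-subband HMM log-likelihoods into one score per speaker.

    Assumption (not stated in the abstract): scores are fused by an
    unweighted sum of log-likelihoods across subbands.
    """
    subband_counts = {len(scores) for scores in log_likelihoods.values()}
    if len(subband_counts) > 1:
        raise ValueError("every speaker needs one score per subband")
    return {speaker: sum(scores) for speaker, scores in log_likelihoods.items()}


def identify_speaker(log_likelihoods: Dict[str, List[float]]) -> str:
    """Return the enrolled speaker with the highest fused score."""
    fused = fuse_subband_scores(log_likelihoods)
    return max(fused, key=fused.get)


# Hypothetical log-likelihoods from four subband HMMs for one test utterance.
example_scores = {
    "speaker_a": [-310.2, -295.7, -402.1, -350.0],
    "speaker_b": [-305.9, -301.3, -380.4, -348.8],
}

best_speaker = identify_speaker(example_scores)
```

Here `speaker_b` wins even though `speaker_a` scores higher in the second subband, because the decision depends on the fused score rather than on any single band. A weighted sum, for example one that down-weights noisy bands, would be a drop-in alternative under the same interface.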

Author information

Authors and Affiliations

Image, Speech and Intelligent Systems Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
J. E. Higgins & R. I. Damper

Editor information

Editors and Affiliations

School of Information Science, Computer and Electrical Engineering, Halmstad University, P.O. Box 823, S-301 18, Halmstad, Sweden
Josef Bigun & Fabrizio Smeraldi

Cite this paper

Higgins, J.E., Damper, R.I. (2001). An HMM-Based Subband Processing Approach to Speaker Identification. In: Bigun, J., Smeraldi, F. (eds) Audio- and Video-Based Biometric Person Authentication. AVBPA 2001. Lecture Notes in Computer Science, vol 2091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45344-X_24

DOI: https://doi.org/10.1007/3-540-45344-X_24
Published: 17 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42216-7
Online ISBN: 978-3-540-45344-4
eBook Packages: Springer Book Archive
