An HMM-Based Subband Processing Approach to Speaker Identification

  • Conference paper
Audio- and Video-Based Biometric Person Authentication (AVBPA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2091)

Abstract

This paper contributes to the growing literature confirming the effectiveness of subband processing for speaker recognition. Specifically, we investigate speaker identification from noisy test speech modelled using linear prediction and hidden Markov models (HMMs). After filtering the wideband signal into subbands, the output time trajectory of each is represented by 12 pseudo-cepstral coefficients which are used to train and test individual HMMs. During recognition, the HMM outputs are combined to produce an overall score for each test utterance. We find that, for particular numbers of filters, subband processing outperforms traditional wideband techniques.
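The recombination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which combination rule is used, so the code below assumes one common choice: summing the per-subband HMM log-likelihoods for each enrolled speaker (a product rule in the log domain) and selecting the speaker with the highest total. The function names, speaker labels, and score values are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def combine_subband_scores(log_likelihoods: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the per-subband log-likelihoods for each speaker.

    Assumption: subband scores are combined by simple summation in the
    log domain; the paper itself may use a different rule.
    """
    return {speaker: sum(scores) for speaker, scores in log_likelihoods.items()}


def identify_speaker(log_likelihoods: Dict[str, List[float]]) -> str:
    """Return the speaker whose combined score is highest."""
    combined = combine_subband_scores(log_likelihoods)
    return max(combined, key=combined.get)


# Hypothetical log-likelihoods from four subband HMMs per enrolled speaker.
example_scores = {
    "speaker_a": [-120.0, -98.5, -110.25, -101.0],
    "speaker_b": [-115.0, -105.0, -112.0, -108.5],
}

print(identify_speaker(example_scores))
```

With these illustrative values, one subband favours speaker_b while the summed score favours speaker_a, which shows how recombination can let the remaining bands outweigh a single band that disagrees, for example one corrupted by narrowband noise.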

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Higgins, J.E., Damper, R.I. (2001). An HMM-Based Subband Processing Approach to Speaker Identification. In: Bigun, J., Smeraldi, F. (eds) Audio- and Video-Based Biometric Person Authentication. AVBPA 2001. Lecture Notes in Computer Science, vol 2091. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45344-X_24

  • DOI: https://doi.org/10.1007/3-540-45344-X_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42216-7

  • Online ISBN: 978-3-540-45344-4

  • eBook Packages: Springer Book Archive
