Language–Independent Speaker Classification over a Far–Field Microphone

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

The speaker classification approach described in this contribution leverages the analysis of both speaker and verbal content information, so as to use two light-weight components for classification: a spectral matching component based on a global representation of the entire utterance, and a temporal alignment component based on more conventional frame-level evidence. The paradigm behind the spectral matching component is related to latent semantic mapping, which postulates that the underlying structure in the data is partially obscured by the randomness of local phenomena with respect to information extraction. Uncovering this latent structure results in a parsimonious continuous parameter description of feature frames and spectral bands, which then replaces the original parameterization in clustering and identification. Such global analysis can then be advantageously combined with elementary temporal alignment. This approach has been commercially deployed for the purpose of language-independent desktop voice login over a far-field microphone.
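
To make the temporal alignment component concrete, the following is a minimal sketch that assumes dynamic time warping (DTW) over frame-level feature vectors as a stand-in for the "elementary temporal alignment" mentioned above. The abstract does not specify the alignment method, so the function name `dtw_distance`, the Euclidean frame distance, and the toy frames are all illustrative assumptions rather than the chapter's implementation.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats; the local cost is the Euclidean
    distance between frames (an assumption made for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical toy frames (two spectral values per frame).
enrolled = [(0.0, 1.0), (1.0, 0.0), (2.0, 1.0)]
slower = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0), (2.0, 1.0)]  # first frame repeated
other = [(5.0, 5.0), (6.0, 5.0), (7.0, 5.0)]

same_phrase_cost = dtw_distance(enrolled, slower)
different_cost = dtw_distance(enrolled, other)
```

In this toy setup, a slower rendition of the same frame sequence aligns at a lower cost than an unrelated sequence. A frame-level score of this kind is what would be combined with the global spectral matching score; how the two are fused is not stated in the abstract and is left out here.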


Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bellegarda, J.R. (2007). Language–Independent Speaker Classification over a Far–Field Microphone. In: Müller, C. (eds) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_10

  • DOI: https://doi.org/10.1007/978-3-540-74122-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74121-3

  • Online ISBN: 978-3-540-74122-0

  • eBook Packages: Computer Science (R0)
