Multimodal Speaker Identification Based on Text and Speech

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5372)

Abstract

This paper proposes a novel method for speaker identification that uses both speech utterances and their transcribed text. The transcribed text of each speaker's utterances is processed by probabilistic latent semantic indexing (PLSI), which models each speaker's vocabulary through a number of hidden topics that are closely related to the speaker's identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text and one by a nearest-neighbor classifier applied to the local MFCC histograms. Speaker identification experiments on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus, distributed by the Linguistic Data Consortium, demonstrate that a convex combination of the two scores is more accurate than either score alone.
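The speech branch and the score fusion summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the number of bins, how the per-coefficient dynamic range is obtained, the histogram distance, how nearest-neighbor distances become scores, or how the fusion weight is chosen. The sketch therefore assumes equal-width bins over a known range (`lo`, `hi`), an L1 distance, a similarity of 1/(1 + d) normalized over speakers, and a weight `lam` in [0, 1]; all function names are hypothetical. The text score is taken as given (for example, speaker posteriors from a trained PLSI model); PLSI training itself is not shown, and the speakers, frames, and scores in the usage lines are made up.

```python
from typing import Dict, List, Sequence


def mfcc_local_histogram(frames: Sequence[Sequence[float]],
                         lo: Sequence[float], hi: Sequence[float],
                         n_bins: int = 8) -> List[float]:
    """Quantize each MFCC coefficient's range [lo[d], hi[d]] into n_bins
    equal-width bins (assumed binning) and count frames per bin. The
    per-coefficient histograms are concatenated; each is divided by the
    number of frames, so it sums to 1 for a non-empty utterance."""
    dims = len(lo)
    hist = [0.0] * (dims * n_bins)
    for frame in frames:
        for d in range(dims):
            width = (hi[d] - lo[d]) / n_bins
            b = int((frame[d] - lo[d]) / width) if width > 0 else 0
            b = min(max(b, 0), n_bins - 1)  # clip values outside the range
            hist[d * n_bins + b] += 1.0
    n = max(len(frames), 1)
    return [h / n for h in hist]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nn_speech_scores(query: Sequence[float],
                     references: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Nearest-neighbor scoring: distance from the query to each speaker's
    closest reference histogram, mapped to 1/(1+d) and normalized so the
    scores sum to 1 over speakers (assumed normalization)."""
    sims = {spk: 1.0 / (1.0 + min(l1(query, r) for r in refs))
            for spk, refs in references.items()}
    total = sum(sims.values())
    return {spk: s / total for spk, s in sims.items()}


def fuse(text_scores: Dict[str, float], speech_scores: Dict[str, float],
         lam: float = 0.5) -> Dict[str, float]:
    """Convex combination lam * text + (1 - lam) * speech, 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {spk: lam * text_scores.get(spk, 0.0) + (1.0 - lam) * s
            for spk, s in speech_scores.items()}


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)


# Toy usage with two MFCC coefficients and 4 bins (hypothetical data).
lo, hi = [0.0, -1.0], [1.0, 1.0]
refs = {
    "spk_a": [mfcc_local_histogram([[0.1, -0.9], [0.2, -0.8]], lo, hi, 4)],
    "spk_b": [mfcc_local_histogram([[0.9, 0.9], [0.8, 0.7]], lo, hi, 4)],
}
query = mfcc_local_histogram([[0.15, -0.85], [0.3, -0.7]], lo, hi, 4)
speech = nn_speech_scores(query, refs)
text = {"spk_a": 0.4, "spk_b": 0.6}  # stand-in for PLSI speaker posteriors
fused = fuse(text, speech, lam=0.3)
print(identify(fused))  # prints spk_a
```

In this toy run the text scores alone would favor `spk_b`, while the speech scores favor `spk_a`; with `lam = 0.3` the fused decision follows the speech evidence. Because both inputs sum to 1 over speakers, their convex combination also sums to 1, which keeps the two modalities on a comparable scale before the weight is applied.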

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moschonas, P., Kotropoulos, C. (2008). Multimodal Speaker Identification Based on Text and Speech. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds) Biometrics and Identity Management. BioID 2008. Lecture Notes in Computer Science, vol 5372. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89991-4_11

  • DOI: https://doi.org/10.1007/978-3-540-89991-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89990-7

  • Online ISBN: 978-3-540-89991-4

  • eBook Packages: Computer Science, Computer Science (R0)
