UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection

Conference paper, Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper proposes a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM), and a speaker clustering method based on the difference of log-likelihood scores between two speaker models. During segmentation, the LLRS between two adjacent speech segments over the UBM is used as a distance measure; during clustering, the difference of log-likelihood scores between the two speaker models is used as the speaker classification criterion. A complete system for the NIST 2002 2-speaker task is built from these methods. Experimental results on the NIST 2002 Switchboard Cellular speaker segmentation corpus, the 1-speaker evaluation corpus and the 2-speaker evaluation corpus show the potential of the proposed algorithms.
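The abstract states both scoring rules only at a high level. The Python sketch below illustrates one plausible reading and is not the authors' implementation. It assumes diagonal-covariance GMMs, mean-only MAP adaptation of the UBM to each segment (in the style of Reynolds et al., 2000) with a relevance factor of 16, per-frame averaged log-likelihoods, and a symmetric form for the segment distance. The names `GMM`, `map_adapt_means`, `llr_over_ubm`, `segment_distance` and `classify_segment` are hypothetical.

```python
import math

# Hypothetical sketch, NOT the paper's exact formulation.
# Assumptions: diagonal-covariance GMMs, mean-only MAP adaptation,
# per-frame averaged log-likelihoods, a symmetric UBM-normalised distance.


class GMM:
    def __init__(self, weights, means, variances):
        self.weights = weights        # mixture weights, summing to 1
        self.means = means            # means[k][d]
        self.variances = variances    # diagonal variances[k][d]

    def component_log_densities(self, x):
        """log(w_k * N(x; mu_k, diag(var_k))) for every component k."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(ll)
        return out

    def avg_log_likelihood(self, frames):
        """Mean per-frame log-likelihood of a segment (list of feature vectors)."""
        total = sum(logsumexp(self.component_log_densities(x)) for x in frames)
        return total / len(frames)


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def map_adapt_means(ubm, frames, relevance=16.0):
    """Assumed step: mean-only MAP adaptation of the UBM to one segment."""
    n = [0.0] * len(ubm.weights)                              # soft counts
    sums = [[0.0] * len(ubm.means[0]) for _ in ubm.weights]   # first-order stats
    for x in frames:
        logs = ubm.component_log_densities(x)
        total = logsumexp(logs)
        for k, lk in enumerate(logs):
            post = math.exp(lk - total)                        # posterior of component k
            n[k] += post
            for d, xd in enumerate(x):
                sums[k][d] += post * xd
    means = []
    for k, mu in enumerate(ubm.means):
        alpha = n[k] / (n[k] + relevance)                      # data-dependent weight
        ex = [s / n[k] for s in sums[k]] if n[k] > 0 else mu
        means.append([alpha * e + (1.0 - alpha) * m for e, m in zip(ex, mu)])
    return GMM(ubm.weights, means, ubm.variances)


def llr_over_ubm(model, ubm, frames):
    """Average log-likelihood ratio of a segment: adapted model versus UBM."""
    return model.avg_log_likelihood(frames) - ubm.avg_log_likelihood(frames)


def segment_distance(ubm, seg_a, seg_b, relevance=16.0):
    """Hypothetical symmetric distance between two adjacent segments.
    Low (even negative) when each segment's adapted model explains the
    other segment better than the UBM does; higher for different speakers."""
    model_a = map_adapt_means(ubm, seg_a, relevance)
    model_b = map_adapt_means(ubm, seg_b, relevance)
    return -0.5 * (llr_over_ubm(model_a, ubm, seg_b) + llr_over_ubm(model_b, ubm, seg_a))


def classify_segment(model_1, model_2, frames, threshold=0.0):
    """Assign a segment to speaker 1 or 2 by the difference of log-likelihood scores."""
    diff = model_1.avg_log_likelihood(frames) - model_2.avg_log_likelihood(frames)
    return 1 if diff > threshold else 2
```

Under these assumptions, a speaker change would be hypothesised where `segment_distance` between adjacent windows exceeds a tuned threshold, and `classify_segment` would then assign each resulting segment to one of the two speaker models. Both thresholds are illustrative and would need tuning on development data.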

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deng, J., Zheng, T.F., Wu, W. (2006). UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection. In: Huo, Q., Ma, B., Chng, E.S., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_16

  • DOI: https://doi.org/10.1007/11939993_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)