UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection

Deng, Jing; Zheng, Thomas Fang; Wu, Wenhu

doi:10.1007/11939993_16


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing


Abstract

In this paper, a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM) and a speaker clustering method based on the difference of log-likelihood scores between two speaker models are proposed. During segmentation, the LLRS between two adjacent speech segments over the UBM is used as a distance measure, while during clustering, the difference of log-likelihood scores between two speaker models is used as the speaker classification criterion. A complete system for the NIST 2002 2-speaker task is presented using these methods. Experimental results on the NIST 2002 Switchboard Cellular speaker segmentation corpus, the 1-speaker evaluation corpus and the 2-speaker evaluation corpus show the potential of the proposed algorithms.
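The two scores named in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian mixture models, per-frame averaging, and a zero decision threshold, none of which the abstract specifies. All function names are hypothetical. How a model is derived from an adjacent segment (for example by adapting the UBM) is not described in the abstract and is left out here.

```python
import math
from typing import List, Sequence, Tuple

# Assumed representation: a diagonal-covariance GMM is a list of
# (weight, means, variances) components. This is an illustration only.
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def frame_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """Return log p(x | gmm) for one feature vector, using log-sum-exp."""
    logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def avg_log_likelihood(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood of a segment under a model."""
    if not frames:
        raise ValueError("segment has no frames")
    return sum(frame_log_likelihood(x, gmm) for x in frames) / len(frames)


def llr_over_ubm(frames: Sequence[Sequence[float]], model: GMM, ubm: GMM) -> float:
    """Log-likelihood ratio score of a segment: model versus UBM."""
    return avg_log_likelihood(frames, model) - avg_log_likelihood(frames, ubm)


def assign_speaker(frames: Sequence[Sequence[float]], spk_a: GMM, spk_b: GMM,
                   threshold: float = 0.0) -> str:
    """Label a segment by the difference of log-likelihood scores between
    two speaker models (threshold of 0.0 is an assumption)."""
    diff = avg_log_likelihood(frames, spk_a) - avg_log_likelihood(frames, spk_b)
    return "A" if diff > threshold else "B"


# Toy one-dimensional models, invented purely for demonstration.
spk_a: GMM = [(1.0, [0.0], [1.0])]
spk_b: GMM = [(1.0, [4.0], [1.0])]
ubm: GMM = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
segment = [[0.1], [-0.2], [0.3]]

print(assign_speaker(segment, spk_a, spk_b))
print(llr_over_ubm(segment, spk_a, ubm) > llr_over_ubm(segment, spk_b, ubm))
```

In this toy setup, a segment drawn near one speaker's mean scores higher against that speaker's model than against the UBM, which is the intuition behind using a ratio over the background model as a distance measure.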



Author information

Authors and Affiliations

Center for Speech Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084
Jing Deng, Thomas Fang Zheng & Wenhu Wu


Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li


About this paper

Cite this paper

Deng, J., Zheng, T.F., Wu, W. (2006). UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_16


DOI: https://doi.org/10.1007/11939993_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
