Abstract
This paper proposes a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM) and a speaker clustering method based on the difference of log-likelihood scores between two speaker models. During segmentation, the LLRS between two adjacent speech segments over the UBM is used as a distance measure, while during clustering, the difference of log-likelihood scores between two speaker models is used as the speaker classification criterion. A complete system for the NIST 2002 2-speaker task is presented using these methods. Experimental results on the NIST 2002 Switchboard Cellular speaker segmentation corpus, 1-speaker evaluation corpus, and 2-speaker evaluation corpus show the potential of the proposed algorithms.
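The abstract does not spell out how the two scores are computed, so the following Python sketch is only one plausible reading and not the authors' implementation. It assumes a diagonal-covariance GMM as the UBM, mean-only MAP adaptation with a relevance factor of 16 to derive a segment or speaker model, and average per-frame log-likelihoods as scores. The names `GMM`, `avg_loglik`, `map_adapt_means`, `llrs`, and `assign_speaker` are hypothetical, as is the toy data.

```python
from typing import NamedTuple
import numpy as np

class GMM(NamedTuple):
    weights: np.ndarray    # (M,) mixture weights
    means: np.ndarray      # (M, D) component means
    variances: np.ndarray  # (M, D) diagonal covariances

def _log_joint(X, gmm):
    """log(w_m * N(x_t | mu_m, var_m)) for every frame t and component m."""
    diff = X[:, None, :] - gmm.means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / gmm.variances, axis=2)
                        + np.sum(np.log(2 * np.pi * gmm.variances), axis=1))
    return log_gauss + np.log(gmm.weights)

def avg_loglik(X, gmm):
    """Average per-frame log-likelihood of segment X (T x D) under gmm."""
    lj = _log_joint(X, gmm)
    peak = lj.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return float(np.mean(peak[:, 0] + np.log(np.exp(lj - peak).sum(axis=1))))

def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to X (assumed, not from the paper)."""
    lj = _log_joint(X, ubm)
    post = np.exp(lj - lj.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)          # component posteriors
    n = post.sum(axis=0)                             # soft counts per component
    seg_means = (post.T @ X) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]           # data-dependent mixing
    new_means = alpha * seg_means + (1 - alpha) * ubm.means
    return GMM(ubm.weights, new_means, ubm.variances)

def llrs(seg_a, seg_b, ubm):
    """LLR of seg_b under a model adapted from seg_a, relative to the UBM."""
    model_a = map_adapt_means(seg_a, ubm)
    return avg_loglik(seg_b, model_a) - avg_loglik(seg_b, ubm)

def assign_speaker(seg, model_1, model_2):
    """Return 0 or 1 from the sign of the log-likelihood score difference."""
    return 0 if avg_loglik(seg, model_1) - avg_loglik(seg, model_2) >= 0 else 1

# Toy usage with synthetic 2-D "features" (purely illustrative).
rng = np.random.default_rng(0)
ubm = GMM(np.array([0.5, 0.5]),
          np.array([[-2.0, 0.0], [2.0, 0.0]]),
          np.ones((2, 2)))
seg_a1 = rng.normal([-2.0, 1.0], 0.5, size=(200, 2))   # speaker A
seg_a2 = rng.normal([-2.0, 1.0], 0.5, size=(200, 2))   # speaker A again
seg_b1 = rng.normal([2.0, -1.0], 0.5, size=(200, 2))   # speaker B
print("same speaker LLRS:", llrs(seg_a1, seg_a2, ubm))
print("different speaker LLRS:", llrs(seg_a1, seg_b1, ubm))
```

Under these assumptions, a low score between adjacent segments would suggest a speaker change point, and `assign_speaker` plays the role of the clustering criterion by comparing a segment's scores under the two speaker models. Any decision threshold would have to be tuned on development data.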
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deng, J., Zheng, T.F., Wu, W. (2006). UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_16
DOI: https://doi.org/10.1007/11939993_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science (R0)