Tracking Multiple Speakers with Probabilistic Data Association Filters
In prior work, we developed a speaker tracking system based on an extended Kalman filter that uses time delays of arrival (TDOAs) as acoustic features. Specifically, the TDOAs served as the observations of an iterated extended Kalman filter (IEKF) whose state is the speaker position. In other work, we followed the same approach to build a system that uses both audio and video information to track a moving lecturer. While these systems performed well, they were limited to scenarios in which only a single speaker must be tracked. In this work, we remove this restriction by generalizing the IEKF, first to a probabilistic data association filter (PDAF), which incorporates a clutter model for rejecting spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In experiments on seminar and meeting data, we demonstrate that the JPDAF provides better tracking performance than the IEKF.
Keywords: Data Association; Microphone Array; Active Speaker; Probabilistic Data Association; Speaker Position