Tracking Multiple Speakers with Probabilistic Data Association Filters

  • Tobias Gehrig
  • John McDonough
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4122)


In prior work, we developed a speaker tracking system based on an extended Kalman filter that uses time delays of arrival (TDOAs) as acoustic features. In particular, the TDOAs comprised the observation of an iterated extended Kalman filter (IEKF) whose state corresponds to the speaker position. In other work, we followed the same approach to develop a system that could use both audio and video information to track a moving lecturer. While these systems functioned well, their utility was limited to scenarios in which only a single speaker was to be tracked. In this work, we seek to remove this restriction by generalizing the IEKF, first to a probabilistic data association filter (PDAF), which incorporates a clutter model for rejecting spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, we demonstrate that the JPDAF provides tracking performance superior to that of the IEKF.
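To make the TDOA observation model concrete, the following Python sketch shows an iterated EKF measurement update for a single TDOA from one microphone pair. It is an illustration under stated assumptions, not the authors' implementation. The state is assumed to be the 3-D speaker position in metres. Each TDOA is assumed to be processed as a scalar observation with noise variance `r`. The speed of sound is fixed at an assumed 343 m/s. The names `tdoa`, `tdoa_jacobian` and `iekf_update` are hypothetical. The predicted TDOA is the path-length difference to the two microphones divided by the speed of sound, and each iteration relinearizes this function at the current estimate.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed constant (air at roughly 20 degrees C)


def tdoa(x, m1, m2, c=SPEED_OF_SOUND):
    """Predicted TDOA in seconds (arrival at m1 minus arrival at m2) for a
    source at 3-D position x; microphone positions m1, m2 in metres."""
    return (math.dist(x, m1) - math.dist(x, m2)) / c


def tdoa_jacobian(x, m1, m2, c=SPEED_OF_SOUND):
    """Gradient of tdoa() with respect to x: difference of unit vectors over c."""
    d1, d2 = math.dist(x, m1), math.dist(x, m2)
    return [((xi - a) / d1 - (xi - b) / d2) / c for xi, a, b in zip(x, m1, m2)]


def iekf_update(x_pred, P, z, m1, m2, r, n_iter=5):
    """Iterated EKF update of the predicted position x_pred (covariance P)
    with one observed TDOA z of noise variance r. Returns (x_post, P_post)."""
    n = len(x_pred)
    eta = list(x_pred)  # current linearization point
    for _ in range(n_iter):
        H = tdoa_jacobian(eta, m1, m2)
        PH = [sum(P[i][k] * H[k] for k in range(n)) for i in range(n)]  # P H^T
        s = sum(H[i] * PH[i] for i in range(n)) + r  # innovation variance
        K = [v / s for v in PH]  # Kalman gain
        # Iterated correction: z - h(eta) - H (x_pred - eta)
        resid = (z - tdoa(eta, m1, m2)
                 - sum(H[i] * (x_pred[i] - eta[i]) for i in range(n)))
        eta = [x_pred[i] + K[i] * resid for i in range(n)]
    # Covariance from the last gain: P_post = (I - K H) P, using symmetry of P
    P_post = [[P[i][j] - K[i] * PH[j] for j in range(n)] for i in range(n)]
    return eta, P_post
```

With `n_iter=1` the loop performs a single linearization at the prediction, which is an ordinary EKF step. Further iterations refine the linearization point, which helps because the TDOA is a nonlinear function of the speaker position.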
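The role of the clutter model in a PDAF can be illustrated with a scalar-measurement sketch in Python. It is not taken from the paper. It assumes the textbook parametric PDAF with Poisson-distributed clutter of spatial density `clutter_density`, detection probability `p_detect` and gate probability `p_gate`. These are hypothetical parameter names with illustrative defaults. It also assumes a single linearized observation row `H`. Each candidate TDOA receives an association probability, and the remaining probability `beta0` covers the event that every candidate is clutter. The state is then corrected with the probability-weighted innovation.

```python
import math


def pdaf_weights(innovations, s, clutter_density, p_detect=0.9, p_gate=1.0):
    """Parametric PDAF association probabilities for scalar innovations
    nu_j = z_j - z_pred with innovation variance s.

    Returns (beta0, betas): beta0 is the probability that every candidate is
    clutter, betas[j] the probability that candidate j came from the speaker.
    """
    # Unnormalized Gaussian term for each candidate measurement
    e = [math.exp(-0.5 * nu * nu / s) for nu in innovations]
    # Clutter term for a 1-D measurement under Poisson clutter:
    # b = lambda * sqrt(2 pi s) * (1 - P_D P_G) / P_D
    b = (clutter_density * math.sqrt(2.0 * math.pi * s)
         * (1.0 - p_detect * p_gate) / p_detect)
    total = b + sum(e)
    return b / total, [ej / total for ej in e]


def pdaf_update(x_pred, P, H, z_pred, measurements, r, clutter_density,
                p_detect=0.9, p_gate=1.0):
    """PDAF update with one linearized scalar observation row H.
    Returns (x_post, P_post, beta0, betas)."""
    n = len(x_pred)
    PH = [sum(P[i][k] * H[k] for k in range(n)) for i in range(n)]  # P H^T
    s = sum(H[i] * PH[i] for i in range(n)) + r  # innovation variance
    K = [v / s for v in PH]  # Kalman gain
    nus = [z - z_pred for z in measurements]
    beta0, betas = pdaf_weights(nus, s, clutter_density, p_detect, p_gate)
    nu = sum(bj * v for bj, v in zip(betas, nus))  # combined innovation
    x_post = [x_pred[i] + K[i] * nu for i in range(n)]
    # Spread of the innovations: sum_j beta_j nu_j^2 - nu^2 (non-negative)
    spread = sum(bj * v * v for bj, v in zip(betas, nus)) - nu * nu
    # P_post = beta0 P + (1 - beta0)(P - K s K^T) + K spread K^T
    P_post = [[P[i][j] - (1.0 - beta0) * s * K[i] * K[j] + spread * K[i] * K[j]
               for j in range(n)] for i in range(n)]
    return x_post, P_post, beta0, betas
```

A spurious acoustic event whose TDOA lies far from the prediction receives a negligible Gaussian term, so it barely moves the estimate. The spread term adds back covariance to reflect the remaining uncertainty about which candidate was the speaker.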
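With several speakers, a JPDAF replaces per-speaker association weights with marginals over joint association events. The Python sketch below enumerates every feasible event and weights it with the parametric form, using Gaussian likelihoods and a Poisson clutter density. In a feasible event, each measurement is either clutter or assigned to one speaker, and each speaker produces at most one measurement. Several features are assumptions for illustration: scalar observations, a common detection probability, no gating, and the hypothetical names `gaussian` and `jpdaf_marginals`. Exhaustive enumeration grows exponentially, so it is only practical for a handful of speakers and measurements.

```python
import itertools
import math


def gaussian(nu, s):
    """1-D Gaussian density of innovation nu with variance s."""
    return math.exp(-0.5 * nu * nu / s) / math.sqrt(2.0 * math.pi * s)


def jpdaf_marginals(measurements, predictions, variances, clutter_density,
                    p_detect=0.9):
    """Marginal probabilities beta[t][j] that measurement j came from speaker t,
    and beta0[t] that speaker t produced none of the measurements."""
    n_t, n_m = len(predictions), len(measurements)
    beta = [[0.0] * n_m for _ in range(n_t)]
    beta0 = [0.0] * n_t
    total = 0.0
    # event[j] is -1 (clutter) or the index of the speaker that produced z_j
    for event in itertools.product(range(-1, n_t), repeat=n_m):
        assigned = [t for t in event if t >= 0]
        if len(assigned) != len(set(assigned)):
            continue  # a speaker may produce at most one measurement
        weight = 1.0
        for j, t in enumerate(event):
            if t >= 0:
                # Speaker-originated measurement, relative to the clutter density
                nu = measurements[j] - predictions[t]
                weight *= gaussian(nu, variances[t]) / clutter_density
        for t in range(n_t):
            weight *= p_detect if t in assigned else (1.0 - p_detect)
        total += weight
        for j, t in enumerate(event):
            if t >= 0:
                beta[t][j] += weight
        for t in range(n_t):
            if t not in assigned:
                beta0[t] += weight
    # Normalize so that, for each speaker, sum_j beta[t][j] + beta0[t] = 1
    beta = [[b / total for b in row] for row in beta]
    beta0 = [b / total for b in beta0]
    return beta, beta0
```

Each row `beta[t]`, together with `beta0[t]`, can then drive a PDAF-style update of speaker `t`'s own state vector. This is how a JPDAF keeps a separate state for each active speaker while sharing the measurements among them.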


Keywords (added by machine, not by the authors, and subject to update): Data Association, Microphone Array, Active Speaker, Probabilistic Data Association, Speaker Position





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Tobias Gehrig (1)
  • John McDonough (1)
  1. Institut für Theoretische Informatik, Universität Karlsruhe, Am Fasanengarten 5, 76131 Karlsruhe, Germany
