3D Audiovisual Person Tracking Using Kalman Filtering and Information Theory

  • Nikos Katsarakis
  • George Souretis
  • Fotios Talantzis
  • Aristodemos Pnevmatikakis
  • Lazaros Polymenakos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4122)


This paper proposes a system for tracking people in three dimensions using audiovisual information from multiple acoustic and video sensors. The system comprises a video subsystem and an audio subsystem, whose outputs are combined using a Kalman filter. The video subsystem fuses a number of 2D trackers into a 3D estimate; each 2D tracker couples a variation of Stauffer’s adaptive background algorithm, with spatio-temporal adaptation of the learning parameters, to a Kalman tracker in a feedback configuration. The audio subsystem applies an information-theoretic metric to the signals of a microphone pair to estimate the direction from which sound arrives. By combining measurements from a series of such pairs, the actual coordinates of the speaker in space are derived.
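The video subsystem builds on Stauffer’s adaptive background algorithm, which models each pixel as a mixture of Gaussians and labels a value as foreground when it matches no background component. The sketch below illustrates the classic per-pixel update for a single grayscale pixel only. It is not the authors’ variant: the learning rate is fixed instead of being adapted spatio-temporally, the Kalman feedback loop is omitted, and the class name `PixelGMM` and every parameter value are hypothetical.

```python
import numpy as np


class PixelGMM:
    """Stauffer-Grimson style Gaussian mixture for one grayscale pixel.

    Sketch only: `alpha` is fixed here, whereas the paper adapts its learning
    parameters spatio-temporally; the Kalman feedback loop is not modelled.
    All parameter values are illustrative assumptions.
    """

    def __init__(self, k=3, alpha=0.01, match_sigmas=2.5, bg_ratio=0.7,
                 init_var=225.0, min_var=4.0):
        self.alpha, self.match_sigmas, self.bg_ratio = alpha, match_sigmas, bg_ratio
        self.init_var, self.min_var = init_var, min_var
        self.w = np.full(k, 1.0 / k)          # component weights
        self.mu = np.linspace(0.0, 255.0, k)  # component means
        self.var = np.full(k, init_var)       # component variances

    def _background_components(self):
        # Rank components by w/sigma; the leading ones whose cumulative
        # weight reaches bg_ratio are taken to model the background.
        order = np.argsort(-self.w / np.sqrt(self.var))
        n_bg = int(np.searchsorted(np.cumsum(self.w[order]), self.bg_ratio)) + 1
        return set(order[:n_bg].tolist())

    def update(self, x):
        """Classify intensity x (True = foreground), then update the mixture."""
        background = self._background_components()
        dist = np.abs(x - self.mu) / np.sqrt(self.var)
        matched = dist < self.match_sigmas
        if matched.any():
            # Closest matching component absorbs the new sample.
            j = int(np.argmin(np.where(matched, dist, np.inf)))
            is_foreground = j not in background
            self.w *= 1.0 - self.alpha
            self.w[j] += self.alpha
            rho = self.alpha  # simplified second learning rate
            self.mu[j] += rho * (x - self.mu[j])
            self.var[j] += rho * ((x - self.mu[j]) ** 2 - self.var[j])
            self.var[j] = max(self.var[j], self.min_var)  # avoid collapse to 0
        else:
            # No match: replace the weakest component with a new, wide one.
            j = int(np.argmin(self.w))
            self.mu[j], self.var[j], self.w[j] = x, self.init_var, self.alpha
            is_foreground = True
        self.w /= self.w.sum()
        return is_foreground
```

Trained for a few hundred frames on a static intensity of 100, the model classifies 100 as background and a sudden 200 as foreground. A full tracker would run one such mixture per pixel (usually vectorised over the whole frame) and pass the foreground mask to the 2D Kalman tracker.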
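Fusing several 2D trackers in 3D requires calibrated cameras. A common way to lift matched 2D image positions into one 3D point is linear (DLT) triangulation; the abstract does not state which method the authors use, so treat this as a generic stand-in. The function name `triangulate` is hypothetical, and the sketch assumes each camera’s 3x4 projection matrix is known and that the same body point (for example a blob centroid) has been located in every view.

```python
import numpy as np


def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point seen in N >= 2 calibrated views.

    projections: list of 3x4 camera matrices; points_2d: matching (u, v) pixels.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The least-squares solution of A X = 0 is the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]
```

With noisy 2D detections the same code returns an algebraic least-squares estimate; adding more cameras only adds rows to the system.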
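The audio subsystem estimates the direction of arrival from the delay that maximises an information-theoretic similarity between two microphone signals. A minimal sketch, assuming jointly Gaussian signals, in which case the mutual information between the aligned signals reduces to -0.5 * log(1 - rho^2) of their correlation coefficient rho. The authors’ metric may differ in detail (for example in how many delayed samples it considers); the function names, the far-field model and the 343 m/s speed of sound are assumptions.

```python
import numpy as np


def gaussian_mutual_information(x, y):
    """MI in nats between two jointly Gaussian signals: -0.5 * log(1 - rho^2)."""
    rho = np.corrcoef(x, y)[0, 1]
    rho = min(abs(rho), 1.0 - 1e-12)  # guard against log(0)
    return -0.5 * np.log(1.0 - rho ** 2)


def estimate_tdoa(x1, x2, max_lag):
    """Integer lag (samples) maximising MI between x1[n] and x2[n + lag].

    A positive lag means the sound reaches microphone 1 first.
    """
    n = len(x1)
    best_lag, best_mi = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x1[: n - lag], x2[lag:]
        else:
            a, b = x1[-lag:], x2[: n + lag]
        mi = gaussian_mutual_information(a, b)
        if mi > best_mi:
            best_lag, best_mi = lag, mi
    return best_lag


def direction_of_arrival(lag, fs, mic_distance, c=343.0):
    """Far-field DOA in radians, measured from the axis pointing from mic 2 to mic 1."""
    cos_theta = np.clip(c * lag / (fs * mic_distance), -1.0, 1.0)
    return np.arccos(cos_theta)
```

A single pair only constrains the source to a cone around the microphone axis, which is why measurements from several pairs are needed to obtain a position.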
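The abstract states only that measurements from a series of microphone pairs are combined into the speaker’s coordinates. As a hedged illustration, one generic approach is to search a grid over the room for the point whose predicted time differences of arrival best match the measured ones in the least-squares sense. The grid search, the delay convention (arrival at the second microphone minus arrival at the first) and the function names are assumptions, not the authors’ method.

```python
import numpy as np


def tdoa_cost(pairs, tdoas, grid, c=343.0):
    """Sum of squared range-difference errors at every candidate point in grid."""
    cost = np.zeros(len(grid))
    for (m1, m2), tau in zip(pairs, tdoas):
        d1 = np.linalg.norm(grid - m1, axis=1)  # distance to first mic
        d2 = np.linalg.norm(grid - m2, axis=1)  # distance to second mic
        cost += (d2 - d1 - c * tau) ** 2
    return cost


def localize(pairs, tdoas, bounds, step=0.05, c=343.0):
    """Grid point (x, y, z) in metres that best explains the measured TDOAs.

    pairs: list of (mic1, mic2) positions; tdoas: seconds, mic2 minus mic1;
    bounds: [(xmin, xmax), (ymin, ymax), (zmin, zmax)] of the room.
    """
    axes = [np.arange(lo, hi + step / 2, step) for lo, hi in bounds]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    grid = np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=1)
    return grid[np.argmin(tdoa_cost(pairs, tdoas, grid, c))]
```

A coarse grid can be refined locally around the best cell, or replaced by a closed-form or iterative least-squares solver when speed matters.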
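The audio and video subsystems are combined with a Kalman filter, but the abstract does not give its model. A minimal sketch, assuming a constant-velocity state [x, y, z, vx, vy, vz], a 25 fps frame interval (dt = 0.04 s) and illustrative noise levels, in which both subsystems report a 3D position and differ only in their measurement covariance. The class name `AVKalmanTracker` and all numeric values are hypothetical.

```python
import numpy as np


class AVKalmanTracker:
    """Constant-velocity 3D Kalman filter fusing audio and video positions.

    Noise levels and dt are illustrative assumptions, not values from the paper.
    """

    def __init__(self, dt=0.04, accel_std=1.0, video_std=0.1, audio_std=0.3):
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)  # position += velocity * dt
        g = np.vstack([0.5 * dt ** 2 * np.eye(3), dt * np.eye(3)])
        self.Q = accel_std ** 2 * g @ g.T  # white-acceleration process noise
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position observed
        self.R = {"video": video_std ** 2 * np.eye(3),
                  "audio": audio_std ** 2 * np.eye(3)}
        self.x = np.zeros(6)
        self.P = 10.0 * np.eye(6)  # vague initial uncertainty

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z, source):
        """Correct the state with a 3D position from 'video' or 'audio'."""
        y = np.asarray(z, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R[source]
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```

Because both modalities share one measurement model, fusion reduces to calling `predict` once per frame and `update` with whichever measurements arrived; the larger audio covariance makes the filter trust the acoustic estimate less than the visual one.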


Keywords (added by machine, not by the authors): Kalman Filter, Foreground Pixel, Acoustic Source, Microphone Array, Time Delay Estimation



Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Nikos Katsarakis (1)
  • George Souretis (1)
  • Fotios Talantzis (1)
  • Aristodemos Pnevmatikakis (1)
  • Lazaros Polymenakos (1)

  1. Athens Information Technology, Autonomic and Grid Computing, P.O. Box 64, Markopoulou Ave., 19002 Peania, Greece
