Abstract
We present a new approach to modeling and processing multimedia data. This approach is based on graphical models that combine audio and video variables. We demonstrate it by developing a new algorithm for tracking a moving object in a cluttered, noisy scene using two microphones and a camera. Our model uses unobserved variables to describe the data in terms of the process that generates them. It is therefore able to capture and exploit the statistical structure of the audio and video data separately, as well as their mutual dependencies. Model parameters are learned from data via an EM algorithm, and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location from data. We demonstrate successful performance on multimedia clips captured in real world scenarios using off-the-shelf equipment.
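Below is a minimal, runnable sketch of the idea the abstract describes, under strong simplifying assumptions that are not from the paper: a 1-D discretised position grid, a linear map from object position to inter-microphone time delay, Gaussian noise, and synthetic observations standing in for real audio-video frames. It illustrates the calibrate-by-EM, track-by-Bayesian-inference loop; it is not the authors' model or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = 120                                   # discretised horizontal positions (assumed grid)
grid = np.arange(W, dtype=float)

# Synthetic data standing in for real frames: a latent position x_t generates an
# audio cue (inter-microphone time delay) and a video cue (noisy position estimate).
true_a, true_b = 0.05, -3.0               # unknown calibration: delay = a*x + b
T = 400
x_true = rng.integers(0, W, size=T).astype(float)
delay = true_a * x_true + true_b + rng.normal(0.0, 0.2, size=T)
video = x_true + rng.normal(0.0, 4.0, size=T)

# EM: learn the audio-video calibration (a, b) and the noise variances from data.
a, b, var_a, var_v = 0.0, 0.0, 1.0, 100.0
for _ in range(50):
    # E-step: posterior over the latent position for every frame (shape T x W).
    log_p = (-(delay[:, None] - (a * grid + b)) ** 2 / (2 * var_a)
             - (video[:, None] - grid) ** 2 / (2 * var_v))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from expected sufficient statistics.
    Ex, Ex2 = post @ grid, post @ grid ** 2          # E[x_t], E[x_t^2]
    a = (np.mean(delay * Ex) - delay.mean() * Ex.mean()) / (Ex2.mean() - Ex.mean() ** 2)
    b = delay.mean() - a * Ex.mean()
    var_a = np.sum((delay[:, None] - (a * grid + b)) ** 2 * post) / T
    var_v = np.sum((video[:, None] - grid) ** 2 * post) / T

# Tracking: the posterior mean position from the final E-step.
x_hat = post @ grid
print(f"learned calibration a={a:.3f}, b={b:.2f} (true {true_a}, {true_b})")
print(f"mean tracking error: {np.abs(x_hat - x_true).mean():.2f} grid units")
```

The full model in the paper captures richer statistical structure in the audio and video signals; this toy version only shows how automatic calibration can fall out of the EM updates while tracking reduces to posterior inference over the object location.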
© 2002 Springer-Verlag Berlin Heidelberg
Beal, M.J., Attias, H., Jojic, N. (2002). Audio-Video Sensor Fusion with Probabilistic Graphical Models. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2350. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47969-4_49