Probabalistic Models and Informative Subspaces for Audiovisual Correspondence

Fisher, John W.; Darrell, Trevor

doi:10.1007/3-540-47977-5_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2352)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We propose a probabilistic model of single-source multi-modal generation and show how algorithms for maximizing mutual information can find the correspondences between components of each signal. We show how non-parametric techniques for finding informative subspaces can capture the complex statistical relationship between signals in different modalities. We extend a previous technique for finding informative subspaces to include new priors on the projection weights, yielding more robust results. Applied to human speakers, our model can find the relationship between audio speech and video of facial motion, and partially segment out background events in both channels. We present new results on the problem of audio-visual verification, and show how the audio and video of a speaker can be matched even when no prior model of the speaker’s voice or appearance is available.
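
The idea of scoring a pair of projections by their mutual information can be illustrated with a minimal sketch. It is not the chapter's algorithm: the abstract does not give the estimator, the optimization procedure, or the priors on the projection weights, so everything below is an assumption made for illustration. The function names, the leave-one-out Gaussian Parzen estimator, the fixed bandwidth, and the synthetic "audio" and "video" features are all hypothetical. The sketch only shows that a projection pair sharing a common source scores higher than an unrelated pair.

```python
import numpy as np

def parzen_log_density(samples, bandwidth):
    """Leave-one-out Gaussian Parzen estimate of log p(x) at each sample.

    samples: array of shape (n, d). The isotropic bandwidth is an assumed
    free parameter, not a value taken from the chapter.
    """
    n, d = samples.shape
    diff = samples[:, None, :] - samples[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2)
    kernel = np.exp(-sq_dist) / ((2.0 * np.pi * bandwidth ** 2) ** (d / 2.0))
    np.fill_diagonal(kernel, 0.0)          # exclude each point's own kernel
    density = kernel.sum(axis=1) / (n - 1)
    return np.log(density + 1e-300)        # guard against log(0)

def projected_mutual_information(audio, video, w_audio, w_video, bandwidth=0.3):
    """Estimate I(y_a; y_v) for the 1-D projections y_a = audio @ w_audio
    and y_v = video @ w_video, as mean[log p(y_a, y_v) - log p(y_a) - log p(y_v)]."""
    y_a = audio @ w_audio
    y_v = video @ w_video
    # Standardize so one bandwidth is reasonable for both projections.
    y_a = (y_a - y_a.mean()) / y_a.std()
    y_v = (y_v - y_v.mean()) / y_v.std()
    joint = np.column_stack([y_a, y_v])
    log_ratio = (parzen_log_density(joint, bandwidth)
                 - parzen_log_density(y_a[:, None], bandwidth)
                 - parzen_log_density(y_v[:, None], bandwidth))
    return float(np.mean(log_ratio))

# Synthetic stand-in data: one shared source appears in audio feature 0 and
# video feature 1; the remaining features are independent noise.
rng = np.random.default_rng(0)
n = 400
source = rng.standard_normal(n)
audio = np.column_stack([source + 0.1 * rng.standard_normal(n),
                         rng.standard_normal(n)])
video = np.column_stack([rng.standard_normal(n),
                         source + 0.1 * rng.standard_normal(n)])

# Projections that pick out the shared source versus the noise features.
mi_matched = projected_mutual_information(
    audio, video, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
mi_unrelated = projected_mutual_information(
    audio, video, np.array([0.0, 1.0]), np.array([1.0, 0.0]))

print("matched projections:  ", mi_matched)
print("unrelated projections:", mi_unrelated)
```

In a full method the projection weights would be adjusted to increase this score rather than chosen by hand, and a prior on the weights would regularize that search; both steps are omitted here because the abstract does not specify them.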

Author information

Authors and Affiliations

Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
John W. Fisher III & Trevor Darrell

Editor information

Editors and Affiliations

Centre for Mathematical Sciences, Lund University, Box 118, 22100, Lund, Sweden
Anders Heyden & Gunnar Sparr
The IT University of Copenhagen, Glentevej 67-69, 2400, Copenhagen, NW, Denmark
Mads Nielsen
University of Copenhagen, Universitetsparken 1, 2100, Copenhagen, Denmark
Peter Johansen

About this paper

Cite this paper

Fisher, J.W., Darrell, T. (2002). Probabalistic Models and Informative Subspaces for Audiovisual Correspondence. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47977-5_39

DOI: https://doi.org/10.1007/3-540-47977-5_39
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43746-8
Online ISBN: 978-3-540-47977-2
eBook Packages: Springer Book Archive
