Abstract
We propose a probabalistic model of single source multi-modal generation and show how algorithms for maximizing mutual information can find the correspondences between components of each signal. We show how non-parametric techniques for finding informative subspaces can capture the complex statistical relationship between signals in different modalities. We extend a previous technique for finding informative subspaces to include new priors on the projection weights, yielding more robust results. Applied to human speakers, our model can find the relationship between audio speech and video of facial motion, and partially segment out background events in both channels. We present new results on the problem of audio-visual verification, and show how the audio and video of a speaker can be matched even when no prior model of the speaker’s voice or appearance is available.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Suzanna Becker. An Information-theoretic Unsupervised Learning Algorithm for Neural Networks. PhD thesis, University of Toronto, 1992.
Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., New York, 1991.
G. Deco and D. Obradovic. An Information Theoretic Approach to Neural Computing. Springer-Verlag, New York, 1996.
John W. Fisher III, Trevor Darrell, William T. Freeman, and Paul. Viola. Learning joint statistical models for audio-visual fusion and segregation. In Advances in Neural Information Processing Systems 13, 2000.
John W. Fisher III and Jose Principe. Entropy manipulation of arbitrary nonlinear mappings. In J.C. Principe, editor, Proc. IEEE Workshop, Neural Networks for Signal Processing VII, pages 14–23, 1997.
John W. Fisher III and Jose Principe. A methodology for information theoretic feature extraction. In A. Stuberud, editor, Proceedings of the IEEE International Joint Conference on Neural Networks, 1998.
John Hershey and Javier Movellan. Using audio-visual synchrony to locate sounds. In S. A. Solla, T. K. Leen, and K-R. Mller, editors, Advances in Neural Information Processing Systems 12, pages 813–819, 1999.
A. Mahalanobis, B. Kumar, and D. Casasent. Minimum average correlation energy filters. Applied Optics, 26(17):3633–3640, 1987.
Uwe Meier, Rainer Stiefelhagen, Jie Yang, and Alex Waibel. Towards unrestricted lipreading. In Second International Conference on Multimodal Interfaces (ICMI99), 1999.
E. Parzen. On estimation of a probability density function and mode. Ann. of Math Stats., 33:1065–1076, 1962.
M. Plumbley. On information theory and unsupervised neural networks. Technical Report CUED/F-INFENG/TR. 78, Cambridge University Engineering Department, UK, 1991.
M. Plumbley and S Fallside. An information-theoretic approach to unsupervised connectionist models. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionists Models Summer School, pages 239–245. Morgan Kaufman, San Mateo, CA, 1988.
Malcolm Slaney and Michele Covell. Facesync: A linear operator for measuring synchronization of video facial images and audio tracks. In T. K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13, 2000.
G. Wolff, K. V. Prasad, D. G. Stork, and M. Hennecke. Lipreading by neural networks: Visual preprocessing, learning and sensory integration. In Proc. of Neural Information Proc. Sys. NIPS-6, pages 1027–1034, 1994.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fisher, J.W., Darrell, T. (2002). Probabalistic Models and Informative Subspaces for Audiovisual Correspondence. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47977-5_39
Download citation
DOI: https://doi.org/10.1007/3-540-47977-5_39
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43746-8
Online ISBN: 978-3-540-47977-2
eBook Packages: Springer Book Archive