Probabilistic Models and Informative Subspaces for Audiovisual Correspondence

  • John W. Fisher III
  • Trevor Darrell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2352)

Abstract

We propose a probabilistic model of single-source multi-modal generation and show how algorithms for maximizing mutual information can find the correspondences between components of each signal. We show how non-parametric techniques for finding informative subspaces can capture the complex statistical relationship between signals in different modalities. We extend a previous technique for finding informative subspaces to include new priors on the projection weights, yielding more robust results. Applied to human speakers, our model can find the relationship between audio speech and video of facial motion, and can partially segment out background events in both channels. We present new results on the problem of audio-visual verification, and show how the audio and video of a speaker can be matched even when no prior model of the speaker’s voice or appearance is available.
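The idea of scoring audio-visual correspondence by the mutual information between low-dimensional projections can be illustrated with a toy sketch. This is not the authors' method: the data are synthetic, the projection weights `w_audio` and `w_video` are fixed by hand rather than learned, and a simple histogram plug-in estimator (`histogram_mi`, a hypothetical helper) stands in for the non-parametric density estimates the abstract refers to. All names and parameter values are assumptions made for illustration.

```python
import numpy as np


def histogram_mi(x, y, bins=16):
    """Plug-in estimate of mutual information (in nats) between two 1-D samples.

    A crude stand-in for a non-parametric estimator; it is biased upward
    for small sample sizes.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probability table
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
n = 2000

# Hypothetical single latent source driving both modalities.
source = rng.standard_normal(n)

# Toy 3-D "audio" and "video" features: one coordinate in each carries the
# source (nonlinearly in the video), the others are pure noise.
audio = 0.3 * rng.standard_normal((n, 3))
audio[:, 0] += source
video = 0.3 * rng.standard_normal((n, 3))
video[:, 1] += np.tanh(source)

# Hand-picked projection weights (assumed here; the paper learns them).
w_audio = np.array([1.0, 0.0, 0.0])
w_video = np.array([0.0, 1.0, 0.0])

# Matched pair: audio and video from the same time indices.
mi_matched = histogram_mi(audio @ w_audio, video @ w_video)

# Mismatched pair: video frames shuffled in time, which breaks the dependence.
shuffled = rng.permutation(n)
mi_mismatched = histogram_mi(audio @ w_audio, video[shuffled] @ w_video)

print(f"matched MI:    {mi_matched:.3f} nats")
print(f"mismatched MI: {mi_mismatched:.3f} nats")
```

In this sketch the matched pair should score clearly higher than the shuffled pair, which is the kind of contrast a verification test could threshold on. Learning the projection weights by maximizing the estimate, and regularizing them with priors, is the part the paper addresses and is not attempted here.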

Keywords

Mutual Information, Video Sequence, Video Frame, Audio Signal, Information Theoretic Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • John W. Fisher III (1)
  • Trevor Darrell (1)
  1. Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
