Audio-Video Integration for Background Modelling
Abstract
This paper introduces a new concept in surveillance: audio-visual data integration for background modelling. Visual data acquired by a fixed camera can easily be supported by audio information, allowing a more complete analysis of the monitored scene. The key idea is to build a multimodal model of the scene background, able to promptly detect single auditory or visual events as well as simultaneous audio and visual foreground situations. In this way, it is also possible to tackle some open problems of standard visual surveillance systems (e.g., the sleeping foreground problem), provided these situations are also characterized by an audio foreground. The method is based on the probabilistic modelling of the audio and video data streams using separate sets of adaptive Gaussian mixture models, and on their integration using a coupled audio-video adaptive model working on the frame histogram and the audio frequency spectrum. This framework has proven able to evaluate the temporal causality between visual and audio foreground entities. To the best of our knowledge, this is the first attempt at on-line multimodal modelling of scenes using a single static camera and a single microphone. Preliminary results show the effectiveness of the approach in facing problems still unsolved by purely visual monitoring approaches.
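The abstract does not give the paper's exact formulation, but the core building block it names, an adaptive Gaussian mixture model updated on-line over a stream of scalar features (a frame-histogram bin for video, a spectral band for audio), can be sketched in the Stauffer-Grimson style. The sketch below is illustrative only: the class name, parameters, and the simplified constant learning rate `rho = alpha` are assumptions, not the authors' method.

```python
import numpy as np

class AdaptiveGMM:
    """Illustrative on-line Gaussian mixture model for one scalar feature
    (e.g. a pixel intensity, a frame-histogram bin, or an audio spectral
    band). A simplified Stauffer-Grimson-style sketch, not the paper's
    actual formulation."""

    def __init__(self, k=3, alpha=0.05, var0=30.0, match_sigmas=2.5):
        self.k = k                  # number of mixture components
        self.alpha = alpha          # learning rate (adaptation speed)
        self.var0 = var0            # initial variance for new components
        self.match = match_sigmas   # match threshold, in std. deviations
        self.w = np.full(k, 1.0 / k)   # component weights
        self.mu = np.zeros(k)          # component means
        self.var = np.full(k, var0)    # component variances

    def update(self, x):
        """Update the mixture with observation x; return True if x is
        explained by an existing (background) component."""
        d = np.abs(x - self.mu)
        matches = d < self.match * np.sqrt(self.var)
        if matches.any():
            i = np.argmax(matches)          # first matching component
            self.w = (1.0 - self.alpha) * self.w
            self.w[i] += self.alpha
            # Simplified update: constant rho instead of the
            # likelihood-weighted rho of the original algorithm.
            rho = self.alpha
            self.mu[i] += rho * (x - self.mu[i])
            self.var[i] += rho * ((x - self.mu[i]) ** 2 - self.var[i])
            background = True
        else:
            j = np.argmin(self.w)           # replace weakest component
            self.mu[j] = x
            self.var[j] = self.var0
            self.w[j] = self.alpha
            background = False
        self.w /= self.w.sum()              # renormalize weights
        return background
```

In the multimodal setting the abstract describes, one such model per feature would run on the video stream and another set on the audio spectrum, with foreground declared when an observation fails to match any background component.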
Keywords
Gaussian Mixture Model, Background Modelling, Audio Signal, Blind Source Separation, Multimodal Model