Person Identification Based on Multichannel and Multimodality Fusion
Person identity (ID) is useful information for high-level video analysis and retrieval. In some scenarios, the recording is not only multimodal but also multichannel (microphone array, camera array). In this paper, we describe a multimodal person ID system based on multichannel and multimodal fusion. The audio-only system combines recordings from 7 microphone channels at the decision level, fusing the outputs of the individual single-channel audio-only systems. The audio system is modeled with the Universal Background Model (UBM) and maximum a posteriori (MAP) adaptation framework, which is widely used in the speaker recognition literature. The visual-only system works directly in the appearance space, using the L1 norm and a nearest-neighbor classifier. Linear fusion then combines the two modalities to improve ID performance. The experiments indicate the effectiveness of both microphone array fusion and audio/visual fusion.
Keywords: Face Recognition, Face Image, Gaussian Mixture Model, Speaker Recognition, Microphone Array
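The pipeline outlined in the abstract (decision-level fusion across microphone channels, L1 nearest-neighbor matching on face appearance vectors, then linear fusion of the two modalities) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the channel combination rule, the score normalization, or the fusion weight, so channel averaging, min-max normalization, and the weight `alpha` are assumptions made here for illustration, and all function names are hypothetical. The per-channel audio scores are taken as given inputs (for example, log-likelihood ratios from a UBM/MAP-adapted model), since training such models is beyond this sketch.

```python
from typing import Dict, List, Sequence


def fuse_channels(channel_scores: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Decision-level fusion: average each identity's score over all channels.

    Averaging is an assumed combination rule, not one stated in the abstract.
    """
    n = len(channel_scores)
    return {pid: sum(ch[pid] for ch in channel_scores) / n
            for pid in channel_scores[0]}


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (sum of absolute differences) distance between two appearance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def face_scores(probe: Sequence[float],
                gallery: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """Nearest-neighbor score per identity: negative L1 distance to the
    closest gallery image of that identity (higher means more similar)."""
    return {pid: -min(l1_distance(probe, img) for img in imgs)
            for pid, imgs in gallery.items()}


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Map scores to [0, 1] so the two modalities are comparable (assumed step)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 0.5 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def linear_fusion(audio: Dict[str, float], visual: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """Weighted sum of normalized scores; alpha is a hypothetical audio weight."""
    a, v = min_max(audio), min_max(visual)
    return {pid: alpha * a[pid] + (1.0 - alpha) * v[pid] for pid in a}


def identify(scores: Dict[str, float]) -> str:
    """Return the identity with the highest fused score."""
    return max(scores, key=scores.get)


# Toy data: two microphone channels and a two-person face gallery.
channels = [{"alice": -1.0, "bob": -2.0}, {"alice": -3.0, "bob": -1.0}]
gallery = {"alice": [[0.0, 0.0], [1.0, 1.0]], "bob": [[5.0, 5.0]]}
probe = [1.0, 0.0]

audio = fuse_channels(channels)
visual = face_scores(probe, gallery)
fused = linear_fusion(audio, visual, alpha=0.3)
print(identify(fused))
```

In this toy example the two modalities disagree: the averaged audio scores favor one identity and the face scores favor the other, so the value of `alpha` decides the outcome. In practice such a weight would be tuned on held-out data.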