Person Identification Based on Multichannel and Multimodality Fusion

  • Ming Liu
  • Hao Tang
  • Huazhong Ning
  • Thomas Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4122)


Person identity is very useful information for high-level video analysis and retrieval. In some scenarios, the recording is not only multimodal but also multichannel (microphone array, camera array). In this paper, we describe a multimodal person ID system based on multichannel and multimodal fusion. The audio-only system combines the recordings of a 7-channel microphone array at the decision level, fusing the outputs of the individual single-channel audio-only systems. The audio system is modeled with a Universal Background Model (UBM) and the maximum a posteriori (MAP) adaptation framework, which is very popular in the speaker recognition literature. The visual-only system works directly in the appearance space, using the ℓ1 norm and a nearest-neighbor classifier. Linear fusion then combines the two modalities to improve ID performance. The experiments indicate the effectiveness of both microphone array fusion and audio/visual fusion.
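The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the per-channel decisions are combined, how scores are normalized, or what fusion weight is used, so the averaging rule in fuse_channels, the min-max normalization, and the weight w below are all assumptions chosen for illustration. The per-channel audio scores are taken as given (for example, log-likelihoods from speaker models), and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Scores = Dict[str, float]  # identity -> score, higher means more likely


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (sum of absolute differences) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def face_scores(probe: Sequence[float],
                gallery: Dict[str, List[Sequence[float]]]) -> Scores:
    """Nearest-neighbor score per identity: negated smallest L1 distance."""
    return {pid: -min(l1_distance(probe, img) for img in imgs)
            for pid, imgs in gallery.items()}


def fuse_channels(channel_scores: List[Scores]) -> Scores:
    """Combine per-microphone scores by averaging (assumed rule, not from the paper)."""
    n = len(channel_scores)
    return {pid: sum(ch[pid] for ch in channel_scores) / n
            for pid in channel_scores[0]}


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so the two modalities are comparable (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo or 1.0  # avoid division by zero when all scores are equal
    return {pid: (s - lo) / span for pid, s in scores.items()}


def fuse_modalities(audio: Scores, visual: Scores, w: float = 0.5) -> Scores:
    """Linear fusion: w * audio + (1 - w) * visual on normalized scores."""
    a, v = min_max(audio), min_max(visual)
    return {pid: w * a[pid] + (1.0 - w) * v[pid] for pid in a}


def identify(scores: Scores) -> str:
    """Return the identity with the highest score."""
    return max(scores, key=scores.get)


# Toy data: 7 microphone channels, one of which disagrees with the rest.
channels = [{"alice": -10.0, "bob": -9.0}] * 6 + [{"alice": -8.0, "bob": -12.0}]
gallery = {"alice": [[0.0, 0.0, 0.0]], "bob": [[1.0, 1.0, 1.0]]}
probe = [0.9, 1.0, 0.8]

audio = fuse_channels(channels)
visual = face_scores(probe, gallery)
print(identify(fuse_modalities(audio, visual, w=0.5)))
```

In this toy setup a single outlier channel is outvoted by the other six, and the fused audio score is then combined with the face score. In practice the weight w would be tuned on held-out data.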


Face Recognition, Face Image, Gaussian Mixture Model, Speaker Recognition, Microphone Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Ming Liu (1)
  • Hao Tang (1)
  • Huazhong Ning (1)
  • Thomas Huang (1)

  1. IFP Group, University of Illinois at Urbana-Champaign, Urbana, IL 61801
