The AIT Multimodal Person Identification System for CLEAR 2007

  • Conference paper
Multimodal Technologies for Perception of Humans (RT 2007, CLEAR 2007)

Abstract

This paper presents the person identification system developed at Athens Information Technology and its performance in the CLEAR 2007 evaluations. The system operates on the audiovisual information (speech and faces) collected over the duration of gallery and probe videos. It comprises an audio-only (speech) subsystem, a video-only (face) subsystem and an audiovisual fusion subsystem. Audio recognition is based on Gaussian Mixture modeling of the principal components of composite feature vectors, which consist of Mel-Frequency Cepstral Coefficients and Perceptual Linear Prediction coefficients of speech. Video recognition combines three different classification algorithms: Principal Components Analysis with a modified Mahalanobis distance, sub-class Linear Discriminant Analysis (featuring automatic sub-class generation) with cosine distance, and a Bayesian classifier based on Gaussian modeling of intrapersonal differences. A nearest neighbor classification rule is applied. A decision fusion scheme across time and classifiers returns the video identity. The audiovisual subsystem fuses the unimodal identities into the multimodal one, using a suitable confidence metric.
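The abstract does not spell out the fusion rule or the confidence metric, so the following is only a minimal sketch of the general idea of nearest neighbor classification followed by decision fusion across frames. The function names, the ratio-based confidence, and the confidence-weighted vote are all assumptions made for illustration, not the method used in the paper.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbor(probe, gallery):
    """Classify one probe vector against a gallery of at least two identities.

    gallery maps identity -> list of gallery vectors.
    Returns (identity, confidence). The confidence is a hypothetical
    metric: 1 - d_best / d_second, where d_best and d_second are the
    smallest distances to the two closest identities.
    """
    best = {
        pid: min(cosine_distance(probe, g) for g in vectors)
        for pid, vectors in gallery.items()
    }
    ranked = sorted(best.items(), key=lambda item: item[1])
    (pid, d_best), (_, d_second) = ranked[0], ranked[1]
    confidence = 1.0 - d_best / d_second if d_second > 0 else 0.0
    return pid, confidence


def fuse_decisions(decisions):
    """Confidence-weighted vote over (identity, confidence) pairs.

    The pairs may come from different frames, different classifiers,
    or both; the identity with the largest summed confidence wins.
    """
    totals = defaultdict(float)
    for pid, confidence in decisions:
        totals[pid] += confidence
    return max(totals, key=totals.get)


# Toy gallery and probe frames (made-up 2-D features).
gallery = {"alice": [[1.0, 0.0], [0.9, 0.1]], "bob": [[0.0, 1.0]]}
frames = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]

decisions = [nearest_neighbor(frame, gallery) for frame in frames]
identity = fuse_decisions(decisions)
```

In this toy run two of the three frames are closest to the "alice" gallery vectors, so the fused identity follows the majority even though the third frame votes for "bob" with high confidence. The same `fuse_decisions` call could in principle combine the outputs of several classifiers, or of the audio and video subsystems, provided their confidences are on a comparable scale.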

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stergiou, A., Pnevmatikakis, A., Polymenakos, L. (2008). The AIT Multimodal Person Identification System for CLEAR 2007. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_20

  • DOI: https://doi.org/10.1007/978-3-540-68585-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68584-5

  • Online ISBN: 978-3-540-68585-2

  • eBook Packages: Computer Science, Computer Science (R0)
