Abstract
This paper presents the person identification system developed at Athens Information Technology and its performance in the CLEAR 2007 evaluations. The system operates on the audiovisual information (speech and faces) collected over the duration of gallery and probe videos. It comprises an audio-only (speech) subsystem, a video-only (face) subsystem and an audiovisual fusion subsystem. Audio recognition is based on Gaussian mixture modeling of the principal components of composite feature vectors, which consist of Mel-Frequency Cepstral Coefficients and Perceptual Linear Prediction coefficients of speech. Video recognition combines three classification algorithms: Principal Components Analysis with a modified Mahalanobis distance, sub-class Linear Discriminant Analysis (featuring automatic sub-class generation) with cosine distance, and a Bayesian classifier based on Gaussian modeling of intrapersonal differences. In each case a nearest neighbor classification rule is applied. A decision fusion scheme across time and classifiers returns the video identity. The audiovisual subsystem fuses the unimodal identities into the multimodal one, using a suitable confidence metric.
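The abstract does not spell out the fusion rule or the confidence metric, so the following is only a minimal sketch of what decision fusion across time and classifiers could look like. It assumes that every classifier returns, for every frame, a distance to each gallery identity, that the nearest neighbor casts a vote, and that the vote is weighted by a hypothetical margin-based confidence (one minus the ratio of the best to the second-best distance). The function names, the classifier labels and the toy distances are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def nearest_neighbor(distances):
    """Return (identity, confidence) for one frame of one classifier.

    `distances` maps gallery identity -> distance to the probe face and
    must contain at least two identities.
    The confidence is a hypothetical margin: 1 - best / second_best.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    (best_id, best), (_, second) = ranked[0], ranked[1]
    confidence = 1.0 - best / second if second > 0 else 0.0
    return best_id, confidence


def fuse_decisions(frames):
    """Accumulate confidence-weighted votes over frames and classifiers.

    `frames` is a list (time axis) of dicts mapping a classifier name to
    that classifier's distances for the frame.
    Returns the winning identity and the accumulated votes.
    """
    votes = defaultdict(float)
    for per_classifier in frames:
        for distances in per_classifier.values():
            identity, confidence = nearest_neighbor(distances)
            votes[identity] += confidence
    return max(votes, key=votes.get), dict(votes)


# Toy probe video: two frames, three classifiers, two gallery identities.
frames = [
    {"pca": {"alice": 1.0, "bob": 2.0},
     "lda": {"alice": 0.5, "bob": 2.0},
     "bayes": {"alice": 3.0, "bob": 2.0}},
    {"pca": {"alice": 1.0, "bob": 4.0},
     "lda": {"alice": 2.0, "bob": 1.0},
     "bayes": {"alice": 1.0, "bob": 2.0}},
]
identity, votes = fuse_decisions(frames)
print(identity, votes)
```

Summing weighted votes is only one possible combination rule; product, maximum or majority-vote rules would fit the same interface, and the same accumulation could be reused to fuse the audio and video identities.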
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stergiou, A., Pnevmatikakis, A., Polymenakos, L. (2008). The AIT Multimodal Person Identification System for CLEAR 2007. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_20
DOI: https://doi.org/10.1007/978-3-540-68585-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)