Abstract
This paper presents a speaker-independent audio-visual digit recognition system that uses both speech and visual lip signals. The visual features are based on line-motion estimation computed from low-resolution video sequences (128 × 128 pixels) and are used to increase the robustness of audio recognition. The core experiments investigate lip-motion biometrics both as a stand-alone modality and as a modality merged with audio in a speech recognition system. The system uses Support Vector Machines and shows favourable experimental results, with digit recognition rates of 83% to 100% on the XM2VTS database, depending on the amount of available visual information.
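As a rough illustration of what merging the two modalities can look like, the sketch below performs simple feature-level fusion: each stream is z-normalized per dimension and the time-aligned frames are concatenated into one vector that a classifier such as an SVM could consume. This is an assumption made for illustration only; the abstract does not state how the audio and visual features are combined, and the function names (`standardize`, `fuse`) and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(frames):
    """Z-normalize each feature dimension across all frames."""
    dims = list(zip(*frames))                   # one tuple per feature dimension
    mus = [mean(d) for d in dims]
    sigmas = [pstdev(d) or 1.0 for d in dims]   # avoid dividing by zero on constant dimensions
    return [[(x - m) / s for x, m, s in zip(f, mus, sigmas)] for f in frames]


def fuse(audio_frames, visual_frames):
    """Concatenate time-aligned audio and visual feature vectors (assumed fusion scheme)."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("streams must be aligned to the same number of frames")
    a = standardize(audio_frames)
    v = standardize(visual_frames)
    return [af + vf for af, vf in zip(a, v)]


# Toy values for illustration, not data from the paper.
audio = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]   # e.g. two acoustic coefficients per frame
visual = [[0.0], [1.0], [2.0]]                 # e.g. one lip-motion value per frame
fused = fuse(audio, visual)
print(len(fused), len(fused[0]))
```

Normalizing each stream before concatenation keeps one modality from dominating the other purely because of its numeric scale, which matters for margin-based classifiers.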
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faraj, M.I., Bigun, J. (2007). Lip Biometrics for Digit Recognition. In: Kropatsch, W.G., Kampel, M., Hanbury, A. (eds) Computer Analysis of Images and Patterns. CAIP 2007. Lecture Notes in Computer Science, vol 4673. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74272-2_45
DOI: https://doi.org/10.1007/978-3-540-74272-2_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74271-5
Online ISBN: 978-3-540-74272-2
eBook Packages: Computer Science (R0)