Lip Biometrics for Digit Recognition

  • Maycel Isaac Faraj
  • Josef Bigun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4673)


This paper presents a speaker-independent audio-visual digit recognition system that uses speech and visual lip signals. The visual features are based on line-motion estimation from low-resolution video sequences (128 × 128 pixels) and are used to increase the robustness of audio-based recognition. The core experiments investigate lip-motion biometrics both as a stand-alone modality and merged with audio in a speech recognition system. The system uses Support Vector Machines and shows favourable experimental results, with digit recognition rates of 83% to 100% on the XM2VTS database, depending on the amount of available visual information.
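As a rough illustration of the "merged modality" idea, the sketch below concatenates an audio feature vector with a lip-motion feature vector and trains a small linear SVM on the fused vectors. It rests on assumptions that do not come from the paper: the function names (`fuse`, `train_linear_svm`, `predict`), the toy feature values, the two-class setup standing in for ten digits, and the plain sub-gradient training loop are all hypothetical. The abstract does not state the kernel, the fusion scheme, or the feature dimensions used by the authors.

```python
"""Toy sketch: feature-level audio-visual fusion plus a linear SVM.

All names and numbers here are illustrative assumptions, not the
authors' implementation.
"""


def fuse(audio, visual):
    """Concatenate one audio feature vector with one lip-motion feature vector."""
    return list(audio) + list(visual)


def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=100):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. Returns the weight vector and the bias.
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # sample lies inside the margin
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two utterances per class, 2-D audio and 2-D visual features.
audio = [(1.0, 0.8), (1.2, 1.0), (-1.0, -0.9), (-0.8, -1.2)]
visual = [(0.9, 1.1), (1.0, 0.9), (-1.1, -0.8), (-1.0, -1.0)]
labels = [1, 1, -1, -1]

fused = [fuse(a, v) for a, v in zip(audio, visual)]
w, b = train_linear_svm(fused, labels)
predictions = [predict(w, b, x) for x in fused]
```

A ten-digit recogniser would need a multi-class scheme on top of this (for example one classifier per digit), and a stand-alone visual system would simply skip the concatenation and train on the lip-motion vectors alone.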


Keywords: Support Vector Machine; Speech Recognition System; Audiovisual Speech; Person Authentication; Digit Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Maycel Isaac Faraj (1)
  • Josef Bigun (1)
  1. School of Information Science, Computer and Electrical Engineering (IDE), Halmstad University, Box 823, SE-301 18 Halmstad
