A New Manifold Representation for Visual Speech Recognition

  • Dahai Yu
  • Ovidiu Ghita
  • Alistair Sutherland
  • Paul F. Whelan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4673)


In this paper, we propose a new manifold representation for visual speech recognition. The real-time input video data is compressed using Principal Component Analysis (PCA), and the low-dimensional points calculated for each frame define the manifolds. Since the number of frames that form a video sequence depends on the complexity of the spoken word, using these manifolds for visual speech classification requires re-sampling them to a fixed number of keypoints, which are then used as input for classification. Two classification schemes are evaluated: the k-Nearest Neighbour (kNN) algorithm used in conjunction with two-stage PCA, and a Hidden Markov Model (HMM) classifier. The classification results for a group of English words indicate that the proposed approach produces accurate results.


Keywords: Visual speech recognition, PCA manifolds, spline interpolation, k-Nearest Neighbour, Hidden Markov Model
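The pipeline summarised in the abstract — compressing each frame with PCA so that the per-frame low-dimensional points trace a manifold, then spline-interpolating that manifold to a fixed number of keypoints — can be sketched as below. This is a minimal illustration on synthetic data, not the authors' implementation; the function names, the 3-component projection, and the 20-keypoint count are assumptions chosen for the example.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def pca_project(frames, n_components=3):
    """Project flattened video frames onto the top principal components.

    frames: (n_frames, n_pixels) array, one flattened frame per row.
    Returns an (n_frames, n_components) array of manifold points.
    """
    centered = frames - frames.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def resample_manifold(points, n_keypoints=20):
    """Re-sample a variable-length manifold to a fixed number of keypoints
    using cubic-spline interpolation along the frame-index parameter."""
    t = np.linspace(0.0, 1.0, len(points))
    t_new = np.linspace(0.0, 1.0, n_keypoints)
    spline = CubicSpline(t, points, axis=0)
    return spline(t_new)

# Toy example: a 12-frame "word" of 16x16 frames, re-sampled to 20 keypoints
# so that words of different lengths yield fixed-size classifier inputs.
rng = np.random.default_rng(0)
video = rng.random((12, 16 * 16))
manifold = pca_project(video, n_components=3)   # shape (12, 3)
keypoints = resample_manifold(manifold, 20)     # shape (20, 3)
```

The fixed-size `keypoints` array is what a kNN or HMM classifier would then consume, since both expect inputs of consistent dimensionality across words of varying duration.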





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Dahai Yu (1)
  • Ovidiu Ghita (1)
  • Alistair Sutherland (1)
  • Paul F. Whelan (1)
  1. School of Computing & Electronic Engineering, Vision Systems Group, Dublin City University, Dublin 9, Ireland
