Abstract
This paper aims to establish an effective feature representation of visual speech for Chinese viseme recognition. We propose and discuss a representation model of visual speech based on the local binary pattern (LBP) and the discrete cosine transform (DCT) of mouth images. The joint model combines local and global texture information and performs better than the global feature alone. LBP and DCT features are computed for each mouth frame captured while the subject speaks; a Hidden Markov Model (HMM) is trained on the training dataset and then used to recognize new visual speech. Experiments show that this visual speech feature model performs well in classifying the different speaking states.
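As a rough illustration of what a joint local/global descriptor for a single mouth frame could look like, the sketch below concatenates a histogram of basic 3x3 LBP codes (local texture) with the low-frequency block of a 2D DCT (global texture). The abstract does not give the operator radius, histogram layout, number of DCT coefficients, or fusion rule, so the 3x3 neighbourhood, the parameter `k`, the plain concatenation, and all function names here are assumptions made for illustration only. The HMM stage is not shown.

```python
import math

def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    # Neighbour offsets in clockwise order starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = (h - 2) * (w - 2)
    return [count / total for count in hist]

def dct2_lowfreq(img, k):
    """Top-left k x k block of the orthonormal 2D DCT-II, flattened row by row."""
    h, w = len(img), len(img[0])
    coeffs = []
    for u in range(k):
        au = math.sqrt((1 if u == 0 else 2) / h)
        for v in range(k):
            av = math.sqrt((1 if v == 0 else 2) / w)
            s = 0.0
            for y in range(h):
                cy = math.cos(math.pi * (2 * y + 1) * u / (2 * h))
                for x in range(w):
                    s += img[y][x] * cy * math.cos(math.pi * (2 * x + 1) * v / (2 * w))
            coeffs.append(au * av * s)
    return coeffs

def joint_feature(img, k=4):
    """Assumed fusion rule: LBP histogram followed by k*k DCT coefficients."""
    return lbp_histogram(img) + dct2_lowfreq(img, k)

# Hypothetical 8x8 grayscale "mouth frame" with a simple diagonal gradient.
frame = [[x + y for x in range(8)] for y in range(8)]
feature = joint_feature(frame, k=4)
print(len(feature))  # 256 LBP bins + 16 DCT coefficients
```

In a full pipeline, one such vector per frame would form the observation sequence passed to the HMM. In practice the two parts would likely need scaling to comparable ranges before fusion, which this sketch omits.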
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
MeiXia, Z., XiBin, J. (2012). Joint LBP and DCT Model for Visual Speech. In: Luo, J. (eds) Affective Computing and Intelligent Interaction. Advances in Intelligent and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27866-2_13
DOI: https://doi.org/10.1007/978-3-642-27866-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27865-5
Online ISBN: 978-3-642-27866-2
eBook Packages: Engineering, Engineering (R0)