Joint LBP and DCT Model for Visual Speech

MeiXia, Zheng; XiBin, Jia

doi:10.1007/978-3-642-27866-2_13

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 137)

Abstract

This paper aims to establish an effective feature representation of visual speech for Chinese viseme recognition. We propose and discuss a representation model of visual speech based on the local binary pattern (LBP) and the discrete cosine transform (DCT) of mouth images. The joint model combines local and global texture information and performs better than the global feature alone. LBP and DCT features are computed for each mouth frame captured while the subject is speaking; a Hidden Markov Model (HMM) is trained on the training dataset and then used to recognize new visual speech. Experiments show that this visual speech feature model performs well in classifying the different speaking states.
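As a rough illustration of what a joint LBP and DCT descriptor for a single mouth frame might look like, the sketch below computes a basic 8-neighbour LBP histogram (local texture) and the low-frequency block of a 2D DCT (global texture), then concatenates them. The abstract does not specify how the two feature sets are combined, which LBP variant is used, or how many DCT coefficients are kept, so the concatenation, the radius-1 LBP, the parameter `k`, and the function names `lbp_histogram`, `dct2_lowfreq`, and `joint_feature` are all assumptions made for this example. HMM training on the resulting per-frame vectors is not shown.

```python
import math

def lbp_histogram(img):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes (radius 1).

    `img` is a list of rows of grey levels and must be at least 3x3.
    """
    h, w = len(img), len(img[0])
    # Neighbour offsets, clockwise from the top-left pixel (an arbitrary but fixed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist]

def dct2_lowfreq(img, k):
    """Top-left k x k block of the orthonormal 2D DCT-II, flattened row by row."""
    h, w = len(img), len(img[0])

    def alpha(u, n):
        # Orthonormal scaling factor of the DCT-II.
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    coeffs = []
    for u in range(k):
        for v in range(k):
            s = 0.0
            for r in range(h):
                cos_r = math.cos((2 * r + 1) * u * math.pi / (2 * h))
                for c in range(w):
                    s += img[r][c] * cos_r * math.cos((2 * c + 1) * v * math.pi / (2 * w))
            coeffs.append(alpha(u, h) * alpha(v, w) * s)
    return coeffs

def joint_feature(img, k=4):
    """Concatenate local (LBP) and global (DCT) descriptors; an assumed combination."""
    return lbp_histogram(img) + dct2_lowfreq(img, k)

# Synthetic 8x8 "mouth frame" used only to exercise the functions.
frame = [[(r * 7 + c * 13) % 256 for c in range(8)] for r in range(8)]
feat = joint_feature(frame, k=4)
print(len(feat))
```

In a recognition pipeline of the kind the abstract describes, one such vector per video frame would form the observation sequence passed to the HMM. In practice a library implementation of the DCT and of LBP would be much faster than these pure-Python loops.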

Author information

Authors and Affiliations

Beijing University of Technology, Beijing, China
Zheng MeiXia & Jia XiBin
Multimedia and Intelligent Software Technology, Beijing Municipal Key Laboratory, Beijing, China
Jia XiBin

Editor information

Editors and Affiliations

National Kinmen Institute of Technology, Jinning Township 892, Kinmen, Taiwan R.O.C.
Jia Luo

About this chapter

Cite this chapter

MeiXia, Z., XiBin, J. (2012). Joint LBP and DCT Model for Visual Speech. In: Luo, J. (eds) Affective Computing and Intelligent Interaction. Advances in Intelligent and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27866-2_13

DOI: https://doi.org/10.1007/978-3-642-27866-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27865-5
Online ISBN: 978-3-642-27866-2
eBook Packages: Engineering, Engineering (R0)
