Joint LBP and DCT Model for Visual Speech

  • Chapter
Affective Computing and Intelligent Interaction

Part of the book series: Advances in Intelligent and Soft Computing ((AINSC,volume 137))

Abstract

This paper aims to establish an effective feature representation of visual speech for Chinese viseme recognition. We propose and discuss a representation model of visual speech based on the local binary pattern (LBP) and the discrete cosine transform (DCT) of mouth images. The joint model combines local and global texture information and performs better than the global feature alone. LBP and DCT features are computed for each mouth frame captured while the subject is speaking; a Hidden Markov Model (HMM) is trained on these features from the training dataset and then used to recognize new visual speech. Experiments show that this visual speech feature model performs well in classifying the different speaking states.
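
The per-frame feature described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the abstract does not state which LBP operator, how many DCT coefficients, or what normalization the authors use. The sketch assumes the basic 3x3 LBP operator summarized as a 256-bin histogram (local texture), the top-left k x k block of an orthonormal 2D DCT-II (global texture), and plain concatenation of the two. The function names and the parameter `k` are hypothetical, and the HMM training step is omitted.

```python
import math


def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    # Eight neighbours, clockwise from the top-left pixel (an assumed bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(img), len(img[0])
    hist = [0.0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def dct2_lowfreq(img, k):
    """Top-left k x k block of the orthonormal 2D DCT-II, flattened row by row."""
    rows, cols = len(img), len(img[0])

    def alpha(u, n):
        # Orthonormal scaling factor of the DCT-II.
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    coeffs = []
    for u in range(k):
        for v in range(k):
            s = 0.0
            for r in range(rows):
                for c in range(cols):
                    s += (img[r][c]
                          * math.cos(math.pi * (2 * r + 1) * u / (2 * rows))
                          * math.cos(math.pi * (2 * c + 1) * v / (2 * cols)))
            coeffs.append(alpha(u, rows) * alpha(v, cols) * s)
    return coeffs


def joint_feature(img, k=4):
    """Concatenate the local (LBP) and global (DCT) descriptors of one mouth frame."""
    return lbp_histogram(img) + dct2_lowfreq(img, k)


if __name__ == "__main__":
    # Hypothetical 8x8 grayscale mouth patch; real frames would be larger.
    frame = [[(3 * r + 5 * c) % 256 for c in range(8)] for r in range(8)]
    feature = joint_feature(frame, k=4)
    print(len(feature))  # 256 LBP bins + 16 DCT coefficients
```

Under these assumptions, each frame of an utterance would yield one such vector, and the resulting sequence of vectors would serve as the observation sequence for HMM training and recognition.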

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

MeiXia, Z., XiBin, J. (2012). Joint LBP and DCT Model for Visual Speech. In: Luo, J. (eds) Affective Computing and Intelligent Interaction. Advances in Intelligent and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27866-2_13

  • DOI: https://doi.org/10.1007/978-3-642-27866-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27865-5

  • Online ISBN: 978-3-642-27866-2

  • eBook Packages: Engineering, Engineering (R0)
