Recognition of Visual Speech Elements Using Hidden Markov Models

  • Conference paper
Advances in Multimedia Information Processing — PCM 2002 (PCM 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2532))

Abstract

This paper presents a novel subword lip-reading system based on continuous Hidden Markov Models (HMMs). The constituent HMMs are configured according to the statistical features of lip motion and trained with the Baum-Welch method. The system is evaluated on identifying the fourteen visemes defined in the MPEG-4 standard. Experimental results show that it achieves an average accuracy above 80%.
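
As a rough illustration of how a bank of continuous HMMs can be used to identify a viseme, the sketch below scores a feature sequence against each model with the forward algorithm and picks the most likely one. It is not the authors' implementation: the one-dimensional lip feature, the two-state left-to-right topology, the single Gaussian per state, the model names `viseme_a` and `viseme_b`, and all parameter values are assumptions made for the example. In a real system the parameters would come from Baum-Welch training, which is omitted here.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian emission."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def make_hmm(pi, A, means, variances):
    """Bundle HMM parameters, storing probabilities in the log domain."""
    return ([_log(p) for p in pi],
            [[_log(a) for a in row] for row in A],
            means, variances)


def log_likelihood(obs, hmm):
    """Forward algorithm: log P(obs | hmm) for a 1-D Gaussian HMM."""
    log_pi, log_A, means, variances = hmm
    n = len(log_pi)
    # Initialisation with the first observation.
    alpha = [log_pi[i] + log_gauss(obs[0], means[i], variances[i])
             for i in range(n)]
    # Induction over the remaining observations.
    for x in obs[1:]:
        alpha = [logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
                 + log_gauss(x, means[j], variances[j])
                 for j in range(n)]
    return logsumexp(alpha)


def classify(obs, models):
    """Return the name of the model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


# Hypothetical, hand-picked parameters (NOT taken from the paper):
# a left-to-right topology that starts in state 0 and ends in state 1.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
models = {
    # Lip-opening feature goes from low to high.
    "viseme_a": make_hmm([1.0, 0.0], LEFT_TO_RIGHT, [0.2, 0.8], [0.05, 0.05]),
    # Lip-opening feature goes from high to low.
    "viseme_b": make_hmm([1.0, 0.0], LEFT_TO_RIGHT, [0.8, 0.2], [0.05, 0.05]),
}
```

With these made-up models, `classify([0.1, 0.2, 0.7, 0.9], models)` selects `viseme_a`, because only that model explains a feature that starts low and ends high. Working in the log domain avoids numerical underflow on longer sequences.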



© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Foo, S.W., Dong, L. (2002). Recognition of Visual Speech Elements Using Hidden Markov Models. In: Chen, YC., Chang, LW., Hsu, CT. (eds) Advances in Multimedia Information Processing — PCM 2002. PCM 2002. Lecture Notes in Computer Science, vol 2532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36228-2_75

  • DOI: https://doi.org/10.1007/3-540-36228-2_75

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00262-8

  • Online ISBN: 978-3-540-36228-9

  • eBook Packages: Springer Book Archive
