Shape Feature Analysis for Visual Speech and Speaker Recognition

  • Conference paper
Applied Informatics and Communication (ICAIC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 226)

Abstract

Visual information is often used as a complementary source to improve the understanding of what a speaker is saying, especially in noisy environments. This paper investigates different lip features for visual speech and speaker recognition and analyzes in depth their robustness to different speaking habits. Five candidate features extracted from lip shape are tested and compared on a multispeaker visual speech recognition task of isolated English digits (0 to 9). Our experimental results demonstrate that the rotation angle caused by head pose is highly correlated with the individual speaker but independent of the speech content. The best shape features for speech and speaker recognition are considered to be those that provide “dynamic” information, such as rotation and lip motion.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gui, J., Wang, S. (2011). Shape Feature Analysis for Visual Speech and Speaker Recognition. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23235-0_22

  • DOI: https://doi.org/10.1007/978-3-642-23235-0_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23234-3

  • Online ISBN: 978-3-642-23235-0

  • eBook Packages: Computer Science (R0)
