
Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor

  • Conference paper
Perception in Multimodal Dialogue Systems (PIT 2008)

Abstract

In order to realize multimodal speech recognition on a mobile phone, a small sensor is needed that can measure lip movement at low computational cost. In a previous study, we developed a simple infrared lip movement sensor placed in front of the mouth and showed the feasibility of HMM-based word recognition, achieving an 87.1% recognition rate. In practical use, however, it is difficult to mount a sensor in front of the mouth. In this paper, we develop a new lip movement sensor that captures lip movement from either side of a speaker's face and evaluate its performance. Experimental results show a speaker-independent word recognition rate of 85.3% using only the lip movement measured by the side sensor.
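
To make the recognition step concrete, the sketch below trains one Gaussian HMM per vocabulary word on sequences of lip movement measurements and classifies an utterance by maximum log-likelihood. This is only an illustrative assumption of how such a pipeline could look: the feature choice (raw sensor value plus its delta), the number of HMM states, and the use of the hmmlearn library are not taken from the paper, which does not describe its implementation here.

    # Minimal sketch (assumption, not the authors' code): isolated-word
    # recognition from a 1-D lip movement signal, one Gaussian HMM per word.
    import numpy as np
    from hmmlearn import hmm

    def make_features(signal):
        """Turn a 1-D lip movement signal into (T, 2) features: value + delta."""
        x = np.asarray(signal, dtype=float)
        delta = np.gradient(x)
        return np.column_stack([x, delta])

    def train_word_models(train_data, n_states=5):
        """train_data: dict word -> list of 1-D signals. Returns word -> fitted HMM."""
        models = {}
        for word, signals in train_data.items():
            feats = [make_features(s) for s in signals]
            X = np.vstack(feats)                 # all utterances stacked
            lengths = [len(f) for f in feats]    # per-utterance frame counts
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
            m.fit(X, lengths)
            models[word] = m
        return models

    def recognize(models, signal):
        """Pick the word whose HMM gives the highest log-likelihood."""
        feats = make_features(signal)
        return max(models, key=lambda w: models[w].score(feats))

A speaker-independent setup such as the one reported above would train these per-word models on signals pooled from speakers other than the test speaker.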


References

  • Beaumesnil, B., Luthon, F.: Real Time Tracking for 3D Realistic Lip Animation. In: Proceedings of the 18th International Conference on Pattern Recognition 2006, vol. 1, pp. 219–222 (2006)

  • Chan, M.T., Zhang, Y., Huang, T.S.: Real-time lip tracking and bimodal continuous speech recognition. In: Proceedings of the IEEE Second Workshop on Multimedia Signal Processing 1998, pp. 65–70 (1998)

  • Delmas, P., Eveno, N., Lievin, M.: Towards robust lip tracking. In: Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 528–531 (2002)

  • Huang, J., Potamianos, G., Neti, C.: Improving Audio-Visual Speech Recognition with an Infrared Headset. In: Proceedings of AVSP 2003, pp. 175–178 (2003)

  • Kaucic, R., Blake, A.: Accurate, real-time, unadorned lip tracking. In: Proceedings of the Sixth International Conference on Computer Vision, pp. 370–375 (1998)

  • Luettin, J., Potamianos, G., Neti, C.: Asynchronous Stream Modeling for Large Vocabulary Audio-Visual Speech Recognition. In: Proceedings of IEEE ICASSP 2001 (2001)

  • Meier, U., Hurst, W., Duchnowski, P.: Adaptive Bimodal Sensor Fusion for Automatic Speechreading. In: Proceedings of ICASSP 1996, pp. 833–836 (1996)

  • Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.: Recent Advances in the Automatic Recognition of Audio-Visual Speech. Proceedings of the IEEE 91(9) (2003)

  • Thambiratnam, D., et al.: Speech Recognition in Adverse Environments using Lip Information. In: Proceedings of IEEE TENCON (1997)

  • Wark, T., Sridharan, S., Chandran, V.: The Use of Temporal Speech and Lip Information for Multi-Modal Speaker Identification via Multi-Stream HMMs. In: Proceedings of ICASSP 2000, vol. 6, pp. 2389–2392 (2000)

  • Yoshida, T., Hamamoto, T., Hangai, S.: A Study on Multi-modal Word Recognition System for Car Navigation. In: Proceedings of URSI ISSS 2001, pp. 452–455 (2001)

  • Yoshida, T., Hangai, S.: Development of Infrared Lip Movement Sensor for Spoken Word Recognition. In: Proceedings of WMSCI 2007, vol. 2, pp. 239–242 (2007)

  • Zhang, J., Kaynak, M.N., Cheok, A.D., Ko, C.C.: Real-time lip tracking for virtual lip implementation in virtual environments and computer games. In: Proceedings of the 10th IEEE International Conference on Fuzzy Systems, vol. 3, pp. 1359–1362 (2001)

  • Zhang, Z., Liu, Z., Sinclair, M., Acero, A., Deng, L., Droppo, J., Huang, X., Zheng, Y.: Multi-Sensory Microphones for Robust Speech Detection, Enhancement and Recognition. In: Proceedings of IEEE ICASSP (2004)

  • Zhi, Q., et al.: HMM Modeling for Audio-Visual Speech Recognition. In: Proceedings of IEEE ICME 2001 (2001)

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoshida, T., Yamazaki, E., Hangai, S. (2008). Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol. 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_15

  • DOI: https://doi.org/10.1007/978-3-540-69369-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69368-0

  • Online ISBN: 978-3-540-69369-7

  • eBook Packages: Computer Science, Computer Science (R0)
