
Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor

  • Conference paper
Perception in Multimodal Dialogue Systems (PIT 2008)

Abstract

In order to realize multimodal speech recognition on a mobile phone, a small sensor is needed that can measure lip movement at low computational cost. In a previous study, we developed a simple infrared lip movement sensor placed in front of the mouth and showed the feasibility of HMM-based word recognition, achieving an 87.1% recognition rate. In practical use, however, it is difficult to mount a sensor in front of the mouth. In this paper, we develop a new lip movement sensor that captures lip movement from either side of a speaker's face and evaluate its performance. Experimental results show a speaker-independent word recognition rate of 85.3% using only the lip movement measured by the side sensor.
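
To make the recognition step concrete, the sketch below trains one Gaussian HMM per vocabulary word on sequences of lip movement measurements and classifies an utterance by maximum log-likelihood. This is only an illustrative assumption of how such a pipeline could look: the feature choice (raw sensor value plus its delta), the number of HMM states, and the use of the hmmlearn library are not taken from the paper, which does not describe its implementation here.

    # Minimal sketch (assumption, not the authors' code): isolated-word
    # recognition from a 1-D lip movement signal, one Gaussian HMM per word.
    import numpy as np
    from hmmlearn import hmm

    def make_features(signal):
        """Turn a 1-D lip movement signal into (T, 2) features: value + delta."""
        x = np.asarray(signal, dtype=float)
        delta = np.gradient(x)
        return np.column_stack([x, delta])

    def train_word_models(train_data, n_states=5):
        """train_data: dict word -> list of 1-D signals. Returns word -> fitted HMM."""
        models = {}
        for word, signals in train_data.items():
            feats = [make_features(s) for s in signals]
            X = np.vstack(feats)                 # all utterances stacked
            lengths = [len(f) for f in feats]    # per-utterance frame counts
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
            m.fit(X, lengths)
            models[word] = m
        return models

    def recognize(models, signal):
        """Pick the word whose HMM gives the highest log-likelihood."""
        feats = make_features(signal)
        return max(models, key=lambda w: models[w].score(feats))

A speaker-independent setup such as the one reported above would train these per-word models on signals pooled from speakers other than the test speaker.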


References

  • Beaumesnil, B., Luthon, F.: Real Time Tracking for 3D Realistic Lip Animation. In: Proceedings of the 18th International Conference on Pattern Recognition 2006, vol. 1, pp. 219–222 (2006)

  • Chan, M.T., Zhang, Y., Huang, T.S.: Real-time lip tracking and bimodal continuous speech recognition. In: Proceedings of the IEEE Second Workshop on Multimedia Signal Processing 1998, pp. 65–70 (1998)

  • Delmas, P., Eveno, N., Lievin, M.: Towards robust lip tracking. In: Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, pp. 528–531 (2002)

  • Huang, J., Potamianos, G., Neti, C.: Improving Audio-Visual Speech Recognition with an Infrared Headset. In: Proceedings of AVSP 2003, pp. 175–178 (2003)

  • Kaucic, R., Blake, A.: Accurate, real-time, unadorned lip tracking. In: Proceedings of the Sixth International Conference on Computer Vision, pp. 370–375 (1998)

  • Luettin, J., Potamianos, G., Neti, C.: Asynchronous Stream Modeling for Large Vocabulary Audio-Visual Speech Recognition. In: Proceedings of IEEE ICASSP 2001 (2001)

  • Meier, U., Hurst, W., Duchnowski, P.: Adaptive Bimodal Sensor Fusion for Automatic Speechreading. In: Proceedings of ICASSP 1996, pp. 833–836 (1996)

  • Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.: Recent Advances in the Automatic Recognition of Audio-Visual Speech. Proceedings of the IEEE 91(9) (2003)

  • Thambiratnam, D., et al.: Speech Recognition in Adverse Environments using Lip Information. In: Proceedings of IEEE TENCON (1997)

  • Wark, T., Sridharan, S., Chandran, V.: The Use of Temporal Speech and Lip Information for Multi-Modal Speaker Identification via Multi-Stream HMMs. In: Proceedings of ICASSP 2000, vol. 6, pp. 2389–2392 (2000)

  • Yoshida, T., Hamamoto, T., Hangai, S.: A Study on Multi-modal Word Recognition System for Car Navigation. In: Proceedings of URSI ISSS 2001, pp. 452–455 (2001)

  • Yoshida, T., Hangai, S.: Development of Infrared Lip Movement Sensor for Spoken Word Recognition. In: Proceedings of WMSCI 2007, vol. 2, pp. 239–242 (2007)

  • Zhang, J., Kaynak, M.N., Cheok, A.D., Ko, C.C.: Real-time lip tracking for virtual lip implementation in virtual environments and computer games. In: Proceedings of the 10th IEEE International Conference on Fuzzy Systems, vol. 3, pp. 1359–1362 (2001)

  • Zhang, Z., Liu, Z., Sinclair, M., Acero, A., Deng, L., Droppo, J., Huang, X., Zheng, Y.: Multi-Sensory Microphones for Robust Speech Detection, Enhancement and Recognition. In: Proceedings of IEEE ICASSP (2004)

  • Zhi, Q., et al.: HMM Modeling for Audio-Visual Speech Recognition. In: Proceedings of IEEE ICME 2001 (2001)

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yoshida, T., Yamazaki, E., Hangai, S. (2008). Spoken Word Recognition from Side of Face Using Infrared Lip Movement Sensor. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol. 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_15

  • DOI: https://doi.org/10.1007/978-3-540-69369-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69368-0

  • Online ISBN: 978-3-540-69369-7

  • eBook Packages: Computer Science, Computer Science (R0)
