Lip-Reading Based on Deep Learning Model

Zhu, Mei-li; Wang, Qing-qing; Luo, Jiang-lin

doi:10.1007/978-3-662-59351-6_4

First Online: 27 April 2019

Part of the book series: Lecture Notes in Computer Science (TEDUTAIN, volume 11345)

Abstract

With the rapid growth of computing power, deep learning plays an increasingly important role in fields such as autonomous driving, medical research and industrial automation. To improve the accuracy of lip-reading recognition, this paper proposes a lip-reading algorithm based on a deep learning model. Binary images of the lip-contour motion sequence are projected into a spatio-temporal energy image, a lip dynamic grayscale image is used to reduce noise interference during recognition, and the strong feature-learning ability of deep learning is then exploited to improve the recognition result. Experimental results show that deep learning can extract effective features of dynamic lip changes from the lip dynamic grayscale image and achieve better recognition results.
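The projection step can be illustrated with a minimal sketch. It assumes, by analogy with gait energy images, that the spatio-temporal energy image is the pixel-wise mean of aligned binary lip-contour frames, which yields a grayscale image comparable to what the abstract calls the lip dynamic grayscale; the chapter's exact formulation is not stated here. The function names `spatio_temporal_energy` and `to_gray_levels` are hypothetical, and the deep learning classifier that consumes the image is not shown.

```python
from typing import List

# A frame is a binary lip-contour mask stored as rows of 0/1 pixel values.
Frame = List[List[int]]


def spatio_temporal_energy(frames: List[Frame]) -> List[List[float]]:
    """Average aligned binary frames into one energy image with values in [0, 1].

    Assumption (not from the chapter): frames are already cropped and aligned
    to the same lip region and share the same height and width. Each output
    pixel is the fraction of frames in which that pixel is set.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for f in frames:
        if len(f) != height or any(len(row) != width for row in f):
            raise ValueError("all frames must have the same shape")
    n = len(frames)
    return [
        [sum(f[y][x] for f in frames) / n for x in range(width)]
        for y in range(height)
    ]


def to_gray_levels(energy: List[List[float]], levels: int = 256) -> List[List[int]]:
    """Quantise a [0, 1] energy image to integer gray levels 0..levels-1."""
    top = levels - 1
    # Round half up; values are non-negative, so int(x + 0.5) is safe here.
    return [[int(v * top + 0.5) for v in row] for row in energy]


if __name__ == "__main__":
    # Four tiny 2 x 3 binary frames of a hypothetical lip-contour sequence.
    frames = [
        [[0, 1, 0], [1, 1, 1]],
        [[0, 1, 0], [0, 1, 0]],
        [[0, 0, 0], [0, 1, 0]],
        [[0, 1, 0], [1, 1, 1]],
    ]
    energy = spatio_temporal_energy(frames)
    print(energy)
    print(to_gray_levels(energy))
```

Because each output pixel is the fraction of frames in which it is set, a pixel corrupted in a single frame contributes only 1/n to the result, which is one plausible reading of how the grayscale representation suppresses noise. In the example, the top-middle pixel is set in three of four frames, so its energy is 0.75, or gray level 191 on a 0 to 255 scale.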

Acknowledgments

This work was supported by the Education Department of Jilin Province. I would like to thank those who cared for me, encouraged me and helped me while I was finishing this paper.

Author information

Authors and Affiliations

Science and Technology Innovation Center, Jilin Animation Institute, Changchun, 130000, China
Mei-li Zhu & Jiang-lin Luo
College of Optical and Electronic Information, Changchun University of Science and Technology, Changchun, 130000, China
Qing-qing Wang

Corresponding author

Correspondence to Mei-li Zhu.

Editor information

Editors and Affiliations

Hangzhou Normal University, Hangzhou, China
Zhigeng Pan
Imagineering Institute, Nusajaya, Malaysia
Adrian David Cheok
University of Education, Weingarten, Germany
Wolfgang Müller
Zhejiang University, Hangzhou, China
Mingmin Zhang
Liverpool John Moores University, Liverpool, UK
Abdennour El Rhalibi
Air University, Islamabad, Pakistan
Kashif Kifayat

About this chapter

Cite this chapter

Zhu, Ml., Wang, Qq., Luo, Jl. (2019). Lip-Reading Based on Deep Learning Model. In: Pan, Z., Cheok, A., Müller, W., Zhang, M., El Rhalibi, A., Kifayat, K. (eds) Transactions on Edutainment XV. Lecture Notes in Computer Science, vol 11345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59351-6_4

DOI: https://doi.org/10.1007/978-3-662-59351-6_4
Published: 27 April 2019
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-59350-9
Online ISBN: 978-3-662-59351-6
eBook Packages: Computer Science, Computer Science (R0)