Abstract
With the rapid growth of computing power, deep learning plays an increasingly important role in fields such as autonomous driving, medical research, and industrial automation. To improve the accuracy of lip-reading recognition, this paper proposes an algorithm based on a deep learning model of the lips. The binary images of the lip contour motion sequence are projected into a spatio-temporal energy representation, yielding a lip dynamic grayscale image that reduces noise interference during recognition. Deep learning is then applied to this image to improve the lip-reading recognition result. The experimental results show that deep learning can extract effective features of dynamic lip changes from the lip dynamic grayscale image and achieve better recognition results.
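The projection step can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the simplest reading: the binary lip masks are averaged over time, pixel by pixel, and scaled to the 0..255 range, so that pixels covered by the lips in more frames appear brighter. The function name `lip_dynamic_grayscale` and the toy 3x3 sequence are hypothetical and are not taken from the paper.

```python
from typing import List

Frame = List[List[int]]  # binary mask: 1 = lip region, 0 = background


def lip_dynamic_grayscale(frames: List[Frame]) -> List[List[int]]:
    """Average binary lip masks over time and scale to 0..255.

    Assumption: the paper's spatio-temporal energy projection is
    approximated here by a plain per-pixel temporal mean.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same shape")
    n = len(frames)
    return [
        [round(255 * sum(frame[y][x] for frame in frames) / n) for x in range(width)]
        for y in range(height)
    ]


# Toy sequence: a 3x3 mouth region over three frames (closed, half open, open).
sequence = [
    [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
energy = lip_dynamic_grayscale(sequence)
```

Under this assumption, a pixel that is set in only one frame contributes a low gray value, which is one plausible way such an image could suppress isolated segmentation noise before it reaches the deep learning model.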
Acknowledgments
This work was supported by the Education Department of Jilin Province. I would like to thank those who took care of me, encouraged me, and helped me while I was finishing this paper.
Copyright information
© 2019 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Zhu, Ml., Wang, Qq., Luo, Jl. (2019). Lip-Reading Based on Deep Learning Model. In: Pan, Z., Cheok, A., Müller, W., Zhang, M., El Rhalibi, A., Kifayat, K. (eds) Transactions on Edutainment XV. Lecture Notes in Computer Science, vol 11345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59351-6_4
DOI: https://doi.org/10.1007/978-3-662-59351-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-59350-9
Online ISBN: 978-3-662-59351-6
eBook Packages: Computer Science, Computer Science (R0)