Lip-Reading Based on Deep Learning Model

Part of the book series: Lecture Notes in Computer Science (TEDUTAIN, volume 11345)

Abstract

With the rapid growth of computing power, deep learning plays an increasingly important role in fields such as autonomous driving, medical research, and industrial automation. To improve the accuracy of lip-reading recognition, this paper proposes an algorithm based on a deep learning model of the lips. The binary images of the lip contour motion sequence are projected onto a spatio-temporal energy representation, and the lip dynamic grayscale image is used to reduce noise interference during recognition; the feature-learning ability of deep learning is then used to improve the lip-reading recognition results. The experimental results show that deep learning can extract effective features of lip dynamics from the lip dynamic grayscale image and achieve better recognition results.
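
The projection of a binary lip contour sequence onto a single grayscale image can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one common form of spatio-temporal energy: averaging the binary masks over time, so that each pixel's gray level reflects how often it was covered by the lip region. The function name `energy_image` and the list-of-lists frame layout are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# A frame is a binary mask: 1 = lip region, 0 = background (assumed layout).
Frame = Sequence[Sequence[int]]


def energy_image(frames: Sequence[Frame]) -> List[List[int]]:
    """Average binary lip masks over time into an 8-bit grayscale image.

    Assumption: temporal averaging stands in for the paper's
    spatio-temporal energy projection, which the abstract does not define.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])

    # Count, per pixel, how many frames mark that pixel as lip.
    counts = [[0] * width for _ in range(height)]
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share one shape")
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                if value:
                    counts[y][x] += 1

    # Scale the per-pixel frequency to the range 0..255 (floor division).
    n = len(frames)
    return [[255 * c // n for c in row] for row in counts]


if __name__ == "__main__":
    # Two tiny 2x2 masks: the top-right pixel is lip in only one frame.
    sequence = [
        [[0, 1], [1, 1]],
        [[0, 0], [1, 1]],
    ]
    for row in energy_image(sequence):
        print(row)
```

Pixels that stay inside the lip region across the whole sequence come out bright, pixels that are covered only briefly come out mid-gray, and isolated single-frame noise is attenuated by the averaging. A grayscale image of this kind could then be passed to a learned feature extractor.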


Acknowledgments

This work was supported by the Education Department of Jilin Province. I would like to thank those who took care of me, encouraged me, and helped me while I was finishing this paper.

Author information

Corresponding author

Correspondence to Mei-li Zhu.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Zhu, Ml., Wang, Qq., Luo, Jl. (2019). Lip-Reading Based on Deep Learning Model. In: Pan, Z., Cheok, A., Müller, W., Zhang, M., El Rhalibi, A., Kifayat, K. (eds) Transactions on Edutainment XV. Lecture Notes in Computer Science, vol 11345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-59351-6_4

  • DOI: https://doi.org/10.1007/978-3-662-59351-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-59350-9

  • Online ISBN: 978-3-662-59351-6

  • eBook Packages: Computer Science, Computer Science (R0)
