
Routine Statistical Framework to Speculate Kannada Lip Reading

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1192)

Abstract

This paper presents a statistics-based system for predicting the lip movements of a speaker. The words spoken by a person are identified by analysing the shape of the lips at every instant of time. The approach learns to predict lip shapes from recognised movement: the lips are annotated and tracked, and shape recognition is synchronised over time by extracting the lip shape, together with several statistical measures, from every frame of a video. The refined statistical data thus provides the system with a more appropriate shape description in terms of mean, variance, standard deviation and other statistical features. Statistical feature extraction drives lip-movement recognition, and mapping the recognised shapes of various Kannada words into different classes gives the system a good starting point towards lip reading. The approach achieved an overall accuracy of 40.21% with distinct statistical patterns of feature extraction and classification.
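The abstract does not give implementation details, but the pipeline it describes (per-frame statistical features such as mean, variance and standard deviation from the lip region, pooled over a clip and mapped to word classes) can be sketched as follows. This is a minimal illustration under assumed details: the feature set, the temporal averaging, the nearest-centroid classifier and all names here are hypothetical choices, not the authors' method, and the demo data is synthetic rather than real lip-region frames.

```python
import numpy as np

def frame_features(lip_region):
    """Statistical features from one grayscale lip-region patch.

    The paper names mean, variance and standard deviation explicitly;
    min and max are added here as examples of 'other statistical
    features' (an assumption)."""
    x = np.asarray(lip_region, dtype=float).ravel()
    return np.array([x.mean(), x.var(), x.std(), x.min(), x.max()])

def video_descriptor(frames):
    """Average per-frame features over time, so a variable-length clip
    maps to one fixed-length descriptor."""
    return np.mean([frame_features(f) for f in frames], axis=0)

def nearest_centroid(descriptor, centroids):
    """Assign a clip to the word class with the closest mean descriptor."""
    labels = list(centroids)
    dists = [np.linalg.norm(descriptor - centroids[k]) for k in labels]
    return labels[int(np.argmin(dists))]

# Synthetic demo: two "word" classes with different intensity statistics.
rng = np.random.default_rng(0)
clip_a = [rng.normal(100, 5, (32, 64)) for _ in range(10)]
clip_b = [rng.normal(160, 20, (32, 64)) for _ in range(10)]
centroids = {"word_a": video_descriptor(clip_a),
             "word_b": video_descriptor(clip_b)}

test_clip = [rng.normal(100, 5, (32, 64)) for _ in range(10)]
predicted = nearest_centroid(video_descriptor(test_clip), centroids)
print(predicted)
```

In a real system the synthetic patches would be replaced by tracked lip regions cropped from video frames, and the single centroid per word by a trained classifier over many speakers.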



Author information

Correspondence to M. S. Nandini.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Nandini, M.S., Bhajantri, N.U., Nagavi, T.C. (2020). Routine Statistical Framework to Speculate Kannada Lip Reading. In: Saha, A., Kar, N., Deb, S. (eds) Advances in Computational Intelligence, Security and Internet of Things. ICCISIoT 2019. Communications in Computer and Information Science, vol 1192. Springer, Singapore. https://doi.org/10.1007/978-981-15-3666-3_3


  • DOI: https://doi.org/10.1007/978-981-15-3666-3_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3665-6

  • Online ISBN: 978-981-15-3666-3

  • eBook Packages: Computer Science (R0)
