Abstract
This paper presents a statistics-based approach to predicting a speaker's lip movements. The words spoken by a person are identified by analyzing the shape of the lips at every instant of time. The approach learns to predict lip shapes from recognized movement: the lips are annotated and tracked across frames, and shape recognition is synchronized with time by extracting statistical information about the lip shape from every frame of a video. This per-frame statistical data describes each shape in terms of mean, variance, standard deviation and other statistical features. The proposed system extracts these statistical features, recognizes the lip movement, and maps spoken Kannada words into different classes according to the recognized shapes, providing a first step towards lip reading. The approach achieves an overall accuracy of 40.21% with distinct statistical feature extraction and classification.
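The abstract describes per-frame statistical features (mean, variance, standard deviation) computed from a tracked lip region and then classified into Kannada word classes. The following is a minimal sketch of that idea, not the authors' implementation: it assumes OpenCV for video access and scikit-learn's SVC as an illustrative classifier, and the fixed lip crop and helper names are hypothetical placeholders for a real lip tracker.

```python
# Sketch of per-frame statistical lip features and word classification.
# Assumptions: OpenCV for video I/O, scikit-learn SVC as the classifier,
# and a fixed lip crop standing in for proper lip tracking.
import cv2
import numpy as np
from sklearn.svm import SVC


def frame_statistics(gray_lip_patch: np.ndarray) -> np.ndarray:
    """Statistical descriptors of one grayscale lip-region frame."""
    pixels = gray_lip_patch.astype(np.float32).ravel()
    return np.array([
        pixels.mean(),      # mean intensity
        pixels.var(),       # variance
        pixels.std(),       # standard deviation
        np.median(pixels),  # extra order statistic (illustrative)
        pixels.min(),
        pixels.max(),
    ])


def video_features(path: str, lip_box=(100, 180, 120, 220)) -> np.ndarray:
    """Aggregate per-frame statistics for one utterance video.

    `lip_box` is a hypothetical fixed (y1, y2, x1, x2) crop; a real system
    would track the lips per frame, e.g. with a facial landmark detector.
    """
    cap = cv2.VideoCapture(path)
    feats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        y1, y2, x1, x2 = lip_box
        feats.append(frame_statistics(gray[y1:y2, x1:x2]))
    cap.release()
    # Average over frames so every video yields a fixed-length vector.
    return np.mean(feats, axis=0)


# Hypothetical usage: `train_videos` / `train_labels` pair utterance videos
# with Kannada word classes; the classifier predicts the word for a new clip.
# X = np.stack([video_features(v) for v in train_videos])
# clf = SVC(kernel="rbf").fit(X, train_labels)
# word = clf.predict(video_features("test_clip.avi").reshape(1, -1))
```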
Cite this paper
Nandini, M.S., Bhajantri, N.U., Nagavi, T.C. (2020). Routine Statistical Framework to Speculate Kannada Lip Reading. In: Saha, A., Kar, N., Deb, S. (eds) Advances in Computational Intelligence, Security and Internet of Things. ICCISIoT 2019. Communications in Computer and Information Science, vol 1192. Springer, Singapore. https://doi.org/10.1007/978-981-15-3666-3_3