
Speech Training System for Hearing Impaired Individuals Based on Automatic Lip-Reading Recognition

  • Yuanyao Lu
  • Shenyao Yang
  • Zheng Xu
  • Jingzhong Wang
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1207)

Abstract

Using automatic lip-reading recognition technology to promote the social interaction and integration of hearing impaired individuals and dysphonic people is one of the promising applications of artificial intelligence in healthcare and rehabilitation. Because of inaccurate mouth shapes and unclear articulation, hearing impaired individuals and dysphonic people cannot communicate as normal speakers do. In this paper, a speech training system for hearing impaired individuals and dysphonic people is constructed using state-of-the-art automatic lip-reading technology that combines a convolutional neural network (CNN) and a recurrent neural network (RNN). The system trains their speech skills by comparing the mouth shapes of hearing impaired individuals with those of normal speakers. The speech training system can be divided into four parts. Firstly, we create a speech training database that stores the mouth shapes of normal speakers and the corresponding sign language vocabulary. Secondly, the system performs automatic lip-reading with a hybrid neural network combining MobileNet and a Long Short-Term Memory network (LSTM). Thirdly, the system retrieves from the speech training database the correct lip shape matching the sign language vocabulary and compares it with the lip shapes of hearing impaired individuals. Finally, the system computes comparison data and a similarity rate based on the lip size of hearing impaired individuals, the lip-opening angle, and the differences between lip shapes, and it provides a standard lip-reading sequence for hearing impaired individuals to learn from and train with. As a result, hearing impaired individuals and dysphonic people can analyze and correct their vocal lip shapes based on the comparison results, and they can train independently to improve their mouth shapes. In addition, the system can help hearing impaired individuals learn to pronounce correctly with the help of medical devices such as cochlear implants. Experiments show that the speech training system based on automatic lip-reading recognition can effectively correct the lip shapes of hearing impaired individuals while they speak and improve their speech ability without assistance from others.
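To make the architecture in the second part concrete, the following is a minimal sketch of a MobileNet + LSTM hybrid of the kind the abstract describes: a MobileNetV2 backbone extracts per-frame visual features from mouth-region frames, and an LSTM models their temporal dynamics before a word-level classification. The clip length, image size, hidden size, and vocabulary size are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a MobileNet + LSTM lip-reading network (assumed
# sizes; not the authors' implementation).
import torch
import torch.nn as nn
from torchvision import models

class MobileNetLSTM(nn.Module):
    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        # MobileNetV2 backbone used as a per-frame feature extractor.
        backbone = models.mobilenet_v2(weights=None)
        self.features = backbone.features            # conv feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)          # -> (B*T, 1280, 1, 1)
        # LSTM over the sequence of per-frame 1280-d feature vectors.
        self.lstm = nn.LSTM(1280, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) mouth-region frames
        b, t, c, h, w = clips.shape
        x = self.features(clips.reshape(b * t, c, h, w))
        x = self.pool(x).flatten(1).reshape(b, t, -1)   # (B, T, 1280)
        out, _ = self.lstm(x)                           # (B, T, hidden)
        return self.classifier(out[:, -1])              # last time step

model = MobileNetLSTM(num_classes=50)                   # assumed vocabulary
logits = model(torch.randn(2, 16, 3, 96, 96))           # two 16-frame clips
print(logits.shape)                                     # torch.Size([2, 50])
```

The comparison step in the fourth part can likewise be illustrated with a hypothetical lip-shape similarity function built on mouth landmarks. The descriptors (lip width, mouth opening, opening angle) follow the measures named in the abstract, while the landmark layout and the scoring formula are assumptions for illustration only.

```python
# Hypothetical lip-shape comparison: descriptors and scoring are
# illustrative assumptions, not the paper's actual measure.
import numpy as np

def lip_similarity(learner: np.ndarray, reference: np.ndarray) -> float:
    """Return a similarity rate in [0, 1] between two sets of
    2-D lip landmarks, each of shape (N, 2)."""
    def descriptor(pts: np.ndarray) -> np.ndarray:
        left, right = pts[pts[:, 0].argmin()], pts[pts[:, 0].argmax()]
        top, bottom = pts[pts[:, 1].argmin()], pts[pts[:, 1].argmax()]
        width = np.linalg.norm(right - left)     # lip size
        opening = np.linalg.norm(bottom - top)   # mouth opening
        angle = np.arctan2(opening, width)       # lip-opening angle
        return np.array([opening / width, angle])

    diff = np.abs(descriptor(learner) - descriptor(reference))
    return float(np.exp(-diff.sum()))            # 1.0 means identical

# Identical landmark sets yield a similarity rate of 1.0.
pts = np.array([[0, 0], [10, 0], [5, -2], [5, 3]], dtype=float)
print(lip_similarity(pts, pts))  # 1.0
```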

Keywords

Hearing impaired individuals speech training · Human-computer interaction · Automatic lip-reading · Sign language recognition · Deep learning

Notes

Acknowledgements

The research was partially supported by the National Natural Science Foundation of China (no. 61571013 and no. 61971007) and the Beijing Natural Science Foundation of China (no. 4143061).


Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Yuanyao Lu¹ (corresponding author)
  • Shenyao Yang¹
  • Zheng Xu¹
  • Jingzhong Wang¹

  1. School of Information Science and Technology, North China University of Technology, Beijing, China
