
Evaluating Video and Facial Muscle Activity for a Better Assistive Technology: A Silent Speech Based HCI

Chapter in: Computational Models of Complex Systems

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 53)

Abstract

There is an urgent need for interfaces that directly employ the natural communication and manipulation skills of humans. Vision-based systems that can identify small actions and support communication applications would allow machine control by people with restricted limb movements, such as neuro-trauma patients. Because of these users' limited abilities, it is also important that such systems have built-in intelligence, learn about the user, and reconfigure themselves appropriately. Patients who have suffered neuro-trauma often have restricted body and limb movements; when hand, arm, and body movements are impossible, head activity and facial expression become important in designing human-computer interface (HCI) systems for machine control. Silent speech-based assistive technologies (AT) are important for users who have difficulty vocalizing, giving them the flexibility to control computers without making a sound. This chapter evaluates the feasibility of using facial muscle activity signals and mouth video to identify speech commands in the absence of voice signals. It investigates the classification power of mouth videos in identifying English vowels and consonants, and examines the use of non-invasive facial surface electromyogram (SEMG) to identify unvoiced English and German vowels from muscle activity, with the SEMG also providing feedback to the visual system. The results suggest that video-based systems and facial muscle activity work reliably for simple speech-based commands for AT.
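To make the SEMG side of the pipeline concrete, the sketch below shows one plausible way to classify unvoiced utterances from facial muscle activity: compute a root-mean-square (RMS) feature per electrode channel for each utterance window, then classify the resulting feature vectors. This is a minimal sketch, not the chapter's exact method; the channel count, window length, synthetic data, and the k-nearest-neighbours classifier are all illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def rms_features(emg_window):
    # emg_window: array of shape (n_samples, n_channels), the raw
    # facial SEMG recorded for one unvoiced utterance.
    # Returns one RMS value per channel as the feature vector.
    return np.sqrt(np.mean(emg_window ** 2, axis=0))

# Synthetic stand-in data (hypothetical): 3 vowels x 10 repetitions,
# 4 facial electrode channels, 500 samples per utterance window
# (e.g. 0.5 s at 1 kHz). Real data would come from the recordings.
rng = np.random.default_rng(0)
vowels = ["a", "e", "u"]
X, y = [], []
for label, gain in zip(vowels, (1.0, 1.5, 2.0)):
    for _ in range(10):
        window = gain * rng.standard_normal((500, 4))
        X.append(rms_features(window))
        y.append(label)

# Classify RMS feature vectors; any standard classifier could stand in here.
clf = KNeighborsClassifier(n_neighbors=3).fit(np.array(X), y)
print(clf.predict(np.array(X[:3])))  # sanity check on training samples
```

The point of the sketch is the shape of the data flow, from raw multi-channel SEMG windows to per-utterance vowel labels, rather than the specific feature set or classifier, which the chapter evaluates in detail.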



Author information

Correspondence to Sridhar P. Arjunan.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Arjunan, S.P., Yau, W.C., Kumar, D.K. (2014). Evaluating Video and Facial Muscle Activity for a Better Assistive Technology: A Silent Speech Based HCI. In: Mago, V., Dabbaghian, V. (eds) Computational Models of Complex Systems. Intelligent Systems Reference Library, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-319-01285-8_7


  • DOI: https://doi.org/10.1007/978-3-319-01285-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01284-1

  • Online ISBN: 978-3-319-01285-8

  • eBook Packages: Engineering, Engineering (R0)
