Secure Recognition of Voice-Less Commands Using Videos

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 12)

Abstract

Interest in voice recognition technologies for internet applications is growing because speech-based communication is flexible. The major drawback of using sound to control internet access on a computer is that the commands are audible to other people in the vicinity. This paper examines a secure, voice-less method that recognizes speech-based commands from video alone, without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from each STT and fed into a support vector machine (SVM), which classifies them as one of the utterances. Experimental results show that the proposed technique achieves 98% accuracy in a phoneme classification task and is invariant to global variations in illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
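
The feature-extraction stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the STT is built, so the sketch assumes a motion-history-style template made from thresholded frame differences, and the function names, the threshold, and the maximum moment order are all hypothetical choices made for illustration. The SVM stage is left out; the resulting feature vectors would be passed to any standard SVM classifier.

```python
import cmath
import math


def motion_template(frames, threshold=0.1):
    """Collapse a grayscale frame sequence into one 2D template.

    Assumption: a pixel is marked when it differs from the previous frame
    by more than `threshold`; later changes get larger values (up to 1.0).
    """
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    stt = [[0.0] * w for _ in range(h)]
    for t in range(1, n):
        for y in range(h):
            for x in range(w):
                if abs(frames[t][y][x] - frames[t - 1][y][x]) > threshold:
                    stt[y][x] = t / (n - 1)
    return stt


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); needs n - |m| even and >= 0."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(image, n, m):
    """Discrete Zernike moment of order n, repetition m, over the unit disk."""
    h, w = len(image), len(image[0])
    acc = 0j
    for y in range(h):
        for x in range(w):
            # Map pixel centres into the square [-1, 1] x [-1, 1].
            u = (2 * x + 1) / w - 1
            v = (2 * y + 1) / h - 1
            rho = math.hypot(u, v)
            if rho > 1.0:
                continue  # pixels outside the unit disk do not contribute
            theta = math.atan2(v, u)
            acc += image[y][x] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    pixel_area = (2 / w) * (2 / h)
    return (n + 1) / math.pi * acc * pixel_area


def zernike_features(image, max_order=4):
    """Magnitudes |Z_nm| for all valid (n, m) with m >= 0 up to max_order."""
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            feats.append(abs(zernike_moment(image, n, m)))
    return feats
```

In this sketch a constant brightness offset applied to every frame cancels in the frame difference, which is one simple way a template can be insensitive to the global illumination level. Using only the magnitudes of the moments makes the features unchanged under rotation of the template.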

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yau, W.C., Kumar, D.K., Weghorn, H. (2008). Secure Recognition of Voice-Less Commands Using Videos. In: Jahankhani, H., Revett, K., Palmer-Brown, D. (eds) Global E-Security. ICGeS 2008. Communications in Computer and Information Science, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69403-8_10

  • DOI: https://doi.org/10.1007/978-3-540-69403-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69402-1

  • Online ISBN: 978-3-540-69403-8

  • eBook Packages: Computer Science, Computer Science (R0)
