Abstract
Interest in voice recognition technologies for internet applications is growing because of the flexibility of speech-based communication. A major drawback of using sound for internet access on computers is that spoken commands are audible to other people in the vicinity. This paper examines a secure, voice-less method for recognizing speech-based commands from video alone, without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from the STT and fed into support vector machines (SVM), which classify them as one of the utterances. Experimental results show that the proposed technique achieves an accuracy of 98% in a phoneme classification task and is invariant to global variations in illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
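To make the first stage of the pipeline concrete, the following is a minimal sketch of how a frame sequence can be collapsed into a single 2D template. It assumes a motion-history-style temporal template in the spirit of Bobick and Davis; the abstract does not give the exact STT definition, so the function name `motion_history_template`, the `threshold` parameter, and the linear recency ramp are illustrative assumptions rather than the authors' method. The Zernike moment and SVM stages are not shown.

```python
import numpy as np

def motion_history_template(frames, threshold=0.1):
    """Collapse a grayscale sequence of shape (T, H, W) into one 2D template.

    Assumption: a motion-history-style template in which pixels that moved
    more recently receive larger values. The paper's STT may differ.
    """
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[0]
    if n < 2:
        raise ValueError("need at least two frames to measure motion")
    template = np.zeros(frames.shape[1:], dtype=float)
    for t in range(1, n):
        # Mark pixels whose intensity changed by more than the threshold.
        moved = np.abs(frames[t] - frames[t - 1]) > threshold
        # Stamp the frame index so later motion overwrites earlier motion.
        template[moved] = t
    # Scale so the most recent motion has value 1.0.
    return template / (n - 1)

# Hypothetical 3-frame, 2x2 "video": one pixel changes early, another late.
frames = np.zeros((3, 2, 2))
frames[1:, 0, 0] = 1.0   # changes between frame 0 and frame 1
frames[2, 1, 1] = 1.0    # changes between frame 1 and frame 2
stt = motion_history_template(frames)
```

In this toy input, the pixel that changed last ends up brighter in the template than the one that changed first, which is how a single image can encode both where and when the mouth moved.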
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yau, W.C., Kumar, D.K., Weghorn, H. (2008). Secure Recognition of Voice-Less Commands Using Videos. In: Jahankhani, H., Revett, K., Palmer-Brown, D. (eds) Global E-Security. ICGeS 2008. Communications in Computer and Information Science, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69403-8_10
DOI: https://doi.org/10.1007/978-3-540-69403-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69402-1
Online ISBN: 978-3-540-69403-8
eBook Packages: Computer Science, Computer Science (R0)