Abstract
This paper describes a novel approach to visual speech recognition. The intensity of each pixel in an image sequence is considered as a function of time. A one-dimensional Fourier transform is applied to this intensity-versus-time function to model the lip movements. We present the results of experiments performed on two databases of ten English digits and letters, respectively.
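The idea of treating each pixel's intensity as a function of time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Fourier coefficients are kept, how they are normalized, or which classifier is used. The function name `temporal_dft_magnitudes`, the parameter `num_coeffs`, the choice of coefficient magnitudes divided by the sequence length, and the synthetic two-pixel sequence are all assumptions made for illustration.

```python
import cmath
import math


def temporal_dft_magnitudes(frames, num_coeffs):
    """Return, for every pixel, the magnitudes of its first DFT coefficients.

    frames: list of T frames, each a list of rows of pixel intensities.
    num_coeffs: how many low-frequency coefficients to keep (an assumption;
    the abstract does not state a number).
    """
    num_frames = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    features = []
    for y in range(height):
        row = []
        for x in range(width):
            # Intensity-versus-time function of the pixel at (y, x).
            signal = [frames[t][y][x] for t in range(num_frames)]
            magnitudes = []
            for k in range(num_coeffs):
                # One-dimensional DFT over the time axis:
                # X[k] = sum_t signal[t] * exp(-2*pi*i*k*t / T)
                coeff = sum(
                    signal[t] * cmath.exp(-2j * math.pi * k * t / num_frames)
                    for t in range(num_frames)
                )
                # Dividing by T is a normalization chosen for this sketch.
                magnitudes.append(abs(coeff) / num_frames)
            row.append(magnitudes)
        features.append(row)
    return features


# Synthetic sequence of 8 frames, each 1 row by 2 pixels (hypothetical data):
# the left pixel is constant, the right pixel oscillates once over the sequence.
NUM_FRAMES = 8
frames = [
    [[10.0, 10.0 + 4.0 * math.cos(2 * math.pi * t / NUM_FRAMES)]]
    for t in range(NUM_FRAMES)
]
features = temporal_dft_magnitudes(frames, num_coeffs=3)
```

In this toy sequence the constant pixel has energy only in the zero-frequency coefficient, while the oscillating pixel also has a nonzero first coefficient, which is the kind of temporal signature the transform exposes. For real image sequences an FFT routine would be far more efficient than this direct sum.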
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, K., Jiang, X., Bunke, H. (1997). Lipreading using Fourier transform over time. In: Sommer, G., Daniilidis, K., Pauli, J. (eds) Computer Analysis of Images and Patterns. CAIP 1997. Lecture Notes in Computer Science, vol 1296. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63460-6_152
DOI: https://doi.org/10.1007/3-540-63460-6_152
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63460-7
Online ISBN: 978-3-540-69556-1
eBook Packages: Springer Book Archive