A Multifaceted Investigation into Speech Reading

  • Trent W. Lewis
  • David M. W. Powers
Conference paper
Part of the Advances in Soft Computing book series (AINSC, volume 14)

Abstract

Speech reading is the act of speech perception using both acoustic and visual information. Humans do this naturally, and machines can exploit the same visual cues to improve traditional speech recognition systems. We have been following a line of research that began with a simple audio-visual speech recognition system and has grown into a multifaceted investigation into speech reading. This paper overviews our feature extraction technique, red exclusion, and its analysis using neural networks, and then examines several neural network integration architectures for speech reading.
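The red exclusion technique mentioned above segments lip pixels by deliberately ignoring the red channel (lips and surrounding skin are both strongly red, so red is a poor discriminant) and comparing the green and blue channels instead. The sketch below illustrates the idea only; the function name, the log-ratio discriminant, and the threshold value are illustrative assumptions, not the published formulation.

```python
import numpy as np

def red_exclusion_mask(rgb, threshold=0.15):
    """Segment candidate lip pixels via 'red exclusion'.

    The red channel is ignored because lips and skin are both strong
    in red; lip-like pixels are instead flagged where green is low
    relative to blue.  The exact discriminant and threshold here are
    illustrative, not the formulation from the original paper.
    """
    rgb = rgb.astype(np.float64) + 1.0  # offset avoids log(0)
    g, b = rgb[..., 1], rgb[..., 2]
    # Boolean mask: True where green is sufficiently low relative to blue.
    return np.log(g / b) < threshold
```

The mask produced this way would typically be post-processed (e.g. with morphological operations) before fitting a lip contour or extracting geometric features for the recognizer.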

Keywords

Speech Recognition · Speech Perception · Automatic Speech Recognition · Integration Network · Feature Extraction Technique
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. E. Angelopoulou, R. Molana, and K. Daniilidis. Multispectral color modeling. Technical Report MS-CIS-01-22, University of Pennsylvania, CIS, 2001.
  2. C. Bregler, S. Manke, H. Hild, and A. Waibel. Bimodal sensor integration on the example of “speech-reading”. In Proceedings of the IEEE International Conference on Neural Networks, pages 667–671, 1993.
  3. M. Brookes. VOICEBOX: Speech Processing Toolbox for MATLAB. http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html, 2000.
  4. S.B. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. In A. Waibel and K.F. Lee, editors, Readings in Speech Recognition, pages 64–74. Morgan Kaufmann Publishers Inc., San Mateo, CA, 1990.
  5. H. Demuth and M. Beale. Neural Network Toolbox: User’s Guide. The MathWorks, http://www.mathworks.com, 1998.
  6. B. Dodd and R. Campbell, editors. Hearing by Eye: The Psychology of Lip-Reading. Lawrence Erlbaum Associates, Hillsdale, NJ, 1987.
  7. M.S. Gray, J.R. Movellan, and T. Sejnowski. Dynamic features for visual speechreading: A systematic comparison. In Mozer, Jordan, and Petsche, editors, Advances in Neural Information Processing Systems, volume 9. MIT Press, Cambridge, MA, 1997.
  8. M.E. Hennecke, V.K. Prasad, and D.G. Stork. Using deformable templates to infer visual speech dynamics. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, volume 2, pages 576–582, Pacific Grove, CA, 1994. IEEE Computer Society.
  9. M.E. Hennecke, D.G. Stork, and K. Venkatesh Prasad. Visionary speech: Looking ahead to practical speech reading systems. In Stork and Hennecke [18], pages 331–350.
  10. M. Hunke and A. Waibel. Face locating and tracking for human-computer interaction. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, volume 2, pages 1277–1281. IEEE Computer Society, Pacific Grove, CA, 1994.
  11. T.W. Lewis and D.M.W. Powers. Lip feature extraction using red exclusion. In Peter Eades and Jesse Jin, editors, CRPIT: Visualisation 2000, volume 2, 2001.
  12. U. Meier, W. Hurst, and P. Duchnowski. Adaptive bimodal sensor fusion for automatic speechreading. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 833–837, 1996.
  13. U. Meier, R. Stiefelhagen, J. Yang, and A. Waibel. Towards unrestricted lip reading. In Second International Conference on Multimodal Interfaces, Hong Kong, http://www.werner.ir.uks.de/js, 1999.
  14. J.R. Movellan and P. Mineiro. Robust sensor fusion: Analysis and application to audio visual speech recognition. Machine Learning, 32:85–100, 1998.
  15. L. Rabiner and B.H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, 1993.
  16. R.R. Rao and R.M. Mersereau. Lip modeling for visual speech recognition. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, volume 2. IEEE Computer Society, Pacific Grove, CA, 1994.
  17. R.W. Schafer and L.R. Rabiner. Digital representations of speech signals. In A. Waibel and K.F. Lee, editors, Readings in Speech Recognition, pages 49–64. Morgan Kaufmann Publishers Inc., San Mateo, CA, 1990.
  18. D.G. Stork and M.E. Hennecke, editors. Speechreading by Man and Machine: Models, Systems, and Applications. NATO/Springer-Verlag, New York, 1996.
  19. Q. Summerfield. Some preliminaries to a comprehensive account of audio-visual speech perception. In Dodd and Campbell [6], pages 3–52, 1987.
  20. Chr. von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85–100, 1973.
  21. T. Wark, S. Sridharan, and V. Chandran. An approach to statistical lip modelling for speaker identification via chromatic feature extraction. In Proceedings of the IEEE International Conference on Pattern Recognition, pages 123–125, August 1998.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Trent W. Lewis (1)
  • David M. W. Powers (1)

  1. School of Informatics and Engineering, Flinders University of South Australia, Australia