
Contribution of Oral Periphery on Visual Speech Intelligibility

  • Conference paper
Advances in Computing and Communications (ACC 2011)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 191))


Abstract

Visual speech recognition aims to improve speech recognition for human-computer interaction. Motivated by the human ability to lip-read, visual speech recognition systems classify spoken words from the movements of the visible speech articulators. However, most research has focused on lip movement alone, and the contribution of other facial regions has received little attention. This paper studies the effect of movement in the area around the lips on the accuracy of speech classification. Two sets of visual features are derived: one comprises parameters of an accurate lip contour, while the other also takes the area around the lips into account. The features are classified using data-mining algorithms in WEKA. The results show that features incorporating the area around the lips improve a machine's ability to recognize the spoken word.
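The experimental design described in the abstract, two feature sets evaluated with the same classifier to see which yields higher word-recognition accuracy, can be sketched in miniature. The snippet below is illustrative only: it uses synthetic Gaussian feature vectors and a simple nearest-centroid classifier as stand-ins for the paper's actual lip-contour features and WEKA's data-mining algorithms. The `sep` parameter, which controls how class-discriminative the synthetic features are, is a modeling assumption, not anything measured in the paper; the premise is merely that richer features (lips plus periphery) separate the classes better.

```python
import random

random.seed(7)

def make_dataset(n_per_class, dim, sep):
    """Synthetic two-class feature vectors; larger `sep` means more
    class-discriminative features (a stand-in for richer visual cues)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([random.gauss(label * sep, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    # Fit one centroid per class, then label each test point
    # by its nearest centroid (squared Euclidean distance).
    cents = {c: centroid([x for x, yy in zip(X_train, y_train) if yy == c])
             for c in set(y_train)}

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    correct = sum(
        min(cents, key=lambda c: dist2(x, cents[c])) == yy
        for x, yy in zip(X_test, y_test)
    )
    return correct / len(y_test)

def run(sep):
    X_train, y_train = make_dataset(60, 6, sep)
    X_test, y_test = make_dataset(40, 6, sep)
    return nearest_centroid_accuracy(X_train, y_train, X_test, y_test)

acc_lip_only = run(sep=0.5)    # weaker features: lip contour alone
acc_with_area = run(sep=1.5)   # stronger features: lips plus surrounding area
print(f"lip-only: {acc_lip_only:.2f}  lips+periphery: {acc_with_area:.2f}")
```

Under this (assumed) separation model, the classifier trained on the richer feature set scores higher, mirroring the direction of the paper's reported result, though the numbers here are purely synthetic.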




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Singh, P., Gupta, D., Laxmi, V., Gaur, M.S. (2011). Contribution of Oral Periphery on Visual Speech Intelligibility. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22714-1_20


  • DOI: https://doi.org/10.1007/978-3-642-22714-1_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22713-4

  • Online ISBN: 978-3-642-22714-1

  • eBook Packages: Computer Science (R0)
