
Modality Combination Techniques for Continuous Sign Language Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7887)

Abstract

Sign languages comprise parallel aspects and use several modalities to form a sign, but it is not yet clear how best to combine these modalities in the context of statistical sign language recognition. We investigate early combination of features, late fusion of decisions, synchronous combination at the hidden Markov model state level, and asynchronous combination at the gloss level. This is done for five modalities on two publicly available benchmark databases: one consisting of challenging real-life data and one of the less complex lab data on which the state of the art typically focuses. Using modality combination, the best published word error rate is improved from 11.9% to 10.7% on the SIGNUM database (lab data) and from 55% to 41.9% on the RWTH-PHOENIX database (challenging real-life data).
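The synchronous combination mentioned above can be read as a log-linear interpolation of per-modality emission scores at each HMM state. A minimal sketch of that idea follows; the state names, stream weights, and score values are illustrative assumptions, not taken from the paper:

```python
def synchronous_combination(stream_scores, weights):
    """Log-linear synchronous combination at the HMM state level:
    score(state) = sum over modalities m of w_m * log p_m(x_m | state).
    `stream_scores` is a list (one entry per modality) of dicts mapping
    HMM state -> emission log-likelihood for the current frame."""
    states = stream_scores[0].keys()
    return {s: sum(w * ll[s] for w, ll in zip(weights, stream_scores))
            for s in states}

# Toy example with two hypothetical modality streams (e.g. hand shape
# and facial features) scoring two HMM states for a single frame:
hand = {"s0": -1.2, "s1": -0.4}
face = {"s0": -0.8, "s1": -1.5}
scores = synchronous_combination([hand, face], weights=[0.7, 0.3])
best_state = max(scores, key=scores.get)  # "s1" scores highest here
```

Because the streams are combined per state and per frame, this variant forces all modalities to stay time-aligned; the asynchronous gloss-level combination the abstract contrasts it with relaxes exactly that constraint.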




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Forster, J., Oberdörfer, C., Koller, O., Ney, H. (2013). Modality Combination Techniques for Continuous Sign Language Recognition. In: Sanches, J.M., Micó, L., Cardoso, J.S. (eds) Pattern Recognition and Image Analysis. IbPRIA 2013. Lecture Notes in Computer Science, vol 7887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38628-2_10


  • DOI: https://doi.org/10.1007/978-3-642-38628-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38627-5

  • Online ISBN: 978-3-642-38628-2

  • eBook Packages: Computer Science (R0)
