Abstract
Sign languages comprise parallel aspects and use several modalities to form a sign, but it is not yet clear how best to combine these modalities in statistical sign language recognition. We investigate early combination of features, late fusion of decisions, synchronous combination at the hidden Markov model state level, and asynchronous combination at the gloss level. This is done for five modalities on two publicly available benchmark databases: one consisting of challenging real-life data and one of the less complex lab data on which the state of the art typically focuses. Using modality combination, the best published word error rate is improved from 11.9% to 10.7% on the SIGNUM database (lab data) and from 55% to 41.9% on the RWTH-PHOENIX database (challenging real-life data).
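As a rough illustration of two of the four combination schemes named above, the following sketch contrasts early combination (concatenating per-frame feature vectors before recognition) with late fusion (combining per-modality hypothesis scores after separate decoding). This is not the paper's implementation; the modality names, dimensions, and weights are illustrative assumptions on toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10                            # number of video frames (toy value)
hand = rng.normal(size=(T, 4))    # hypothetical hand-shape features
face = rng.normal(size=(T, 3))    # hypothetical facial features

# Early combination: concatenate the per-frame feature vectors of all
# modalities into a single feature stream before HMM training/decoding.
early = np.concatenate([hand, face], axis=1)   # shape (T, 7)

# Late fusion: decode each modality separately, then combine the
# resulting hypothesis scores (here: a weighted sum of log-likelihoods).
def late_fusion(log_liks, weights):
    """Weighted sum of per-modality log-likelihood vectors, one entry
    per competing hypothesis."""
    return sum(w * ll for w, ll in zip(weights, log_liks))

scores_hand = np.array([-10.0, -12.5])   # toy scores for 2 hypotheses
scores_face = np.array([-11.0, -9.0])
combined = late_fusion([scores_hand, scores_face], weights=[0.6, 0.4])
best = int(np.argmax(combined))          # index of the winning hypothesis
```

The synchronous (state-level) and asynchronous (gloss-level) schemes additionally constrain where the modality streams must re-align during decoding, which a toy example like this does not capture.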
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Forster, J., Oberdörfer, C., Koller, O., Ney, H. (2013). Modality Combination Techniques for Continuous Sign Language Recognition. In: Sanches, J.M., Micó, L., Cardoso, J.S. (eds) Pattern Recognition and Image Analysis. IbPRIA 2013. Lecture Notes in Computer Science, vol 7887. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38628-2_10
Print ISBN: 978-3-642-38627-5
Online ISBN: 978-3-642-38628-2