A Gaussian Mixture Representation of Gesture Kinematics for On-Line Sign Language Video Annotation

  • Fabio Martínez (corresponding author)
  • Antoine Manzanera
  • Michèle Gouiffès
  • Annelies Braffort
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9475)


Sign languages (SLs) are visuo-gestural representations used by deaf communities. Recognition of SLs usually requires manual annotations, which are expert-dependent, error-prone and time-consuming. This work introduces a method to support SL annotation based on a motion descriptor that characterizes dynamic gestures in videos. The proposed approach first computes local kinematic cues and represents them as mixtures of Gaussians, which together correspond to gestures with a semantic equivalence in the sign language corpora. At each frame, a spatial pyramid partition provides a fine-to-coarse sub-regional description of the distribution of motion cues. For each sub-region, a histogram of motion-cue occurrences is then built, and together these histograms form a frame-level gesture descriptor that can be used for on-line annotation. The proposed approach is evaluated using a bag-of-features framework, in which every frame-level histogram is fed to a support vector machine (SVM). Experiments on a signing dataset show competitive accuracy and computation time.
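
The frame-level descriptor outlined in the abstract can be illustrated with a minimal sketch. It assumes that each motion cue has already been assigned the index of a mixture component and that the pyramid splits the frame into 2^l x 2^l cells at level l. The function name `pyramid_histogram`, its parameters, the cell layout and the final normalization are hypothetical choices made for illustration; they are not taken from the paper.

```python
def pyramid_histogram(points, labels, frame_size, n_words, levels=3):
    """Concatenate per-cell histograms of motion-cue labels over a spatial pyramid.

    points     : list of (x, y) positions of the motion cues in one frame
    labels     : list of integers, the mixture component assigned to each cue
    frame_size : (width, height) of the frame in pixels
    n_words    : number of mixture components (assumed dictionary size)
    levels     : number of pyramid levels (assumption: 2**l x 2**l cells at level l)
    """
    width, height = frame_size
    descriptor = []
    for level in range(levels):
        cells = 2 ** level
        hist = [0] * (cells * cells * n_words)
        for (x, y), word in zip(points, labels):
            # Cell index along each axis, clamped so border points stay inside.
            cx = min(max(int(x * cells / width), 0), cells - 1)
            cy = min(max(int(y * cells / height), 0), cells - 1)
            # One bin per (cell row, cell column, mixture component).
            hist[(cy * cells + cx) * n_words + word] += 1
        descriptor.extend(hist)
    # Normalize so frames with different numbers of cues stay comparable
    # (an assumption of this sketch, not a step stated in the abstract).
    total = sum(descriptor)
    if total == 0:
        return [float(v) for v in descriptor]
    return [v / total for v in descriptor]


# Three cues in a 100 x 100 frame, two mixture components, two pyramid levels.
example = pyramid_histogram(
    points=[(10, 10), (90, 90), (90, 10)],
    labels=[0, 1, 1],
    frame_size=(100, 100),
    n_words=2,
    levels=2,
)
print(len(example))
```

With two levels and two components the descriptor has (1 + 4) x 2 = 10 bins. In a bag-of-features setting, a vector of this kind would be computed for every frame and passed to the classifier.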


Keywords (added by machine, not by the authors): Video Sequence; Kinematic Feature; Deaf Community; Spatial Pyramid; Motion Descriptor



This research is funded by the RTRA Digiteo project MAPOCA.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Fabio Martínez (1, 2), corresponding author
  • Antoine Manzanera (2)
  • Michèle Gouiffès (1)
  • Annelies Braffort (1)
  1. LIMSI, CNRS, Université Paris-Saclay, Paris, France
  2. U2IS/Robotics-Vision, ENSTA-ParisTech, Université Paris-Saclay, Paris, France
