Hand Gesture Recognition within a Linguistics-Based Framework

  • Konstantinos G. Derpanis
  • Richard P. Wildes
  • John K. Tsotsos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3021)


An approach to recognizing hand gestures from a monocular temporal sequence of images is presented. Of particular concern is the representation and recognition of hand movements that are used in single-handed American Sign Language (ASL). The approach exploits previous linguistic analysis of manual languages that decomposes dynamic gestures into their static and dynamic components. The first level of decomposition is in terms of three sets of primitives: hand shape, location and movement. Further levels of decomposition involve the lexical and sentence levels and are part of our plan for future work. We propose and demonstrate that, given a monocular gesture sequence, kinematic features can be recovered from the apparent motion that provide distinctive signatures for 14 primitive movements of ASL. The approach has been implemented in software and evaluated on a database of 592 gesture sequences, with an overall recognition rate of 86.00% for fully automated processing and 97.13% for manually initialized processing.
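
The abstract does not state which kinematic features are recovered or how the apparent motion is modelled. Purely as an illustration of what "kinematic features of apparent motion" can mean, the sketch below assumes the image velocity over the hand region has already been fitted with a six-parameter affine model and computes three standard first-order quantities from it: divergence, curl and deformation magnitude. The parameter ordering, the function name `kinematic_features` and the `KinematicFeatures` container are hypothetical and are not taken from the paper.

```python
import math
from typing import NamedTuple, Sequence


class KinematicFeatures(NamedTuple):
    divergence: float   # isotropic expansion (+) or contraction (-)
    curl: float         # in-plane rotation, counterclockwise positive
    deformation: float  # magnitude of area-preserving shear


def kinematic_features(a: Sequence[float]) -> KinematicFeatures:
    """First-order kinematic quantities of an affine image-velocity field.

    Assumed (hypothetical) parameterization, not taken from the paper:
        u(x, y) = a[0] + a[1]*x + a[2]*y
        v(x, y) = a[3] + a[4]*x + a[5]*y
    The translation terms a[0] and a[3] do not affect these quantities.
    """
    _, ux, uy, _, vx, vy = a
    divergence = ux + vy                       # du/dx + dv/dy
    curl = vx - uy                             # dv/dx - du/dy
    deformation = math.hypot(ux - vy, uy + vx) # norm of the two shear terms
    return KinematicFeatures(divergence, curl, deformation)


# Pure expansion: u = 0.5*x, v = 0.5*y
expansion = kinematic_features((0.0, 0.5, 0.0, 0.0, 0.0, 0.5))
# Pure rotation: u = -y, v = x
rotation = kinematic_features((0.0, 0.0, -1.0, 0.0, 1.0, 0.0))
```

Under these assumptions, a hand moving toward the camera would show mainly positive divergence, while a hand rotating in the image plane would show mainly curl. Time series of such quantities are one plausible kind of movement signature; whether they match the features used by the authors cannot be determined from the abstract alone.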


Apparent Motion, Gesture Recognition, American Sign Language, Kinematic Feature, Hand Shape



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Konstantinos G. Derpanis (1)
  • Richard P. Wildes (1)
  • John K. Tsotsos (1)
  1. Department of Computer Science and Centre for Vision Research (CVR), York University, Toronto, Canada
