3D Human Motion Control Through Refined Video Gesture Annotation



In the early days of the computer and video game industry, simple game controllers consisting of buttons and joysticks were employed, but game consoles have recently been replacing joystick buttons with novel interfaces such as the motion-sensing remote controller of the Nintendo Wii [1]. In particular, video-based human-computer interaction (HCI) techniques have been applied to games; a representative example is 'EyeToy' on the Sony PlayStation 2. Video-based HCI has the great benefit of freeing players from intractable game controllers. Moreover, it is crucial for communication between humans and computers, since it is intuitive, easy to learn, and inexpensive. On the other hand, extracting semantics from the low-level features of video human motion data is still a major challenge: the achievable accuracy depends heavily on each subject's characteristics and on environmental noise. Recently, 3D motion-capture data has been used for visualizing real human motions in 3D space (e.g., 'Tiger Woods' in EA Sports titles, 'Angelina Jolie' in the movie Beowulf) and for analyzing motions for specific performances (e.g., 'golf swing' and 'walking'). A 3D motion-capture system ('VICON') generates a matrix for each motion clip, in which each column corresponds to one of a human's sub-body parts and each row represents a time frame of the capture. Thus, we can extract a sub-body part's motion simply by selecting the corresponding columns. Unlike the low-level feature values of video human motion, the entries of a 3D motion-capture data matrix are not pixel values, but are closer to the human level of semantics.
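The column-slicing view of the motion-capture matrix described above is straightforward to illustrate. Below is a minimal Python/NumPy sketch; the frame count, the 62-column width, and the mapping from sub-body parts to column ranges are illustrative assumptions, not the layout of any particular VICON export.

    import numpy as np

    # Hypothetical clip: 240 captured frames (rows) x 62 degrees of
    # freedom (columns). Real data would come from a capture session;
    # random values stand in for it here.
    frames, dofs = 240, 62
    clip = np.random.rand(frames, dofs)

    # Assumed grouping of columns by sub-body part (illustrative only).
    body_part_columns = {
        "root":      range(0, 6),
        "left_arm":  range(6, 13),
        "right_arm": range(13, 20),
    }

    def extract_part(clip, part):
        """Return one sub-body part's motion: every time frame (row),
        only that part's columns."""
        return clip[:, list(body_part_columns[part])]

    right_arm = extract_part(clip, "right_arm")
    print(right_arm.shape)  # (240, 7): all frames, right-arm columns only

Because the rows are time-ordered, the extracted sub-matrix is itself a time series for that sub-body part and can be fed directly into downstream analysis such as segmentation or recognition.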


Keywords: Human Motion · Motion Capture · Golf Swing · Motion Capture Data · Motion History Image


References

  1. D. Marshall, T. Ward, and S. McLoone, "From Chasing Dots to Reading Minds: The Past, Present, and Future of Video Game Interaction", ACM Crossroads, 13(2), Fall 2006.
  2. N. Robertson and I. Reid, "A general method for human activity recognition in video", Computer Vision and Image Understanding, 104(2), 2006, 232–248.
  3. L. Zelnik-Manor and M. Irani, "Event-based analysis of video", Computer Vision and Pattern Recognition (CVPR), 2001.
  4. Y. Iwai, H. Shimizu, and M. Yachida, "A Method for Human Action Recognition", Image and Vision Computing, 21, 2003, 729–743.
  5. M. Park, M. G. Choi, Y. Shinagawa, and S. Y. Shin, "Video-Guided Motion Synthesis Using Example Motions", ACM Transactions on Graphics (TOG), 25(4), 2006, 1327–1359.
  6. J. Chai and J. K. Hodgins, "Performance Animation from Low-dimensional Control Signals", ACM Transactions on Graphics (TOG), 24(3), 2005, 686–696.
  7. J. Chai, J. Xiao, and J. Hodgins, "Vision-based Control of 3D Facial Animation", SCA '03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2003, 193–196.
  8. D. P. Gibson, N. W. Campbell, C. J. Dalton, and B. T. Thomas, "Extraction of Motion Data from Image Sequences to Assist Animators", Proceedings of the British Machine Vision Conference, 2000, 302–311.
  9. M. K. Hu, "Visual pattern recognition by moment invariants", IRE Transactions on Information Theory, 8(2), 1962, 179–187.
  10. J. Barbic, A. Safonova, J. Pan, C. Faloutsos, J. Hodgins, and N. Pollard, "Segmenting Motion Capture Data into Distinct Behaviors", Proceedings of Graphics Interface (GI'04), 2004.
  11. A. F. Bobick and J. W. Davis, "The Recognition of Human Movement Using Temporal Templates", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 2001, 257–267.
  12. CMU Motion Capture Database.
  13. UTD Motion Data Online Repository.
  14. Intel Open Source Computer Vision Library, opencv/
  15. J. Yamato, J. Ohya, and K. Ishii, "Recognizing Human Action in Time-Sequential Images using Hidden Markov Model", Proceedings of Computer Vision and Pattern Recognition (CVPR '92), 1992, 379–385.
  16. J. Yang and Y. Xu, "Hidden Markov Model for Gesture Recognition", Tech. Report CMU-RI-TR-94-10, Robotics Institute, Carnegie Mellon University, May 1994.
  17. Y. Jin and B. Prabhakaran, "Semantic Quantization of 3D Human Motion Data Through Spatial-Temporal Feature Extraction", Proceedings of the 14th International Conference on Multimedia Modeling (MMM 2008), Kyoto, Japan, January 9–11, 2008.
  18. G. Rigoll, A. Kosmala, and S. Eickeler, "High Performance Real-Time Gesture Recognition Using Hidden Markov Models", Lecture Notes in Computer Science, 1998.
  19. S. Eickeler, A. Kosmala, and G. Rigoll, "Hidden Markov Model Based Continuous Online Gesture Recognition", International Conference on Pattern Recognition (ICPR), 1998, 1206–1208.
  20. L. R. Rabiner, "A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", in Readings in Speech Recognition, Morgan Kaufmann Publishers Inc., 1990, 267–296.
  21. P. S. Huang, C. J. Harris, and M. S. Nixon, "Human gait recognition in canonical space using temporal templates", IEE Proceedings - Vision, Image and Signal Processing, 146(2), 1999.
  22. Y. Yacoob and M. J. Black, "Parameterized modeling and recognition of activities", Computer Vision and Image Understanding, 73(2), 1999, 232–247.
  23. C. Li, S. Q. Zheng, and B. Prabhakaran, "Segmentation and Recognition of Motion Streams by Similarity Search", ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP), 3(3), August 2007.
  24. T.-P. Tian and S. Sclaroff, "Handsignals Recognition From Video Using 3D Motion Capture Data", Proceedings of the IEEE Workshop on Motion and Video Computing, 2005.
  25. O. Arikan, D. A. Forsyth, and J. O'Brien, "Motion synthesis from annotations", ACM Transactions on Graphics, 22(3), 2003, 402–408.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. MySpace (Fox Interactive Media), Beverly Hills, USA
  2. Department of Computer Science, University of Texas at Dallas, Dallas, USA
