The Kinect skeleton tracker achieves considerable human body tracking performance in a convenient, low-cost manner. However, the tracker often captures unnatural human poses, such as discontinuous or jittery movement, when self-occlusions occur. In this study, we propose a post-processing method that improves the skeleton obtained from a single Kinect sensor by combining probabilistic filtering with supervised learning to correct unnatural tracking movements. Specifically, two deep recurrent neural networks improve the joint positions and joint velocities produced by the Kinect skeleton tracker, and a classic Kalman filter then further refines both positions and velocities. In addition, we propose a novel measure for evaluating the naturalness of captured joint trajectories. We evaluated the proposed approach by comparing it to ground truth obtained using a commercial optical marker-based motion capture system.
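The final refinement stage can be illustrated with a minimal sketch of a Kalman filter that fuses a position estimate and a velocity estimate for one joint coordinate. The abstract does not give the state model, the noise settings, or how the network outputs enter the filter, so everything here is an assumption: a constant-velocity model, both network outputs treated as direct noisy measurements, a 30 Hz frame interval, and the hypothetical names `kalman_refine`, `q`, `r_pos`, and `r_vel`.

```python
import numpy as np


def kalman_refine(pos_meas, vel_meas, dt=1.0 / 30.0, q=1.0, r_pos=1e-3, r_vel=1e-2):
    """Fuse per-frame position and velocity estimates of one joint coordinate.

    Assumed model (not taken from the paper): state x = [position, velocity],
    constant-velocity dynamics, and both quantities observed directly.
    Returns an array of shape (n_frames, 2) with refined [position, velocity].
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])           # state transition
    H = np.eye(2)                                   # position and velocity both measured
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])          # white-acceleration process noise
    R = np.diag([r_pos, r_vel])                     # measurement noise variances

    x = np.array([pos_meas[0], vel_meas[0]], dtype=float)  # initialise from first frame
    P = np.eye(2)                                   # initial state covariance
    out = np.empty((len(pos_meas), 2))

    for k, z in enumerate(zip(pos_meas, vel_meas)):
        if k > 0:
            # Predict step: propagate state and covariance one frame ahead.
            x = F @ x
            P = F @ P @ F.T + Q
        # Update step: correct the prediction with the current measurement.
        z = np.asarray(z, dtype=float)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out[k] = x
    return out
```

In this sketch, a full skeleton would be handled by calling `kalman_refine` once per joint and per axis, with `r_pos` and `r_vel` set to reflect how much each upstream estimate is trusted.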
This work was supported by the Technology Innovation Industrial Program funded by the Ministry of Trade, Industry & Energy (MI, South Korea) [10073161 & 10048320, Technology Innovation Program], as well as by an Institute for Information & communications Technology Promotion (IITP) grant funded by MSIT (No. 2018-0-00622).
Cite this article
Kim, J.B., Park, Y. & Suh, I. Tracking human-like natural motion by combining two deep recurrent neural networks with Kalman filter. Intel Serv Robotics 11, 313–322 (2018). https://doi.org/10.1007/s11370-018-0255-z
- Human skeleton tracking
- Deep recurrent neural network
- Kalman filter