Representing human motion with FADE and U-FADE: an efficient frequency-domain approach

  • Pietro Falco
  • Matteo Saveriano
  • Dharmil Shah
  • Dongheui Lee

Abstract

In this work, we present FADE, a frequency-based descriptor for encoding human motion. FADE is simple and provides a high compression rate at low computational complexity. To reduce space and time complexity, we exploit the biomechanical property that human motion is band-limited in frequency. We also present a variant of FADE, called Uncompressed FADE (U-FADE), which performs well in combination with unsupervised algorithms such as spectral clustering, at the price of a reduced compression rate, and in general outperforms FADE on small datasets. Both descriptors can be used in combination with supervised and unsupervised learning approaches to classify and cluster human actions, respectively. We evaluated our descriptors on well-known public motion databases, such as HDM05, Berkeley MHAD, and MSR, and compared FADE and U-FADE with several state-of-the-art approaches.
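
The core idea lends itself to a compact implementation. Below is a minimal sketch in Python of a FADE-like descriptor, not the paper's exact construction: the input layout (a T × J array of joint-angle trajectories), the cutoff of 10 frequency bins, the use of spectral magnitudes only, and the L2 normalization are all illustrative assumptions. Skipping the truncation step, i.e., keeping the full spectrum, corresponds in spirit to U-FADE.

```python
# Minimal sketch of a frequency-domain motion descriptor in the spirit of
# FADE. Assumes the motion is a (T, J) array: T time frames of J joint
# angles. The cutoff n_freq and the normalization are illustrative
# assumptions, not the exact construction from the paper.
import numpy as np

def fade_like_descriptor(motion, n_freq=10):
    """Encode a motion as the first n_freq DFT magnitudes per joint.

    Because human motion is band-limited, the lowest-frequency
    coefficients capture most of the signal energy, so truncating the
    spectrum yields a compact, fixed-length descriptor regardless of
    the sequence duration (provided T is long enough, T >= 2 * n_freq).
    """
    motion = np.asarray(motion, dtype=float)
    # Real FFT along the time axis: (T, J) -> (T//2 + 1, J) complex bins.
    spectrum = np.fft.rfft(motion, axis=0)
    # Keep only the n_freq lowest-frequency bins (the compression step;
    # keeping all bins would correspond in spirit to U-FADE).
    low = np.abs(spectrum[:n_freq, :])
    # Flatten and L2-normalize so motions of different amplitude compare.
    vec = low.flatten()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Example: two sequences of different lengths map to descriptors of the
# same dimension, ready for a classifier (supervised) or a clustering
# algorithm (unsupervised).
rng = np.random.default_rng(0)
d1 = fade_like_descriptor(rng.standard_normal((120, 15)))
d2 = fade_like_descriptor(rng.standard_normal((200, 15)))
assert d1.shape == d2.shape == (10 * 15,)
```

Because the descriptor length depends only on the cutoff and the number of joints, sequences of different durations map to vectors of the same dimension, which is what allows plugging them directly into standard classifiers or clustering methods.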

Keywords

Human action recognition · Motion analysis · Descriptors for human motion

Notes

Acknowledgements

This work has been supported by the Marie Curie Action LEACON, EU project 659265, and by the Technical University of Munich, International Graduate School of Science and Engineering (IGSSE).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Human-centered Assistive Robotics, Technical University of Munich, Munich, Germany
  2. Department of Automation Solutions, ABB Corporate Research, Västerås, Sweden
  3. Institute of Robotics and Mechatronics, German Aerospace Center, Wessling, Germany
