Autonomous Robots

Volume 43, Issue 1, pp 179–196

Representing human motion with FADE and U-FADE: an efficient frequency-domain approach

  • Pietro Falco
  • Matteo Saveriano
  • Dharmil Shah
  • Dongheui Lee


Abstract

In this work, we present FADE, a frequency-based descriptor for encoding human motion. FADE is simple, offers a high compression rate, and has low computational complexity. To reduce space and time complexity, we exploit the biomechanical property that human motion is band-limited in frequency. We also present a variant of FADE, called Uncompressed FADE (U-FADE). U-FADE performs well in combination with some unsupervised algorithms, such as spectral clustering, at the price of a reduced compression rate, and it generally outperforms FADE on small datasets. FADE and U-FADE can be combined with supervised and unsupervised learning approaches to classify and cluster human actions, respectively. We tested our descriptors on well-known public motion databases, such as HDM05, Berkeley MHAD, and MSR, and compared FADE and U-FADE with diverse state-of-the-art approaches.
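The core idea of a frequency-domain motion descriptor can be sketched as follows. This is a minimal illustration of the general principle (keep only the low-frequency DFT coefficients of each joint trajectory, since human motion is band-limited), not the authors' exact FADE formulation; the function name, the number of retained coefficients, and the real/imaginary stacking are assumptions made for the sketch:

```python
import numpy as np

def frequency_descriptor(trajectories, n_coeffs=10):
    """Compact frequency-domain descriptor for a motion sequence.

    trajectories: array of shape (T, D) -- T time samples of D joint signals.
    n_coeffs: number of low-frequency DFT coefficients kept per signal,
              exploiting the band-limited nature of human motion.
    Returns a flat real-valued vector of length D * 2 * n_coeffs.
    """
    X = np.fft.rfft(trajectories, axis=0)   # one-sided DFT of each joint signal
    X_low = X[:n_coeffs, :]                 # discard high-frequency content
    # Stack real and imaginary parts into a real-valued feature vector
    return np.concatenate([X_low.real.ravel(), X_low.imag.ravel()])

# Usage: two synthetic "joint" signals, 100 samples each
t = np.linspace(0, 2 * np.pi, 100)
motion = np.stack([np.sin(2 * t), np.cos(3 * t)], axis=1)
desc = frequency_descriptor(motion, n_coeffs=10)
print(desc.shape)  # (40,)
```

Here 200 raw samples are compressed to 40 descriptor values; such fixed-length vectors can then be fed directly to standard classifiers or clustering algorithms.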


Keywords

Human action recognition · Motion analysis · Descriptors for human motion



This work has been supported by the Marie Curie Action LEACON, EU project 659265, and by the Technical University of Munich, International Graduate School of Science and Engineering (IGSSE).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Human-centered Assistive Robotics, Technical University of Munich, Munich, Germany
  2. Department of Automation Solutions, ABB Corporate Research, Västerås, Sweden
  3. Institute of Robotics and Mechatronics, German Aerospace Center, Wessling, Germany
