Physics-based keyframe selection for human motion summarization

  • Athanasios Voulodimos
  • Ioannis Rallis
  • Nikolaos Doulamis


Analysis of human motion is a field of research that attracts significant interest because of its wide range of application domains. Intangible Cultural Heritage (ICH), including the performing arts and in particular dance, is one of the domains where such research is especially useful and challenging. Effective keyframe selection from motion sequences can provide an abstract and compact representation of the semantic information they encode, supporting useful functionality such as fast browsing, matching and indexing of ICH content. The availability of powerful 3D motion capture sensors, together with the fact that video summarization techniques are not always applicable to dance movement, creates a need for effective and efficient techniques for keyframe selection from 3D human motion capture data sequences. In this paper, we introduce two techniques: a “time-independent” method based on the k-means++ clustering algorithm, which extracts prominent representative instances of a dance, and a physics-based technique that creates temporal summaries of the sequence at different levels of detail. The proposed methods are evaluated on two dance motion datasets and show promising results.
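As an illustration of the “time-independent” idea only, the following minimal sketch clusters pose vectors with k-means++ seeding and returns, for each cluster, the index of the frame closest to its centroid. It is not the authors' implementation. The function names (`sq_dist`, `kmeanspp_seed`, `select_keyframes`), the representation of a frame as a flat list of joint coordinates, and the nearest-frame-to-centroid selection rule are all assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two pose vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(frames, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centroids = [list(rng.choice(frames))]
    while len(centroids) < k:
        d2 = [min(sq_dist(f, c) for c in centroids) for f in frames]
        total = sum(d2)
        if total == 0:  # every frame coincides with a chosen centre
            centroids.append(list(rng.choice(frames)))
            continue
        r = rng.random() * total
        acc = 0.0
        for f, w in zip(frames, d2):
            acc += w
            if acc >= r:
                centroids.append(list(f))
                break
        else:  # numerical safety net; not expected to trigger
            centroids.append(list(frames[-1]))
    return centroids


def select_keyframes(frames, k, iters=50, seed=0):
    """Hypothetical helper: cluster the frames, then return the sorted
    indices of the frames nearest to each cluster centroid."""
    rng = random.Random(seed)
    centroids = kmeanspp_seed(frames, k, rng)
    for _ in range(iters):  # Lloyd iterations
        labels = [min(range(k), key=lambda j: sq_dist(f, centroids[j]))
                  for f in frames]
        updated = []
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members)
                                for col in zip(*members)])
            else:  # keep the old centre if a cluster becomes empty
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return sorted({min(range(len(frames)),
                       key=lambda i: sq_dist(frames[i], c))
                   for c in centroids})


# Toy sequence (made-up data): three held poses of five frames each,
# two joints with x, y, z coordinates per frame.
poses = ([0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
         [50.0, 0.0, 0.0, 50.0, 1.0, 0.0],
         [0.0, 50.0, 0.0, 0.0, 51.0, 0.0])
frames = [[v + 0.001 * t for v in pose] for pose in poses for t in range(5)]
keyframes = select_keyframes(frames, k=3)
print(keyframes)
```

Because frame order plays no role in the clustering, the selected indices are "time-independent" in the sense used above: sorting them afterwards only restores chronological order for display.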


Motion capture data · Motion summarization · Kinematics · 3D · Keyframe selection · Dance analysis




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Informatics and Computer Engineering, University of West Attica, Athens, Greece
  2. National Technical University of Athens, Athens, Greece
