Abstract
This paper presents a novel approach for motion segmentation by using strategies of splitting and remerging. The presented approach, Mossar, hybridizes two existing ones to obtain their potential advantages while covering weaknesses: (1) velocity-based, one of the most widely used approaches that has fairly low accuracy but provides computational simplicity and (2) graph-based, a state-of-the-art approach that provides outstanding accuracy, yet bears high computational complexity and a burden in setting of thresholds. An initial set of key frames is generated by a velocity-based splitting process and then fed into a graph-based remerging process for refinement. We present mechanisms that improve key-frames capturing in the velocity-based approach as well as details on how the graph-based approach is modified and later applied to remerging. The proposed approach also allows users to interactively add or reduce the number of key frames to control segmentation hierarchy without the need to change threshold values and re-run segmentation, as usually done in existing approaches. Our experimental results show that the presented hybrid approach, compared to both velocity-based and graph-based, demonstrates superior performance in terms of accuracy and in comparison to graph-based, our approach has not only less complexity but also a lesser number of thresholds, the values of which can be much more simply determined.
Similar content being viewed by others
Notes
ground truth: a set of key frames located at where the data should be partitioned to create segments
References
Bulut E, Capin T (2007) Key-frame extraction from motion capture data by curve saliency. In: Computer animation and social agents 2007 Jun (pp 119–123)
Carnegie Mellon University. CMU graphics lab motion capture database. http://mocap.cs.cmu.edu. Accessed 1 Jan 2018
Choensawat W, Nakamura M, Hachimura K (2015) GenLaban: a tool for generating Labanotation from motion capture data. Multimed Tools Appl 74(23):10823–10846
Devanne M, Wannous H, Pala P, Berretti S, Daoudi M, Del Bimbo A (2015) Combined shape analysis of human poses and motion units for action segmentation and recognition. In: Automatic face and gesture recognition (FG), 2015 11th IEEE international conference and workshops on 2015 May 4 (Vol. 7, pp 1–6). IEEE
Escalera S, Athitsos V, Guyon I (2016) Challenges in multimodal gesture recognition. J Mach Learn Res 17(72):1–54
Gong D, Medioni G, Zhu S, Zhao X (2012) Kernelized temporal cut for online temporal segmentation and recognition. Comput Vis–ECCV 2012:229–243
Hachimura K, Nakamura M (2001) Method of generating coded description of human body motion from motion-captured data. In: Robot and human interactive communication. Proceedings. 10th IEEE International Workshop on 2001 (pp 122–127). IEEE
ICE Lab Mossar. https://sites.google.com/site/icelabuki/mossar. Accessed 1 Jan 2018
Keogh E (2002) Exact indexing of dynamic time warping. In: Proceedings of the 28th international conference on very large data bases 2002 Aug 20 (pp 406–417). VLDB Endowment
Khan S, Bailey DG, Gupta GS (2014) Pause detection in continuous sign language. Int J Comput Appl Technol 50(1–2):75–83
Krüger B, Vögele A, Willig T, Yao A, Klein R, Weber A (2017 Apr) Efficient unsupervised temporal segmentation of motion data. IEEE Trans Multimed 19(4):797–812
Li M, Leung H (2016) Graph-based representation learning for automatic human motion segmentation. Multimed Tools Appl 1;75(15):9205–9224
Li M, Leung H, Liu Z, Zhou L (2016) 3D human motion retrieval using graph kernels based on adaptive graph construction. Comput Graph 54:104–112
Lim IS, Thalmann D (2001) Key-posture extraction out of human motion data. In: Engineering in medicine and biology society. Proceedings of the 23rd annual international conference of the IEEE 2001 (Vol. 2, pp 1167–1169). IEEE
Liu F, Zhuang Y, Wu F, Pan Y (2003) 3D motion retrieval with motion index tree. Comput Vis Image Underst 92(2):265–284
Miura T, Mitobe K, Yukawa T, Kaiga T, Taniguchi T, Tamamoto H (2011) Adaptation of grouping structure analysis in GTTM to hierarchical segmentation of dance motion. Inf Media Technol 6(1):172–192
Northwestern University Process Optimization Open Textbook Spatial branch-and-bound. https://optimization.mccormick.northwestern.edu/index.php/Spatial_branch_and_bound_method. Accessed 1 Jan 2018
Paliyawan P, Sookhanaphibarn K, Choensawat W, Thawonmas R (2015) Towards universal kinect interface for fighting games. In Consumer Electronics (GCCE), 2015 I.E. 4th Global Conference on 2015 Oct 27 (pp 332–333). IEEE
Panagiotakis C, Holzapfel A, Michel D, Argyros AA (2013) Beat synchronous dance animation based on visual analysis of human motion and audio analysis of music tempo. In: International symposium on visual computing 2013 Jul 29 (pp 118–127). Springer Berlin Heidelberg
Ruffieux S, Lalanne D, Mugellini E (2013) ChAirGest: a challenge for multimodal mid-air gesture recognition for close HCI. In: Proceedings of the 15th ACM on International conference on multimodal interaction 2013 Dec 9 (pp 483–488). ACM
Shiratori T, Nakazawa A, Ikeuchi K 2018 Detecting dance motion structure using motion capture and musical information. In: Proc. international conference on virtual systems and multimedia (VSMM) 2004 Nov (Vol. 3)
Tunca C, Pehlivan N, Ak N, Arnrich B, Salur G, Ersoy C (2017) Inertial sensor-based robust gait analysis in non-hospital settings for neurological disorders. Sensors 17(4):825
Van Mieghem P (2010) Graph spectra for complex networks. Cambridge University Press, Cambridge ISBN-13: 9781107411470
Vlachos M, Hadjieleftheriou M, Gunopulos D, Keogh E (2003) Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining 2003 Aug 24 (pp 216–225). ACM
Vögele A, Krüger B, Klein R (2014) Efficient unsupervised temporal segmentation of human motion. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation 2014 Jul 21 (pp 167–176). Eurographics Association
Wang Q, Kurillo G, Ofli F, Bajcsy R (2015) Unsupervised temporal segmentation of repetitive human actions based on kinematic modeling and frequency analysis. In: 3D Vision (3DV), 2015 International Conference on 2015 Oct 19 (pp 562–570). IEEE
Wessa P. Kernel density estimation (v1.0.12) in free statistics software (v1.1.23-r7). http://www.wessa.net/rwasp_density.wasp. Accessed 1 Jan 2018
Yin Y, Davis R (2013) Gesture spotting and recognition using salience detection and concatenated hidden markov models. In: Proceedings of the 15th ACM on international conference on multimodal interaction 2013 Dec 9 (pp. 489–494). ACM
Zhou F, De la Torre F, Hodgins JK (2013) Hierarchical aligned cluster analysis for temporal clustering of human motion. IEEE Trans Pattern Anal Mach Intell 35(3):582–596
Acknowledgements
The authors wish to thank Professor Kingkarn Sookhanaphibarn, Bangkok University, for her assistance in the technical editing of the manuscript. We would also like to express special thanks to the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Paliyawan, P., Choensawat, W. & Thawonmas, R. Mossar: motion segmentation by using splitting and remerging strategies. Multimed Tools Appl 77, 27761–27788 (2018). https://doi.org/10.1007/s11042-018-5965-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-5965-x