Abstract
Over the last decade, the volume of unannotated user-generated web content has skyrocketed, but manually annotating data remains costly in time and resources. We leverage advances in machine learning to reduce these costs and create a semantically searchable dance database with automatic annotation and retrieval. A pose estimation module recovers body pose from recorded videos and generates Labanotation over them. Though generic, this pipeline serves an essential application given the large number of dance videos available online; the resulting Labanotation can be further exploited to build an ontology and is highly relevant to the preservation and digitization of such resources. We also propose a semi-automatic annotation model that generates semantic annotations over any video archive using only 2–4 manually annotated clips. We experiment on two publicly available ballet datasets, using high-level concepts such as ballet poses and steps to build the semantic library. These concepts also act as descriptive meta-tags, making the videos retrievable through a semantic text or video query.
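To make the pose-to-Labanotation idea concrete, here is a minimal illustrative sketch (not the authors' implementation): it classifies a per-frame arm pose into a coarse Laban level symbol using only the vertical positions of hypothetical wrist, shoulder, and hip keypoints, then run-length encodes consecutive frames the way a Labanotation column holds one symbol per sustained position. All names and thresholds are assumptions for illustration; a real system would use 3D joint angles over time.

```python
# Illustrative sketch only: coarse Laban "level" from 2D keypoints.
# Image coordinates are assumed, so y grows downward.

def laban_level(shoulder_y: float, hip_y: float, wrist_y: float) -> str:
    """Classify the arm's Laban level:
    wrist above the shoulder          -> "high"
    wrist between shoulder and hip    -> "middle"
    wrist below the hip               -> "low"
    """
    if wrist_y < shoulder_y:
        return "high"
    if wrist_y <= hip_y:
        return "middle"
    return "low"


def annotate_frames(frames):
    """Run-length encode per-frame levels into (level, n_frames) pairs,
    mimicking one sustained symbol per held position in a notation column."""
    runs = []
    for f in frames:
        level = laban_level(f["shoulder_y"], f["hip_y"], f["wrist_y"])
        if runs and runs[-1][0] == level:
            runs[-1] = (level, runs[-1][1] + 1)
        else:
            runs.append((level, 1))
    return runs


if __name__ == "__main__":
    frames = [
        {"shoulder_y": 100, "hip_y": 200, "wrist_y": 50},   # arm raised
        {"shoulder_y": 100, "hip_y": 200, "wrist_y": 60},   # still raised
        {"shoulder_y": 100, "hip_y": 200, "wrist_y": 150},  # arm at waist
        {"shoulder_y": 100, "hip_y": 200, "wrist_y": 250},  # arm lowered
    ]
    print(annotate_frames(frames))  # [('high', 2), ('middle', 1), ('low', 1)]
```

The run-length pairs could then serve as searchable meta-tags, which is the spirit of the retrieval component described above.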
Notes
- 1. [https://shubhamagarwalwork.wixsite.com/dancelib] At present, this web page is a prototype and does not support video queries, but it will soon be extended with more data.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Dewan, S., Agarwal, S., Singh, N. (2018). Automatic Labanotation Generation, Semi-automatic Semantic Annotation and Retrieval of Recorded Videos. In: Dobreva, M., Hinze, A., Žumer, M. (eds.) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol. 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04256-1
Online ISBN: 978-3-030-04257-8