
Automatic Labanotation Generation, Semi-automatic Semantic Annotation and Retrieval of Recorded Videos

  • Swati Dewan
  • Shubham Agarwal
  • Navjyoti Singh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11279)

Abstract

Over the last decade, the volume of unannotated user-generated web content has skyrocketed, but manually annotating data is costly in time and resources. We leverage advances in machine learning to reduce these costs and create a semantically searchable dance database with automatic annotation and retrieval. A pose estimation module extracts body pose from recorded videos and is used to generate Labanotation over them. Though the approach is generic, it addresses an essential need given the large amount of dance video available online; Labanotation can further be exploited to build ontologies and is highly relevant for the preservation and digitization of such resources. We also propose a semi-automatic annotation model that generates semantic annotations over any video archive using only 2–4 manually annotated clips. We experiment on two publicly available ballet datasets. High-level concepts such as ballet poses and steps form the semantic library and also act as descriptive meta-tags, making the videos retrievable through a semantic text or video query.
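To make the pose-to-Labanotation step concrete, the following is a minimal sketch (not the authors' implementation) of how 2D joint positions from an external pose estimator could be quantized into coarse Labanotation direction/level symbols per frame. The joint names, thresholds, and single-limb scope are illustrative assumptions only.

```python
# Sketch: mapping estimated 2D joints to coarse Labanotation symbols.
# Assumes joints arrive as (x, y) pixel coordinates from a pose estimator
# (e.g. an OpenPose-style output); names and thresholds are hypothetical.
import numpy as np

def limb_direction(shoulder, wrist):
    """Quantize a shoulder->wrist vector into a side + level symbol."""
    v = np.asarray(wrist, float) - np.asarray(shoulder, float)
    n = np.linalg.norm(v) + 1e-8
    # Level from the vertical component (image y grows downward).
    if v[1] < -0.3 * n:
        level = "high"
    elif v[1] > 0.3 * n:
        level = "low"
    else:
        level = "middle"
    side = "right" if v[0] > 0 else "left"
    return f"{side}-{level}"

def frames_to_laban(frames):
    """frames: iterable of {joint_name: (x, y)} dicts, one per video frame."""
    return [limb_direction(f["shoulder_r"], f["wrist_r"]) for f in frames]

if __name__ == "__main__":
    demo = [{"shoulder_r": (100, 200), "wrist_r": (160, 140)},
            {"shoulder_r": (100, 200), "wrist_r": (150, 260)}]
    print(frames_to_laban(demo))  # e.g. ['right-high', 'right-low']
```

Such per-frame symbols could then serve both as a Labanotation score and as candidate meta-tags for the semi-automatic semantic annotation and retrieval described above.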

Keywords

Searchable dance video library · Labanotation · Automatic annotation · Semantic query retrieval

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. International Institute of Information Technology, Hyderabad, Hyderabad, India
