Spatiotemporal Similarity Search in 3D Motion Capture Gesture Streams

  • Christian Beecks
  • Marwan Hassani
  • Jennifer Hinnell
  • Daniel Schüller
  • Bela Brenger
  • Irene Mittelberg
  • Thomas Seidl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9239)


How to model spatiotemporal similarity between gestures in 3D motion capture data streams is a central question in current research on human communication. Qualitative perceptual analyses of co-speech gestures, that is, manual gestures that emerge spontaneously and unconsciously during face-to-face conversation, are feasible on a small-to-moderate scale, but they do not extend to larger scenarios because efficient query processing techniques for spatiotemporal similarity search are lacking. To support qualitative analyses of co-speech gestures, we propose and investigate a simple yet effective distance-based similarity model that leverages the spatial and temporal characteristics of co-speech gestures and enables query-by-example similarity search in 3D motion capture data streams. Experiments on real conversational 3D motion capture data indicate that the proposal is appropriate in terms of both accuracy and efficiency.


Similarity search; Spatiotemporal data; 3D motion capture data; Streams; Co-speech gestures; Gesture matching distance; Gesture signature; Dynamic time warping
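
To make the idea of query-by-example search over a 3D motion capture stream more concrete, the following minimal sketch uses classic dynamic time warping, one of the techniques named in the keywords, as the distance. It is not the gesture matching distance proposed in the paper, whose definition does not appear in this summary. The function names `dtw_distance` and `best_match`, the fixed window length, and the representation of a gesture as a list of (x, y, z) tuples are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of 3D points.

    Illustrative baseline only; not the paper's gesture matching distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance in 3D
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def best_match(stream, query, window):
    """Slide a fixed-length window over the stream (hypothetical helper).

    Returns (start_index, distance) of the window closest to the query.
    """
    best_start, best_dist = None, float("inf")
    for start in range(len(stream) - window + 1):
        d = dtw_distance(stream[start:start + window], query)
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist


# Usage: a short query gesture embedded in an otherwise static stream.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
stream = [(5.0, 5.0, 5.0)] * 4 + query + [(5.0, 5.0, 5.0)] * 3
start, dist = best_match(stream, query, window=len(query))
```

Each window comparison costs time proportional to the product of the two sequence lengths, so an exhaustive scan like this becomes slow on long streams. That cost is the kind of efficiency concern the abstract raises for large-scale analyses.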



This work is partially funded by the Excellence Initiative of the German federal and state governments and by DFG grant SE 1039/7-1.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Christian Beecks (1)
  • Marwan Hassani (1)
  • Jennifer Hinnell (2)
  • Daniel Schüller (3)
  • Bela Brenger (3)
  • Irene Mittelberg (3)
  • Thomas Seidl (1)

  1. Data Management and Exploration Group, RWTH Aachen University, Aachen, Germany
  2. Department of Linguistics, University of Alberta, Alberta, Canada
  3. Natural Media Lab, RWTH Aachen University, Aachen, Germany
