Unsupervised Temporal Commonality Discovery

  • Wen-Sheng Chu
  • Feng Zhou
  • Fernando De la Torre
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7575)


Unsupervised discovery of commonalities in images has recently attracted much interest due to the need to find correspondences in large amounts of visual data. A natural extension, and a relatively unexplored problem, is how to discover common semantic temporal patterns in videos. That is, given two or more videos, find the subsequences that contain similar visual content in an unsupervised manner. We call this problem Temporal Commonality Discovery (TCD). The naive exhaustive search approach to the TCD problem has a computational complexity that is quadratic in the length of each sequence, making it impractical for regular-length sequences. This paper proposes an efficient branch and bound (B&B) algorithm to tackle the TCD problem. We derive tight bounds for classical distances between the temporal bag-of-words histograms of two segments, including ℓ1, intersection and χ². Using these bounds, the B&B algorithm can efficiently find the globally optimal solution. Our algorithm is general, and it can be applied to any feature that has been quantized into histograms. Experiments on finding common facial actions in video and human actions in motion capture data demonstrate the benefits of our approach. To the best of our knowledge, this is the first work that addresses unsupervised discovery of common events in videos.
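To make the problem statement concrete, the following is a minimal sketch of the naive exhaustive-search baseline that the abstract describes, not the proposed B&B algorithm and not the authors' code. It assumes each frame has already been quantized to an integer word index, uses unnormalized segment histograms, and adds a hypothetical `min_len` parameter to rule out trivially short segments. The names `prefix_histograms`, `segment_histogram` and `naive_tcd` are illustrative, and the ½ factor in the χ² distance is one common convention rather than something stated in the abstract.

```python
def prefix_histograms(labels, k):
    """prefix[t][w] = number of frames with word w among labels[:t]."""
    prefix = [[0] * k]
    for w in labels:
        row = prefix[-1][:]
        row[w] += 1
        prefix.append(row)
    return prefix


def segment_histogram(prefix, b, e):
    """Temporal bag-of-words histogram of frames b..e (inclusive, 0-indexed)."""
    return [hi - lo for hi, lo in zip(prefix[e + 1], prefix[b])]


def l1(h1, h2):
    """l1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def intersection(h1, h2):
    """Histogram intersection; a similarity, so larger means more alike."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def chi2(h1, h2):
    """Chi-square distance (assumed convention: 1/2 factor, empty bins skipped)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def naive_tcd(seq_a, seq_b, k, min_len, dist=l1):
    """Try every pair of segments; return (distance, (b1, e1), (b2, e2)).

    The number of candidate segments is quadratic in the length of each
    sequence, which is what makes this baseline impractical for long inputs.
    """
    pa = prefix_histograms(seq_a, k)
    pb = prefix_histograms(seq_b, k)
    best = (float("inf"), None, None)
    for b1 in range(len(seq_a)):
        for e1 in range(b1 + min_len - 1, len(seq_a)):
            h1 = segment_histogram(pa, b1, e1)
            for b2 in range(len(seq_b)):
                for e2 in range(b2 + min_len - 1, len(seq_b)):
                    d = dist(h1, segment_histogram(pb, b2, e2))
                    if d < best[0]:
                        best = (d, (b1, e1), (b2, e2))
    return best


# Toy example: the pattern 1, 2, 1 occurs in both sequences.
seq_a = [0, 0, 1, 2, 1, 0, 0]
seq_b = [3, 3, 3, 1, 2, 1, 3]
result = naive_tcd(seq_a, seq_b, k=4, min_len=3)
```

Because `naive_tcd` minimizes `dist`, histogram intersection would need to be negated before being passed in. A B&B search, as proposed in the paper, would instead bound the distance over whole sets of candidate segments and discard sets that cannot contain the optimum.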


Keywords: temporal bag of words, branch and bound, temporal commonality discovery



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wen-Sheng Chu (1)
  • Feng Zhou (1)
  • Fernando De la Torre (1)
  1. Robotics Institute, Carnegie Mellon University, USA
