Compact Video Description for Copy Detection with Precise Temporal Alignment

  • Matthijs Douze
  • Hervé Jégou
  • Cordelia Schmid
  • Patrick Pérez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


This paper introduces a very compact yet discriminative video description, which allows example-based search in a large number of frames corresponding to thousands of hours of video. Our description extracts one descriptor per indexed video frame by aggregating a set of local descriptors. These frame descriptors are encoded using a time-aware hierarchical indexing structure. A modified temporal Hough voting scheme is used to rank the retrieved database videos and estimate segments in them that match the query. If we use a dense temporal description of the videos, matched video segments are localized with excellent precision.

Experimental results on the Trecvid 2008 copy detection task and a set of 38000 videos from YouTube show that our method offers an excellent trade-off between search accuracy, efficiency and memory usage.
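The temporal voting step mentioned in the abstract can be illustrated with a minimal sketch. This is a plain temporal Hough vote, not the modified scheme the paper proposes, whose details are not given here. It assumes frame-level matches are already available as `(video_id, t_query, t_db, score)` tuples; the function name `temporal_hough_vote`, the `bin_width` parameter and the toy data are all hypothetical. Each match votes for the time offset `t_db - t_query`, and a database video is ranked by the weight of its most-voted offset.

```python
from collections import defaultdict


def temporal_hough_vote(matches, bin_width=1.0):
    """Rank database videos by their best-supported time offset.

    matches: iterable of (video_id, t_query, t_db, score) tuples, one per
             frame-level match (assumed to come from a prior search step).
    Returns a list of (video_id, offset, votes), sorted by decreasing votes,
    where offset is the winning quantized value of t_db - t_query.
    """
    # One offset histogram per database video.
    histograms = defaultdict(lambda: defaultdict(float))
    for video_id, t_query, t_db, score in matches:
        # Matches from a true copy share (roughly) the same time offset,
        # so they accumulate in the same bin.
        bin_index = round((t_db - t_query) / bin_width)
        histograms[video_id][bin_index] += score

    ranking = []
    for video_id, hist in histograms.items():
        best_bin, votes = max(hist.items(), key=lambda kv: kv[1])
        ranking.append((video_id, best_bin * bin_width, votes))
    ranking.sort(key=lambda r: r[2], reverse=True)
    return ranking


# Toy data (hypothetical): three consistent matches for vid_a at offset 10 s,
# plus unrelated matches that scatter over other offsets.
matches = [
    ("vid_a", 0.0, 10.0, 1.0),
    ("vid_a", 1.0, 11.0, 1.0),
    ("vid_a", 2.0, 12.0, 1.0),
    ("vid_a", 3.0, 40.0, 1.0),
    ("vid_b", 0.0, 5.0, 1.0),
    ("vid_b", 1.0, 30.0, 1.0),
]
ranking = temporal_hough_vote(matches)
```

In this sketch the winning offset also gives a coarse alignment between query and database video. Finer localization of the matching segment would need a denser temporal description, as the abstract notes.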


Keywords: Video Frame, Average Precision, Interest Point, Dynamic Time Warping, Local Descriptor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matthijs Douze (1)
  • Hervé Jégou (2)
  • Cordelia Schmid (1)
  • Patrick Pérez (3)
  1. INRIA Grenoble, France
  2. INRIA Rennes, France
  3. Technicolor Rennes, France
