Determining a structured spatio-temporal representation of video content for efficient visualization and indexing

  • Marc Gelgon
  • Patrick Bouthemy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1406)


Efficient access to the information contained in video databases requires that a structured representation of the video content be built beforehand. This paper describes an approach in this direction, targeted at video indexing and browsing. Exploiting a 2D motion model estimator, we partition the video into shots, characterize camera motion, and extract and track mobile objects. These steps rely on robust motion estimation, statistical tests and contextual statistical labeling. The content of each shot can then be viewed on a synoptic frame composed of a mosaic image of the background scene, on which the trajectories of mobile objects are superimposed. The proposed method also provides instantaneous and long-term, qualitative and quantitative object motion cues for content-based indexing. Its different steps, and the system they form, are designed to keep computational cost low while coping with general video content. We provide experimental results on real-world sequences. The structured output opens the way to further extensions, for instance towards higher-level interpretation.
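The abstract names a 2D motion model estimator, robust estimation and camera motion characterization without giving formulas. The sketch below is not the authors' estimator. It assumes a six-parameter affine model u = a1 + a2·x + a3·y, v = a4 + a5·x + a6·y fitted to a sparse list of displacement vectors whose coordinates are centered on the image center. It uses Tukey's biweight solved by iteratively reweighted least squares (IRLS) as a stand-in for the robust estimation step, and it maps the parameters to qualitative camera-motion labels using thresholds chosen only for illustration. The function names (`robust_affine_fit`, `camera_motion_labels`) and all numeric constants are hypothetical.

```python
"""Minimal sketch: robust affine 2D motion fit and qualitative camera-motion labels.

Assumptions (not taken from the paper): input is a sparse list of
(x, y, u, v) displacement vectors with the origin at the image center,
robustness comes from Tukey's biweight solved by IRLS, and all thresholds
are illustrative values.
"""
from statistics import median
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float, float]  # (x, y, u, v): position and displacement


def solve3(m: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def weighted_affine_fit(vecs: Sequence[Vec], w: Sequence[float]) -> List[float]:
    """Weighted least squares for u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y."""
    m = [[0.0] * 3 for _ in range(3)]
    bu, bv = [0.0] * 3, [0.0] * 3
    for (x, y, u, v), wi in zip(vecs, w):
        phi = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                m[i][j] += wi * phi[i] * phi[j]
            bu[i] += wi * phi[i] * u
            bv[i] += wi * phi[i] * v
    # u and v share the same normal matrix, so solve two 3x3 systems.
    return solve3(m, bu) + solve3(m, bv)


def residual(a: Sequence[float], vec: Vec) -> float:
    """Euclidean norm of the difference between model and observed displacement."""
    x, y, u, v = vec
    du = a[0] + a[1] * x + a[2] * y - u
    dv = a[3] + a[4] * x + a[5] * y - v
    return (du * du + dv * dv) ** 0.5


def robust_affine_fit(vecs: Sequence[Vec], iters: int = 10, min_scale: float = 0.05):
    """IRLS with Tukey weights; returns (params, weights). Zero weight marks an outlier."""
    w = [1.0] * len(vecs)
    a = weighted_affine_fit(vecs, w)  # plain least-squares initialization
    for _ in range(iters):
        r = [residual(a, v) for v in vecs]
        # Heuristic scale from the median residual norm, floored to avoid c = 0.
        c = max(4.685 * 1.4826 * median(r), min_scale)
        w = [(1.0 - (ri / c) ** 2) ** 2 if ri < c else 0.0 for ri in r]
        a = weighted_affine_fit(vecs, w)
    return a, w


def camera_motion_labels(a: Sequence[float], t_trans: float = 0.5, t_div: float = 0.02):
    """Qualitative labels from affine parameters (illustrative thresholds)."""
    labels = []
    if abs(a[0]) > t_trans:
        labels.append("pan")
    if abs(a[3]) > t_trans:
        labels.append("tilt")
    div = a[1] + a[5]  # divergence of the affine field: du/dx + dv/dy
    if abs(div) > t_div:
        labels.append("zoom in" if div > 0 else "zoom out")
    return labels or ["static"]
```

In this sketch, vectors that follow a motion different from the dominant (camera) motion end up with zero weight. Grouping such vectors is one plausible way to obtain candidate regions for the mobile-object extraction and tracking steps, although the paper's own contextual labeling is not reproduced here.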


Keywords (machine-generated, not by the authors): Motion Model, Camera Motion, Mobile Object, Mosaic Image, Video Database





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Marc Gelgon (1)
  • Patrick Bouthemy (1)
  1. IRISA/INRIA, Rennes cedex, France
