Video Summaries through Mosaic-Based Shot and Scene Clustering

  • Aya Aner
  • John R. Kender
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2353)


We present an approach to building compact video summaries that allow fast and direct access to video data. The video is segmented into shots and, in appropriate video genres, into scenes, using previously proposed methods. We introduce a new concept, based on physical settings and camera locations, that supports a hierarchical representation of video. We use mosaics to represent and cluster shots, and detect appropriate mosaics to represent scenes. In contrast to video indexing approaches based on key-frames, our efficient mosaic-based scene representation allows fast clustering of scenes into physical settings, as well as further comparison of physical settings across videos. This enables us to detect plots of different episodes in situation comedies and serves as a basis for indexing whole video sequences. In sports videos, where settings are not as well defined, our approach allows classifying shots for characteristic event detection. We use a novel method for mosaic comparison and create a highly compact non-temporal representation of video. This representation allows accurate comparison of scenes across different videos and serves as a basis for indexing video libraries.
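The step of grouping scenes into physical settings can be illustrated with a minimal sketch. It assumes a pairwise dissimilarity matrix between scene mosaics is already available; the matrix values, the threshold, and the function name `cluster_by_threshold` are hypothetical, and the simple single-linkage grouping used here is a stand-in, not the mosaic comparison or clustering method of the paper.

```python
from typing import Dict, List


def cluster_by_threshold(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Group items whose pairwise dissimilarity is below `threshold`.

    Single-linkage grouping via union-find; an illustrative stand-in,
    not a reproduction of the paper's clustering step.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root of the merged group.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical dissimilarities between four scene mosaics (0 = identical).
scene_dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
settings = cluster_by_threshold(scene_dist, threshold=0.5)
print(settings)
```

Each returned group lists the indices of scenes assigned to one candidate physical setting. Because the grouping depends only on the dissimilarity matrix, the same routine could in principle be applied to scenes drawn from different videos, provided their mosaics are compared with the same measure.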


Video Sequence · Physical Setting · Video Summarization · Sport Video · Video Summary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Aya Aner (1)
  • John R. Kender (1)
  1. Department of Computer Science, Columbia University
