Video Summarization Based on Semantic Representation

  • Rafael Paulin Carlos
  • Kuniaki Uehara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1554)


Summarization of video data is of growing practical importance: as expanding video databases inundate users with vast amounts of video data, users increasingly need reduced versions that they can assimilate with limited effort and in a shorter browsing time. In recent years, many researchers have investigated summarization techniques such as fast-forward playback and skipping video frames at fixed time intervals. However, all of these techniques are based on syntactic aspects of the video.

Another idea is to present a summarized video according to its semantic representation. The critical aspect of compacting a video is context understanding, which is the key to choosing the “significant scenes” that should be included in the summarized video. The goal of this work is to show the utility of a semantic representation method for video summarization. We propose a method to extract significant scenes and create a summarized video without losing the content of the video’s story. The story is analyzed for its semantic content and represented as a structured graph in which each scene is described by affect units.
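The idea of selecting significant scenes from a story graph of affect units can be illustrated with a small sketch. This is not the paper's algorithm; the scoring heuristic (a scene is significant in proportion to the affect units it shares with the rest of the story) and all names here are assumptions for illustration only.

```python
# Illustrative sketch only: scenes carry "affect units" (Lehnert-style
# narrative units) and form a story graph; scenes whose units are most
# connected to the rest of the story are kept as "significant scenes".
# The data model and the scoring rule are hypothetical, not the paper's.

from dataclasses import dataclass, field

@dataclass
class Scene:
    name: str
    affect_units: set = field(default_factory=set)

def significance(scene, scenes):
    # Assumed heuristic: count affect units shared with every other scene.
    return sum(len(scene.affect_units & other.affect_units)
               for other in scenes if other is not scene)

def summarize(scenes, keep):
    # Keep the `keep` most significant scenes, preserving story order.
    ranked = sorted(scenes, key=lambda s: significance(s, scenes), reverse=True)
    chosen = {id(s) for s in ranked[:keep]}
    return [s for s in scenes if id(s) in chosen]

scenes = [
    Scene("intro",   {"goal"}),
    Scene("setback", {"goal", "problem"}),
    Scene("filler",  set()),
    Scene("climax",  {"problem", "resolution"}),
    Scene("ending",  {"resolution", "goal"}),
]
summary = summarize(scenes, keep=4)
print([s.name for s in summary])  # → ['intro', 'setback', 'climax', 'ending']
```

Under this toy scoring, the disconnected "filler" scene contributes no affect units to the story graph and is the first to be dropped, which mirrors the abstract's aim of discarding scenes that do not carry the story's content.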


Keywords: summarization, video semantic representation, skim video, video content



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Rafael Paulin Carlos (1)
  • Kuniaki Uehara (2)
  1. Department of Computer & Systems Engineering, Kobe University, Japan
  2. Research Center for Urban Safety & Security, Kobe University, Japan
