Semantic Structures for Video Data Indexing

  • Koji Zettsu
  • Kuniaki Uehara
  • Katsumi Tanaka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1554)


Video indexing based on content annotations can fully exploit the semantic information in video data. However, the most difficult and time-consuming step in annotation-based indexing is manually identifying appropriate video intervals for the various semantic contents. Automatically discovering such intervals from video data would therefore ease the indexing work. To this end, we propose “semantic structures” of video data and a mechanism for discovering them. The basic idea of our approach is to (1) discover consecutive sequences of shots from video data, each of which represents a consistent action or situation, and (2) index each discovered video interval based on its semantics. A semantic structure is a collection of discovered video intervals classified into three categories: “unchanged” (actors or backgrounds remain the same throughout the interval), “gradually changing” (actors or backgrounds change shot by shot), and “multiplexing” (individual actors or backgrounds appear by turns). The mechanism discovers these types of video intervals by comparing the similarity between shots, and indexes each discovered interval using an indexing algorithm prepared for its type. We show with experimental results how well our approach identifies video intervals.
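To make the three interval categories concrete, the following is a minimal sketch of how a consecutive shot sequence might be classified from shot similarities. It is illustrative only, not the authors' mechanism: the characteristic vectors, the cosine similarity measure, the threshold value, and the decision rules (all adjacent shots similar → unchanged; every other shot similar while adjacent shots differ → multiplexing; endpoints dissimilar → gradually changing) are assumptions made for this example.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two shot characteristic vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def classify_interval(shots, threshold=0.8):
    """Classify a consecutive shot sequence into one of the three
    semantic-structure types named in the abstract. The similarity
    threshold and decision rules here are illustrative assumptions."""
    # Similarities between each pair of adjacent shots.
    sims = [cosine_sim(shots[i], shots[i + 1]) for i in range(len(shots) - 1)]
    if all(s >= threshold for s in sims):
        return "unchanged"           # adjacent shots stay similar throughout
    # Similarities between shots two apart (A-B-A-B alternation check).
    alt = [cosine_sim(shots[i], shots[i + 2]) for i in range(len(shots) - 2)]
    if alt and all(s >= threshold for s in alt) and all(s < threshold for s in sims):
        return "multiplexing"        # individual contents appear by turns
    if cosine_sim(shots[0], shots[-1]) < threshold:
        return "gradually changing"  # content drifts shot by shot
    return "unclassified"
```

In this sketch each discovered interval would then be handed to an indexing routine chosen by its type, mirroring the per-category indexing algorithms described above.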


Keywords: Characteristic Vector · Video Data · Semantic Content · Similarity Threshold · Semantic Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Koji Zettsu¹
  • Kuniaki Uehara²
  • Katsumi Tanaka³

  1. Kobe Research Center, Telecommunications Advancement Organization of Japan, Chuo, Kobe, Japan
  2. Research Center for Urban Safety and Security, Kobe University, Japan
  3. Graduate School of Science and Technology, Kobe University, Japan
