Multimedia Tools and Applications, Volume 13, Issue 3, pp 255–284

Tools for Browsing a TV Situation Comedy Based on Content Specific Attributes

  • Joshua S. Wachman
  • Rosalind W. Picard


This paper presents general-purpose video analysis and annotation tools that combine high-level and low-level information and that learn through user interaction and feedback. The use of these tools is illustrated through the construction of two video browsers, which allow a user to fast-forward (or rewind) to frames, shots, or scenes containing a particular character or set of characters, or other labeled content. The two browsers developed in this work are: (1) a basic video browser, which exploits relations between high-level scripting information and closed captions, and (2) an advanced video browser, which augments the basic browser with annotations gained from applying machine learning. The learner helps the system adapt to different people's labelings by accepting positive and negative examples of labeled content from a user and relating these to low-level color and texture features extracted from the digitized video. This learning happens interactively and is used to infer labels on data the user has not yet seen. The labeled data may then be browsed or retrieved from the database in real time.

An evaluation of the learning performance shows that a combination of low-level color signal features outperforms several other combinations of signal features in learning character labels in an episode of the TV situation comedy Seinfeld. We discuss several issues that arise in combining low-level and high-level information, and we illustrate solutions to these issues within the context of browsing television sitcoms.
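The abstract does not spell out the learner's internals. The keywords point to the FourEyes "Society of Models" learner, which is considerably more elaborate than anything shown here. The following is a rough illustration of the interactive loop only, built on a few stated assumptions:

- Each labeled region is described by a quantized RGB color histogram.
- Regions are compared with Swain and Ballard's histogram intersection.
- A 1-nearest-neighbour rule turns the user's positive and negative examples into inferred labels for frames the user has not yet seen.
- A "fast forward" jumps to the next frame inferred to contain the labeled content.

All names (`color_histogram`, `intersection`, `InteractiveLabeler`, `next_labeled_frame`) are hypothetical, and the toy frames are synthetic.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram: each channel is quantized into
    bins_per_channel levels, and the counts are divided by the pixel count."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so channel value 255 stays in range when 256 is not divisible by the bin count.
        qr, qg, qb = (min(c // step, bins_per_channel - 1) for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for an empty region
    return [h / total for h in hist]


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Swain-Ballard histogram intersection; 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


class InteractiveLabeler:
    """Hypothetical 1-nearest-neighbour learner (not the paper's FourEyes learner).
    Stores user-supplied (histogram, is_positive) examples."""

    def __init__(self) -> None:
        self.examples: List[Tuple[List[float], bool]] = []

    def add_example(self, hist: List[float], is_positive: bool) -> None:
        self.examples.append((hist, is_positive))

    def infer(self, hist: List[float]) -> Optional[bool]:
        """Copy the label of the most similar stored example; None if no examples yet."""
        if not self.examples:
            return None
        _, label = max(self.examples, key=lambda ex: intersection(ex[0], hist))
        return label


def next_labeled_frame(labels: Sequence[Optional[bool]], current: int) -> Optional[int]:
    """'Fast forward': first frame after `current` whose inferred label is positive."""
    for i in range(current + 1, len(labels)):
        if labels[i]:
            return i
    return None


# Toy usage with synthetic regions: mostly red = "character present", mostly blue = absent.
red = [(200, 30, 30)] * 90 + [(30, 30, 200)] * 10
blue = [(30, 30, 200)] * 95 + [(200, 30, 30)] * 5

learner = InteractiveLabeler()
learner.add_example(color_histogram(red), True)    # user marks a positive example
learner.add_example(color_histogram(blue), False)  # user marks a negative example

unseen = [blue, blue, [(210, 40, 25)] * 80 + [(30, 30, 200)] * 20, blue]
labels = [learner.infer(color_histogram(frame)) for frame in unseen]
print(labels)                         # [False, False, True, False]
print(next_labeled_frame(labels, 0))  # 2
```

A nearest-neighbour rule is used here because it needs no separate training pass. Each new user example affects the very next call to `infer`, which loosely mirrors the interactive labeling the abstract describes. The abstract also mentions texture features; these could be concatenated onto the same feature vector, but they are omitted to keep the sketch short.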

Keywords: computer assisted learning, video, pattern recognition, video annotation, Society of Models, FourEyes, content-based retrieval

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Joshua S. Wachman (1)
  • Rosalind W. Picard (1)

  1. The Perceptual Computing Group of the MIT Media Laboratory, Cambridge, USA