High-Level Feature Detection from Video in TRECVid: A 5-Year Retrospective of Achievements

Part of the Signals and Communication Technology book series (SCT)


Successful and effective content-based access to digital video requires fast, accurate and scalable methods for determining video content automatically. Contemporary approaches variously rely on text taken from speech within the video, on matching one video frame against others using low-level characteristics such as colour, texture or shape, or on detecting and matching objects appearing within the video. Possibly the most important technique, however, is one that determines the presence or absence of a high-level, or semantic, feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last five years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity, where dozens of research groups measure the effectiveness of their techniques on common data using an open, metrics-based approach. In this chapter we summarize the work done on the TRECVid high-level feature task, showing the progress made year on year. This provides a fairly comprehensive statement of where the state of the art stands on this important task, not just for one research group or one approach, but across the spectrum. We then use this past and ongoing work as a basis for highlighting the trends that are emerging in this area and the questions that remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.
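The "open, metrics-based approach" referred to above scores each feature-detection submission by ranking shots for a feature and computing average precision against pooled relevance judgments; mean average precision over all features then summarizes a run. As a minimal sketch of that metric (the function name and toy data below are illustrative assumptions, not taken from the chapter), non-interpolated average precision can be computed in a few lines of Python:

```python
from typing import Sequence

def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision for one feature.

    ranked_relevance[k] is True if the shot returned at rank k+1
    truly contains the feature; total_relevant is the number of
    shots in the collection judged to contain it.
    """
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant

# Toy run for one feature: correct shots returned at ranks 1, 3 and 4,
# with 4 relevant shots in the whole collection.
print(average_precision([True, False, True, True, False], 4))  # ≈ 0.604
```

Mean average precision for a run is simply the mean of these per-feature scores, which is what makes year-on-year comparison across systems and feature sets possible.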


Keywords: Search Task · Feature Detection · Average Precision · Semantic Feature · Automatic Speech Recognition



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

Dublin City University, Ireland
