High-Level Feature Detection from Video in TRECVid: A 5-Year Retrospective of Achievements
Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, on matching one video frame against others using low-level characteristics like colour, texture or shape, or on detecting and matching objects appearing within the video. Possibly the most important technique, however, is one that determines the presence or absence of a high-level, or semantic, feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity, where dozens of research groups measure the effectiveness of their techniques on common data using an open, metrics-based approach. In this chapter we summarize the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state of the art stands regarding this important task, not just for one research group or one approach, but across the spectrum. We then use this past and ongoing work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.
Keywords: Search Task · Feature Detection · Average Precision · Semantic Feature · Automatic Speech Recognition
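The headline effectiveness measure used in the TRECVid high-level feature task is (mean) average precision over a ranked list of shots returned for each feature. As a minimal sketch of how this metric works, the following hypothetical function computes average precision for one feature given the binary relevance judgments of a system's ranked shot list and the total number of relevant shots in the collection:

```python
def average_precision(ranked_rel, num_relevant):
    """Average precision for one feature: the mean of precision@k
    taken at each rank k where a relevant (feature-present) shot occurs,
    normalized by the total number of relevant shots in the collection."""
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k
    return precision_sum / num_relevant if num_relevant else 0.0


# Example: relevant shots at ranks 1 and 3, two relevant shots in total.
# AP = (1/1 + 2/3) / 2
score = average_precision([1, 0, 1], num_relevant=2)
```

Averaging this value over all benchmarked features yields the mean average precision figures by which year-on-year progress on the task is typically reported.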