Video Retrieval Based on Uncertain Concept Detection Using Dempster–Shafer Theory

A chapter in Multimedia Data Mining and Analytics

Abstract

For a long time, it was difficult to automatically extract meanings from video shots because, even for a single meaning, shots exhibit significantly different visual appearances depending on camera techniques and shooting environments. One promising approach has recently been devised in which a large number of shots are statistically analyzed to cover the diverse visual appearances associated with a meaning. Inspired by the resulting performance improvement, concept-based video retrieval has received much research attention. Here, concepts are abstracted names of meanings that humans can perceive in shots, such as objects, actions, events, and scenes. For each concept, a detector is built in advance by analyzing a large number of shots. Then, given a query, shots are retrieved based on concept detection results. Since each detector detects a concept robustly across diverse visual appearances, effective retrieval can be achieved using concept detection results as "intermediate" features. However, despite recent improvements, it is still difficult to accurately detect every kind of concept. In addition, shots can be taken with arbitrary camera techniques and in arbitrary shooting environments, which unboundedly increases the diversity of visual appearances. Thus, concepts cannot be expected to be detected with 100 % accuracy. This chapter explores how to utilize such uncertain detection results to improve concept-based video retrieval.
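
As background for the Dempster–Shafer machinery named in the title, the following Python sketch implements Dempster's rule of combination over a two-element frame of discernment {relevant, irrelevant} for a single shot. It is a minimal illustration, not the chapter's actual method; the `combine` helper and the mass values are assumptions made up for the example.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    A mass function maps frozensets (subsets of the frame of
    discernment) to masses that sum to 1. Mass on conflicting
    (disjoint) focal elements is normalized away.
    """
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c  # intersection of the two focal elements
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}

# Frame of discernment for one shot: relevant (R) vs. irrelevant (I).
R, I = frozenset({"R"}), frozenset({"I"})
THETA = R | I  # mass on the whole frame expresses ignorance

# Hypothetical masses derived from two uncertain concept detectors.
m1 = {R: 0.6, I: 0.1, THETA: 0.3}
m2 = {R: 0.5, I: 0.2, THETA: 0.3}
print(combine(m1, m2))  # combined belief, conflict normalized out
```

The point of the representation is that mass assigned to the whole frame expresses a detector's ignorance, so uncertain detection results need not be forced into point probabilities.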


Notes

  1. Since the purpose of this section is to provide an overview of concept-based retrieval, Fig. 12.4 only presents generalization/specialization relations. Please refer to [12] for other relations (e.g., part-of, attribute-of, and co-occurrence) and our approach for organizing LSCOM concepts.

  2. Since the search task was discontinued after TRECVID 2009, the videos from that year are the most recent on which retrieval performance using example shots can be evaluated.

References

  1. Petkovic M, Jonker W (2002) Content-based video retrieval: a database perspective. Kluwer Academic Publishers, Boston

  2. Smeulders A, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380

  3. Djordjevic D, Izquierdo E, Grzegorzek M (2007) User driven systems to bridge the semantic gap. In: Proceedings of the EUSIPCO 2007, pp 718–722

  4. Staab S, Scherp A, Arndt R, Troncy R, Grzegorzek M, Saathoff C, Schenk S, Hardman L (2008) Semantic multimedia. In: Baroglio C, Bonatti PA, Małuszyński J, Polleres A, Schaffert S (eds) Reasoning Web. LNCS. Springer, Berlin

  5. Naphade MR, Smith JR (2004) On the detection of semantic concepts at TRECVID. In: Proceedings of the MM 2004, pp 660–667

  6. Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322

  7. Li X, Wang D, Li J, Zhang B (2007) Video search in concept subspace: a text-like paradigm. In: Proceedings of the CIVR 2007, pp 603–610

  8. Natsev AP, Haubold A, Tešić J, Xie L, Yan R (2007) Semantic concept-based query expansion and re-ranking for multimedia retrieval. In: Proceedings of the MM 2007, pp 991–1000

  9. Ngo C et al (2009) VIREO/DVMM at TRECVID 2009: high-level feature extraction, automatic video search and content-based copy detection. In: Proceedings of the TRECVID 2009, pp 415–432

  10. Wei XY, Jiang YG, Ngo CW (2011) Concept-driven multi-modality fusion for video search. IEEE Trans Circuits Syst Video Technol 21(1):62–73

  11. Naphade M, Smith J, Tesic J, Chang SF, Hsu W, Kennedy L, Hauptmann A, Curtis J (2006) Large-scale concept ontology for multimedia. IEEE Multimed 13(3):86–91

  12. Shirahama K, Uehara K (2011) Constructing and utilizing video ontology for accurate and fast retrieval. Int J Multimed Data Eng Manag (IJMDEM) 2(4):59–75

  13. Zhu S, Wei X, Ngo C (2013) Error recovered hierarchical classification. In: Proceedings of the MM 2013, pp 697–700

  14. Hauptmann A, Yan R, Lin WH, Christel M, Wactlar H (2007) Can high-level concepts fill the semantic gap in video retrieval? A case study with broadcast news. IEEE Trans Multimed 9(5):958–966

  15. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the CVPR 2009, pp 248–255

  16. Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with mechanical turk. In: Proceedings of the CHI 2008, pp 453–456

  17. Ayache S, Quénot G (2008) Video corpus annotation using active learning. In: Proceedings of the ECIR 2008, pp 187–198

  18. Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J, Schaffalitzky F, Kadir T, Gool LV (2005) A comparison of affine region detectors. Int J Comput Vis 65(1–2):43–72

  19. Lowe D (1999) Object recognition from local scale-invariant features. In: Proceedings of the ICCV 1999, pp 1150–1157

  20. Bay H, Tuytelaars T, Gool L (2006) SURF: speeded up robust features. In: Proceedings of the ECCV 2006, pp 404–417

  21. van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596

  22. Csurka G, Bray C, Dance C, Fan L (2004) Visual categorization with bags of keypoints. In: Proceedings of the ECCV 2004 SLCV, pp 1–22

  23. Inoue N, Shinoda K (2012) A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors. IEEE Trans Multimed 14(4):1196–1205

  24. Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: Proceedings of the CVPR 2007, pp 1–8

  25. Vapnik V (1998) Statistical learning theory. Wiley-Interscience, New York

  26. Lin HT, Lin CJ, Weng RC (2007) A note on Platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276

  27. Smeaton AF, Over P, Kraaij W (2006) Evaluation campaigns and TRECVid. In: Proceedings of the MIR 2006, pp 321–330

  28. The PASCAL Visual Object Classes Homepage. http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  29. ImageNet Large Scale Visual Recognition Competition (2013) (ILSVRC2013). http://www.image-net.org/challenges/LSVRC/2013/

  30. Shirahama K, Uehara K (2012) Kobe University and Muroran Institute of Technology at TRECVID 2012 semantic indexing task. In: Proceedings of the TRECVID 2012, pp 239–247

  31. Snoek CGM et al (2009) The MediaMill TRECVID 2009 semantic video search engine. In: Proceedings of the TRECVID 2009, pp 226–238

  32. Natsev AP, Naphade MR, Tešić J (2005) Learning the semantics of multimedia queries and concepts from a small number of examples. In: Proceedings of the MM 2005, pp 598–607

  33. Rasiwasia N, Moreno P, Vasconcelos N (2007) Bridging the gap: query by semantic example. IEEE Trans Multimed 9(5):923–938

  34. Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton

  35. Denoeux T (2013) Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans Knowl Data Eng 25(1):119–130

  36. Kanamori T, Hido S, Sugiyama M (2009) A least-squares approach to direct importance estimation. J Mach Learn Res 10(7):1391–1445

  37. He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284

  38. Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of-features image classification. In: Proceedings of the ECCV 2006, pp 490–503

  39. Snoek CGM, Worring M, Geusebroek JM, Koelma D, Seinstra F (2005) On the surplus value of semantic video analysis beyond the key frame. In: Proceedings of the ICME 2005, pp 386–389

  40. Wang H, Klaser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: Proceedings of the CVPR 2011, pp 3169–3176

  41. Peng Y et al (2009) PKU-ICST at TRECVID 2009: high level feature extraction and search. In: Proceedings of the TRECVID 2009

  42. Aggarwal C, Yu P (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623

  43. Bi J, Zhang T (2005) Support vector classification with input data uncertainty. In: Proceedings of the NIPS 2004, pp 161–168

  44. Kriegel HP, Pfeifle M (2005) Density-based clustering of uncertain data. In: Proceedings of the KDD 2005, pp 672–677

  45. Wang H, McClean S (2008) Deriving evidence theoretical functions in multivariate data spaces: a systematic approach. IEEE Trans Syst Man Cybern B Cybern 38(2):455–465

  46. Aregui A, Denoeux T (2008) Constructing consonant belief functions from sample data using confidence sets of pignistic probabilities. Int J Approx Reason 49(3):575–594

  47. Zribi M (2003) Parametric estimation of Dempster-Shafer belief functions. In: Proceedings of the ISIF 2003, pp 485–491

  48. Benmokhtar R, Huet B (2008) Perplexity-based evidential neural network classifier fusion using MPEG-7 low-level visual features. In: Proceedings of the MIR 2008, pp 336–341

  49. Wang X, Kankanhalli M (2010) Portfolio theory of multimedia fusion. In: Proceedings of the MM 2010, pp 723–726

  50. Li X, Snoek CG (2009) Visual categorization with negative examples for free. In: Proceedings of the MM 2009, pp 661–664

  51. Quattoni A, Wang S, Morency L, Collins M, Darrell T (2007) Hidden conditional random fields. IEEE Trans Pattern Anal Mach Intell 29(10):1848–1852


Acknowledgments

The research work by Kimiaki Shirahama has been funded by the Postdoctoral Fellowship for Research Abroad of the Japan Society for the Promotion of Science (JSPS). This work was also supported in part by JSPS through a Grant-in-Aid for Scientific Research (B): KAKENHI (26280040).

Author information

Correspondence to Kimiaki Shirahama.

Appendix

We evaluate our video retrieval method on the 24 queries specified in the TRECVID 2009 search task [27]. For each query, shots in the test videos are manually assessed according to the following criterion: a shot is relevant to the query if it contains sufficient evidence for humans to recognize the relevance. In other words, such evidence may appear only in a region of some video frames in a shot. Below, we list the ID and text description of each query; a sketch of the evaluation measure follows the list.

269: Find shots of a road taken from a moving vehicle through the front window
270: Find shots of a crowd of people, outdoors, filling more than half of the frame area
271: Find shots with a view of one or more tall buildings (more than four stories) and the top story visible
272: Find shots of a person talking on a telephone
273: Find shots of a closeup of a hand, writing, drawing, coloring, or painting
274: Find shots of exactly two people sitting at a table
275: Find shots of one or more people, each walking up one or more steps
276: Find shots of one or more dogs, walking, running, or jumping
277: Find shots of a person talking behind a microphone
278: Find shots of a building entrance
279: Find shots of people shaking hands
280: Find shots of a microscope
281: Find shots of two or more people, each singing and/or playing a musical instrument
282: Find shots of a person pointing
283: Find shots of a person playing a piano
284: Find shots of a street scene at night
285: Find shots of printed, typed, or handwritten text, filling more than half of the frame area
286: Find shots of something burning with flames visible
287: Find shots of one or more people, each at a table or desk with a computer visible
288: Find shots of an airplane or helicopter on the ground, seen from outside
289: Find shots of one or more people, each sitting in a chair, talking
290: Find shots of one or more ships or boats, in the water
291: Find shots of a train in motion, seen from outside
292: Find shots with the camera zooming in on a person’s face
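
TRECVID search runs are typically scored with average precision per query, averaged over queries into mean average precision (MAP); TRECVID 2009 used an inferred variant, but the plain measure conveys the idea. The following Python sketch computes plain average precision for one query; the shot IDs and relevance judgments are hypothetical, made up for illustration.

```python
def average_precision(ranked_shot_ids, relevant_ids):
    """Plain (non-inferred) average precision for one query.

    ranked_shot_ids: shot IDs sorted by descending retrieval score.
    relevant_ids: set of shot IDs judged relevant by the assessors.
    """
    hits, precision_sum = 0, 0.0
    for rank, shot_id in enumerate(ranked_shot_ids, start=1):
        if shot_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical ranking and judgments for query 272
# ("Find shots of a person talking on a telephone").
ranking = ["shot12_3", "shot40_7", "shot5_1", "shot33_9"]
relevant = {"shot12_3", "shot5_1"}
print(average_precision(ranking, relevant))  # (1/1 + 2/3) / 2 = 0.8333...
```

Averaging this value over the 24 queries above yields MAP for a retrieval run.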


Copyright information

© 2015 Springer International Publishing Switzerland

Cite this chapter

Shirahama, K., Kumabuchi, K., Grzegorzek, M., Uehara, K. (2015). Video Retrieval Based on Uncertain Concept Detection Using Dempster–Shafer Theory. In: Baughman, A., Gao, J., Pan, JY., Petrushin, V. (eds) Multimedia Data Mining and Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-14998-1_12

  • DOI: https://doi.org/10.1007/978-3-319-14998-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14997-4

  • Online ISBN: 978-3-319-14998-1

  • eBook Packages: Computer Science, Computer Science (R0)
