Abstract
For a long time, it was difficult to automatically extract meanings from video shots because, even for a single meaning, shots exhibit significantly different visual appearances depending on camera techniques and shooting environments. Recently, a promising approach has emerged in which a large number of shots are statistically analyzed to cover the diverse visual appearances associated with a meaning. Inspired by the resulting performance improvement, concept-based video retrieval has received much research attention. Here, concepts are abstracted names of meanings that humans can perceive in shots, such as objects, actions, events, and scenes. For each concept, a detector is built in advance by analyzing a large number of shots. Given a query, shots are then retrieved based on concept detection results. Since each detector can detect a concept robustly across diverse visual appearances, effective retrieval can be achieved by using detection results as “intermediate” features. However, despite recent improvements, it remains difficult to detect every kind of concept accurately. In addition, shots can be taken with arbitrary camera techniques and in arbitrary shooting environments, which unboundedly increases the diversity of visual appearances. Concepts therefore cannot be expected to be detected with \(100\,\%\) accuracy. This chapter explores how to utilize such uncertain detection results to improve concept-based video retrieval.
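As the chapter title indicates, uncertain detection results are handled with Dempster–Shafer theory [44]. The following is a minimal illustrative sketch, not the chapter's actual method: two detectors' confidence scores for a shot are discounted into mass functions over the frame {relevant (R), irrelevant (I)} and fused with Dempster's rule of combination. The score-to-mass mapping and the reliability values are hypothetical assumptions for illustration.

```python
# Minimal sketch of Dempster's rule over the frame {R, I}.
# "RI" denotes the whole frame, i.e. the mass assigned to ignorance.

def mass_from_score(score, reliability):
    """Discount a detection score into a mass function.
    This particular mapping is an assumption, not the chapter's method."""
    return {"R": score * reliability,
            "I": (1.0 - score) * reliability,
            "RI": 1.0 - reliability}

def combine(m1, m2):
    """Dempster's rule of combination for the two-hypothesis frame."""
    # Conflict mass: one source supports R while the other supports I.
    k = m1["R"] * m2["I"] + m1["I"] * m2["R"]
    norm = 1.0 - k  # renormalization factor
    return {
        "R":  (m1["R"] * m2["R"] + m1["R"] * m2["RI"] + m1["RI"] * m2["R"]) / norm,
        "I":  (m1["I"] * m2["I"] + m1["I"] * m2["RI"] + m1["RI"] * m2["I"]) / norm,
        "RI": (m1["RI"] * m2["RI"]) / norm,
    }

m1 = mass_from_score(0.9, 0.8)  # confident detector, high assumed reliability
m2 = mass_from_score(0.6, 0.5)  # weaker detector, low assumed reliability
m = combine(m1, m2)
print(m)  # masses sum to 1; belief in "R" dominates
```

The key property shown here is that an unreliable detector contributes more mass to ignorance ("RI"), so its opinion is automatically down-weighted when combined with a more reliable one.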
Notes
- 1.
- 2. Since the Search task was discontinued after TRECVID 2009, videos from that year are the latest for which retrieval performance using example shots can be evaluated.
References
Petkovic M, Jonker W (2002) Content-based video retrieval: a database perspective. Kluwer Academic Publishers, Boston
Smeulders A, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Djordjevic D, Izquierdo E, Grzegorzek M (2007) User driven systems to bridge the semantic gap. In: Proceedings of the EUSIPCO 2007, pp 718–722
Staab S, Scherp A, Arndt R, Troncy R, Grzegorzek M, Saathoff C, Schenk S, Hardman L (2008) Semantic multimedia. In: Baroglio C, Bonatti PA, Małuszyński J, Polleres A, Schaffert S (eds) Reasoning Web. LNCS. Springer, Berlin
Naphade MR, Smith JR (2004) On the detection of semantic concepts at TRECVID. In: Proceedings of the MM 2004, pp 660–667
Snoek CGM, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322
Li X, Wang D, Li J, Zhang B (2007) Video search in concept subspace: a text-like paradigm. In: Proceedings of the CIVR 2007, pp 603–610
Natsev AP, Haubold A, Tešić J, Xie L, Yan R (2007) Semantic concept-based query expansion and re-ranking for multimedia retrieval. In: Proceedings of the MM 2007, pp 991–1000
Ngo C et al (2009) VIREO/DVMM at TRECVID 2009: high-level feature extraction, automatic video search and content-based copy detection. In: Proceedings of the TRECVID 2009, pp 415–432
Wei XY, Jiang YG, Ngo CW (2011) Concept-driven multi-modality fusion for video search. IEEE Trans Circuits Syst Video Technol 21(1):62–73
Naphade M, Smith J, Tesic J, Chang SF, Hsu W, Kennedy L, Hauptmann A, Curtis J (2006) Large-scale concept ontology for multimedia. IEEE Multimed 13(3):86–91
Shirahama K, Uehara K (2011) Constructing and utilizing video ontology for accurate and fast retrieval. Int J Multimed Data Eng Manag (IJMDEM) 2(4):59–75
Zhu S, Wei X, Ngo C (2013) Error recovered hierarchical classification. In: Proceedings of the MM 2013, pp 697–700
Hauptmann A, Yan R, Lin WH, Christel M, Wactlar H (2007) Can high-level concepts fill the semantic gap in video retrieval? A case study with broadcast news. IEEE Trans Multimed 9(5):958–966
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the CVPR 2009, pp 248–255
Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with mechanical turk. In: Proceedings of the CHI 2008, pp 453–456
Ayache S, Quénot G (2008) Video corpus annotation using active learning. In: Proceedings of the ECIR 2008, pp 187–198
Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J, Schaffalitzky F, Kadir T, Gool LV (2005) A comparison of affine region detectors. Int J Comput Vis 65(1–2):43–72
Lowe D (1999) Object recognition from local scale-invariant features. In: Proceedings of the ICCV 1999, pp 1150–1157
Bay H, Tuytelaars T, Gool L (2006) SURF: speeded up robust features. In: Proceedings of the ECCV 2006, pp 404–417
van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596
Csurka G, Bray C, Dance C, Fan L (2004) Visual categorization with bags of keypoints. In: Proceedings of the ECCV 2004 SLCV, pp 1–22
Inoue N, Shinoda K (2012) A fast and accurate video semantic-indexing system using fast MAP adaptation and GMM supervectors. IEEE Trans Multimed 14(4):1196–1205
Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: Proceedings of the CVPR 2007, pp 1–8
Vapnik V (1998) Statistical learning theory. Wiley-Interscience, New York
Lin HT, Lin CJ, Weng RC (2007) A note on Platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276
Smeaton AF, Over P, Kraaij W (2006) Evaluation campaigns and TRECVid. In: Proceedings of the MIR 2006, pp 321–330
The PASCAL Visual Object Classes Homepage. http://pascallin.ecs.soton.ac.uk/challenges/VOC/
ImageNet Large Scale Visual Recognition Competition (2013) (ILSVRC2013). http://www.image-net.org/challenges/LSVRC/2013/
Shirahama K, Uehara K (2012) Kobe university and Muroran institute of technology at TRECVID 2012 semantic indexing task. In: Proceedings of the TRECVID 2012, pp 239–247
Snoek CGM et al (2009) The MediaMill TRECVID 2009 semantic video search engine. In: Proceedings of the TRECVID 2009, pp 226–238
Natsev AP, Naphade MR, Tešić J (2005) Learning the semantics of multimedia queries and concepts from a small number of examples. In: Proceedings of the MM 2005, pp 598–607
Rasiwasia N, Moreno P, Vasconcelos N (2007) Bridging the gap: query by semantic example. IEEE Trans Multimed 9(5):923–938
Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
Denoeux T (2013) Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans Knowl Data Eng 25(1):119–130
Kanamori T, Hido S, Sugiyama M (2009) A least-squares approach to direct importance estimation. J Mach Learn Res 10(7):1391–1445
He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of-features image classification. In: Proceedings of the ECCV 2006, pp 490–503
Snoek CGM, Worring M, Geusebroek JM, Koelma D, Seinstra F (2005) On the surplus value of semantic video analysis beyond the key frame. In: Proceedings of the ICME 2005, pp 386–389
Wang H, Klaser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: Proceedings of the CVPR 2011, pp 3169–3176
Peng Y et al (2009) PKU-ICST at TRECVID 2009: high level feature extraction and search. In: Proceedings of the TRECVID 2009
Aggarwal C, Yu P (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623
Bi J, Zhang T (2005) Support vector classification with input data uncertainty. In: Proceedings of the NIPS 2004, pp 161–168
Kriegel HP, Pfeifle M (2005) Density-based clustering of uncertain data. In: Proceedings of the KDD 2005, pp 672–677
Wang H, McClean S (2008) Deriving evidence theoretical functions in multivariate data spaces: a systematic approach. IEEE Trans Syst Man Cybern B Cybern 38(2):455–465
Aregui A, Denoeux T (2008) Constructing consonant belief functions from sample data using confidence sets of pignistic probabilities. Int J Approx Reason 49(3):575–594
Zribi M (2003) Parametric estimation of Dempster-Shafer belief functions. In: Proceedings of the ISIF 2003, pp 485–491
Benmokhtar R, Huet B (2008) Perplexity-based evidential neural network classifier fusion using MPEG-7 low-level visual features. In: Proceedings of the MIR 2008, pp 336–341
Wang X, Kankanhalli M (2010) Portfolio theory of multimedia fusion. In: Proceedings of the MM 2010, pp 723–726
Li X, Snoek CG (2009) Visual categorization with negative examples for free. In: Proceedings of the MM 2009, pp 661–664
Quattoni A, Wang S, Morency L, Collins M, Darrell T (2007) Hidden conditional random fields. IEEE Trans Pattern Anal Mach Intell 29(10):1848–1852
Acknowledgments
The research by Kimiaki Shirahama was funded by the Postdoctoral Fellowship for Research Abroad of the Japan Society for the Promotion of Science (JSPS). This work was also supported in part by JSPS through a Grant-in-Aid for Scientific Research (B): KAKENHI (26280040).
Appendix
We evaluate our video retrieval method on the \(24\) queries specified in the TRECVID 2009 Search task [27]. For each query, shots in the test videos are manually assessed according to the following criterion: a shot is relevant to the query if it contains sufficient evidence for humans to recognize the relevance. In other words, the evidence may appear only in a region of some video frames in the shot. Below, we list the ID and text description of each query:
- 269: Find shots of a road taken from a moving vehicle through the front window
- 270: Find shots of a crowd of people, outdoors, filling more than half of the frame area
- 271: Find shots with a view of one or more tall buildings (more than four stories) and the top story visible
- 272: Find shots of a person talking on a telephone
- 273: Find shots of a closeup of a hand, writing, drawing, coloring, or painting
- 274: Find shots of exactly two people sitting at a table
- 275: Find shots of one or more people, each walking up one or more steps
- 276: Find shots of one or more dogs, walking, running, or jumping
- 277: Find shots of a person talking behind a microphone
- 278: Find shots of a building entrance
- 279: Find shots of people shaking hands
- 280: Find shots of a microscope
- 281: Find shots of two or more people, each singing and/or playing a musical instrument
- 282: Find shots of a person pointing
- 283: Find shots of a person playing a piano
- 284: Find shots of a street scene at night
- 285: Find shots of printed, typed, or handwritten text, filling more than half of the frame area
- 286: Find shots of something burning with flames visible
- 287: Find shots of one or more people, each at a table or desk with a computer visible
- 288: Find shots of an airplane or helicopter on the ground, seen from outside
- 289: Find shots of one or more people, each sitting in a chair, talking
- 290: Find shots of one or more ships or boats, in the water
- 291: Find shots of a train in motion, seen from outside
- 292: Find shots with the camera zooming in on a person’s face
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Shirahama, K., Kumabuchi, K., Grzegorzek, M., Uehara, K. (2015). Video Retrieval Based on Uncertain Concept Detection Using Dempster–Shafer Theory. In: Baughman, A., Gao, J., Pan, JY., Petrushin, V. (eds) Multimedia Data Mining and Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-14998-1_12
Print ISBN: 978-3-319-14997-4
Online ISBN: 978-3-319-14998-1