Evaluation of Visual Content Descriptors for Supporting Ad-Hoc Video Search Tasks at the Video Browser Showdown

Kletz, Sabrina; Leibetseder, Andreas; Schoeffmann, Klaus

doi:10.1007/978-3-319-73603-7_17

Sabrina Kletz,
Andreas Leibetseder &
Klaus Schoeffmann

Conference paper
First Online: 13 January 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

Since 2017, the Video Browser Showdown (VBS) has collaborated with TRECVID and interactively evaluates Ad-Hoc Video Search (AVS) tasks in addition to Known-Item Search (KIS) tasks. In this video search competition, participants must find target scenes relevant to a given textual query within a specific time limit, in a large dataset of 600 h of video content. Because the number of relevant scenes for an AVS query is usually rather high, the teams at VBS 2017 could find only a small portion of them. One way to support teams during interactive search would be to automatically retrieve further instances similar to an already found target scene. However, it is unclear which content descriptors should be used for such an automatic, query-by-example video content search. Therefore, in this paper we investigate several visual content descriptors (CNN features, CEDD, COMO, HOG, Feature Signatures, and HOF) for similarity search in the TRECVID IACC.3 dataset, which is used for the VBS. Our evaluation shows that no single descriptor works best for every AVS query; however, when considering the total performance over all 30 AVS tasks of TRECVID 2016, CNN features provide the best performance.
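
The query-by-example idea described in the abstract can be illustrated with a minimal sketch: given one shot that has already been found, rank all other shots by the distance between their descriptor vectors and score the ranking against known relevance judgments. Everything concrete here is an assumption made for illustration and is not taken from the paper: the shot identifiers, the three-dimensional toy vectors, the choice of cosine distance, and the use of plain (non-sampled) average precision as the score.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; zero vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_example(query_id: str,
                    descriptors: Dict[str, Sequence[float]]) -> List[str]:
    """Rank every other shot by ascending distance to the query shot."""
    query = descriptors[query_id]
    candidates = [sid for sid in descriptors if sid != query_id]
    return sorted(candidates,
                  key=lambda sid: cosine_distance(query, descriptors[sid]))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Plain average precision of a ranked list against a relevance set."""
    hits = 0
    precision_sum = 0.0
    for rank, sid in enumerate(ranking, start=1):
        if sid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical toy descriptors; real descriptors (e.g. CNN features)
# would have far more dimensions and come from an extraction pipeline.
descriptors = {
    "shot_a": [1.0, 0.0, 0.0],  # the already found target scene (query)
    "shot_b": [0.9, 0.1, 0.0],
    "shot_c": [0.0, 1.0, 0.0],
    "shot_d": [0.8, 0.0, 0.2],
}

ranking = rank_by_example("shot_a", descriptors)
ap = average_precision(ranking, relevant={"shot_b", "shot_c"})
print(ranking, round(ap, 3))
```

Repeating this for each descriptor type and averaging the per-query scores over all queries is one way to compare descriptors on total performance; the distance function would normally be chosen per descriptor rather than fixed as it is in this sketch.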

Notes

1. YouTube Company Statistics 2016, www.statisticbrain.com/youtube-statistics (accessed September 1, 2017).
2. TRECVID video data, http://www-nlpir.nist.gov/projects/tv2016/tv2016.html#data.
3. TRECVID extra Ad-Hoc video search judgments, www-nlpir.nist.gov/projects/tv2016/pastdata/extra.avs.qrels.tv16.xlsx.
4. Internet Archive, www.archive.org.


Acknowledgement

This work is supported by the Alpen-Adria University Klagenfurt and Lakeside Labs GmbH, Klagenfurt, Austria and funding from the European Regional Development Fund and the Carinthian Economic Promotion Fund (KWF) under grant KWF 20214 u. 3520/26336/38165.

Author information

Authors and Affiliations

Institute of Information Technology, Alpen-Adria University (AAU), 9020, Klagenfurt, Austria
Sabrina Kletz, Andreas Leibetseder & Klaus Schoeffmann

Corresponding author

Correspondence to Sabrina Kletz.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Kletz, S., Leibetseder, A., Schoeffmann, K. (2018). Evaluation of Visual Content Descriptors for Supporting Ad-Hoc Video Search Tasks at the Video Browser Showdown. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_17

DOI: https://doi.org/10.1007/978-3-319-73603-7_17
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)
