Abstract
This paper investigates the potential benefit of low-level human vision behaviors for high-level semantic concept detection. Many current approaches rely on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in still images. Its extension from static images to video sequences raises new problems, chiefly how to exploit the temporal information related to the concepts to detect (swimming, drinking...). In this study, we apply a human retina model to preprocess video sequences before running a state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, increases performance by adding robustness to common image and video degradations such as luminance variation, shadows, compression artifacts and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest in the visual scene, without slowing the computation as much as a high-level saliency model would. These approaches are evaluated on the TRECVid 2010 and 2011 Semantic Indexing Task datasets, which contain from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to assess their effect on performance.
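As background for readers unfamiliar with the BoW model mentioned above, the sketch below illustrates its two core steps on synthetic data: building a visual vocabulary by clustering local descriptors (e.g. SURF), then quantizing a video shot's descriptors into a normalized word histogram. This is a minimal illustration, not the paper's actual pipeline; descriptor extraction, the retina preprocessing and the SVM classification stage are omitted, and all function names are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Toy k-means: cluster local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    # initialize centers from randomly chosen descriptors
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every descriptor to its nearest center (Euclidean distance)
        dist = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize descriptors against the vocabulary; return a normalized histogram."""
    dist = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

In a real system, the histogram produced for each shot would then be fed to a classifier (one per concept); the retina preprocessing described in the paper acts earlier, on the video frames from which the descriptors are extracted.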
Notes
IRIM: Indexation et Recherche d'Information Multimédia (Multimedia Information Indexing and Searching), http://mrim.imag.fr/irim/.
Acknowledgement
This work would not have been possible without the IRIM French consortium (see Notes), which provided the processing toolchain for the unified descriptor evaluation.
Strat, S.T., Benoit, A., Lambert, P. et al. Retina enhanced SURF descriptors for spatio-temporal concept detection. Multimed Tools Appl 69, 443–469 (2014). https://doi.org/10.1007/s11042-012-1280-0