Retina enhanced SURF descriptors for spatio-temporal concept detection

Published in Multimedia Tools and Applications.

Abstract

This paper investigates the potential benefit of low-level human vision behaviors for high-level semantic concept detection. Many current approaches rely on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in images. Extending it from static images to video sequences raises new problems, chiefly how to use the temporal information related to the concepts to detect (swimming, drinking, ...). In this study, we propose to apply a human retina model to preprocess video sequences before the state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, increases performance by adding robustness to traditional image and video problems such as luminance variation, shadows, compression artifacts and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest from the visual scene without slowing the computation as much as a high-level saliency model would. These approaches are evaluated on the TrecVid 2010 and 2011 Semantic Indexing Task datasets, containing from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to check their effect on performance.
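The BoW pipeline the abstract refers to quantizes local descriptors (here, SURF) against a learned codebook and represents each frame or shot as a histogram of codeword occurrences. Below is a minimal, self-contained sketch of that quantization step only; the codebook and 2-D descriptors are toy stand-ins for illustration (real SURF descriptors are 64-dimensional), not the paper's actual data.

```python
import math

def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest (Euclidean) to the descriptor."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = math.sqrt(sum((d - w) ** 2 for d, w in zip(descriptor, word)))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of codeword assignments: the BoW signature."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_codeword(desc, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy 2-D descriptors and a 3-word codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
print(bow_histogram(descriptors, codebook))  # → [0.25, 0.25, 0.5]
```

In the paper's pipeline, the retina model is applied to the video before descriptor extraction; OpenCV (footnote 1) ships such a retina model in its `bioinspired` contrib module, which could serve as the preprocessing stage in a comparable setup.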


Notes

  1. http://www.opencv.willowgarage.com.

  2. http://www.its.bldrdoc.gov/vqeg/vqeg-home.aspx.

  3. http://www.code.opencv.org.

  4. http://threadingbuildingblocks.org/.

  5. Indexation et Recherche d’Information Multimédia, Multimedia Information Indexing and Searching, http://mrim.imag.fr/irim/.


Acknowledgement

This work would not have been possible without the IRIM French consortium (see footnote 5), which provided the processing toolchain for the unified descriptor evaluation.

Author information

Corresponding author

Correspondence to Alexandre Benoit.

Cite this article

Strat, S.T., Benoit, A., Lambert, P. et al. Retina enhanced SURF descriptors for spatio-temporal concept detection. Multimed Tools Appl 69, 443–469 (2014). https://doi.org/10.1007/s11042-012-1280-0
