Spatio-Temporal Scale Selection in Video Data

  • Tony Lindeberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10302)


We present a theory and a method for simultaneous detection of local spatial and temporal scales in video data. The underlying idea is that, if we process video data with spatio-temporal receptive fields at multiple spatial and temporal scales, we would like to generate hypotheses about the spatial extent and the temporal duration of the underlying spatio-temporal image structures that gave rise to the feature responses. We consider two types of spatio-temporal scale-space representations: (i) a non-causal Gaussian spatio-temporal scale space for offline analysis of pre-recorded video sequences and (ii) a time-causal and time-recursive spatio-temporal scale space for online analysis of real-time video streams. For both, we express sufficient conditions under which spatio-temporal feature detectors, defined in terms of spatio-temporal receptive fields, deliver scale covariant and scale invariant feature responses. We then give a theoretical analysis of the scale selection properties of six types of spatio-temporal interest point detectors, showing that five of them allow for provable scale covariance and scale invariance. Finally, we describe a time-causal and time-recursive algorithm for detecting sparse spatio-temporal interest points from video streams and show that it leads to intuitively reasonable results.
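The sketch below illustrates what "time-causal and time-recursive" means for temporal smoothing. It implements a cascade of first-order recursive filters, with one stage per temporal scale level. It is not the paper's algorithm. The following are all illustrative assumptions rather than details taken from the abstract:

- the cascade structure;
- the logarithmic distribution of temporal scale levels τ_k = c^{2(k−K)} τ_max;
- the hypothetical names `time_constants` and `RecursiveTemporalSmoother`.

Each stage stores only its own previous output. The filter never looks at future frames and never buffers past ones. In a video setting, the same update would be applied independently at every pixel.

```python
import math
from typing import List


def time_constants(tau_max: float, c: float, num_levels: int) -> List[float]:
    """Hypothetical helper: time constants mu_k for a cascade of recursive filters.

    Assumes logarithmically distributed temporal scale levels
    tau_k = c**(2*(k - K)) * tau_max for k = 1..K. A discrete first-order
    recursive filter with time constant mu adds temporal variance mu**2 + mu,
    so stage k uses mu_k = (sqrt(1 + 4*(tau_k - tau_{k-1})) - 1) / 2.
    """
    mus: List[float] = []
    prev_tau = 0.0
    for k in range(1, num_levels + 1):
        tau = c ** (2 * (k - num_levels)) * tau_max
        mus.append((math.sqrt(1.0 + 4.0 * (tau - prev_tau)) - 1.0) / 2.0)
        prev_tau = tau
    return mus


class RecursiveTemporalSmoother:
    """Hypothetical time-causal smoother: one stored value per scale level."""

    def __init__(self, mus: List[float]) -> None:
        self.mus = list(mus)
        self.state = [0.0] * len(self.mus)  # each stage's output at the previous time step

    def update(self, sample: float) -> List[float]:
        """Consume one new sample and return the smoothed value at every level.

        Only the current sample and the stored states are used, so the
        filter is time-causal (no future data) and time-recursive (no
        buffer of past samples).
        """
        x = sample
        for k, mu in enumerate(self.mus):
            # f_out(t) = f_out(t-1) + (f_in(t) - f_out(t-1)) / (1 + mu)
            self.state[k] += (x - self.state[k]) / (1.0 + mu)
            x = self.state[k]  # stage k feeds stage k + 1
        return list(self.state)


if __name__ == "__main__":
    mus = time_constants(tau_max=16.0, c=2.0, num_levels=4)
    smoother = RecursiveTemporalSmoother(mus)
    # Impulse response of the coarsest level (last stage of the cascade).
    response = [smoother.update(v)[-1] for v in [1.0] + [0.0] * 999]
    mass = sum(response)
    mean = sum(t * h for t, h in enumerate(response)) / mass
    var = sum((t - mean) ** 2 * h for t, h in enumerate(response)) / mass
    print(f"mass={mass:.6f} mean={mean:.4f} (sum mu={sum(mus):.4f}) variance={var:.4f}")
```

Each stage has a geometric impulse response with mean μ_k and variance μ_k² + μ_k. The coarsest level's impulse response should therefore have unit mass, a temporal delay of Σμ_k, and a temporal variance equal to τ_max. The non-zero delay is the price paid for never looking into the future.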


Keywords (added by machine, not by the author): Temporal Scale, Video Data, Local Extremum, Scale Covariance, Scale Selection


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Computational Brain Science Lab, Department of Computational Science and Technology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
