Efficient Online Spatio-Temporal Filtering for Video Event Detection

  • Xinchen Yan
  • Junsong Yuan
  • Hui Liang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


We propose a novel spatio-temporal filtering technique that improves per-pixel prediction maps by leveraging the spatio-temporal smoothness of the video signal. Unlike previous techniques that perform spatio-temporal filtering in an offline/batch mode, e.g., through graphical models, our filtering can be implemented online and in real time, with provably lowest computational complexity. Moreover, it is compatible with any image analysis module that produces a per-pixel map of detection scores or multi-class prediction distributions. For each pixel, our filtering finds the optimal spatio-temporal trajectory through the past frames, i.e., the one with the maximum accumulated detection score. Pixels with a small accumulated detection score are treated as false alarms and suppressed. To demonstrate the effectiveness of our online spatio-temporal filtering, we apply it to three video event tasks: salient action discovery, walking pedestrian detection, and sports event detection, all in an online/causal way. Experimental results on the three datasets show that our filtering scheme performs excellently compared with state-of-the-art methods.
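
The abstract does not state the recurrence, so the following is only a minimal sketch of how a causal maximum-accumulated-score filter could look, not the authors' implementation. It assumes a generic dynamic-programming update in which each pixel extends the best trajectory ending in a small neighborhood of the previous frame, or starts a new one. The function names, the `radius` and `threshold` parameters, and the convention that background scores are negative are all assumptions made for illustration. This naive version costs O(radius²) per pixel per frame and makes no claim about the complexity bound mentioned above.

```python
def update_accumulated(prev_acc, scores, radius=1):
    """One online step (assumed recurrence, not taken from the paper).

    acc[y][x] = scores[y][x] + max(0, best accumulated score among the
    pixels of the previous frame within `radius` of (x, y)).
    """
    height, width = len(scores), len(scores[0])
    acc = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_prev = 0.0  # 0 means "start a new trajectory at this frame"
            if prev_acc is not None:
                for v in range(max(0, y - radius), min(height, y + radius + 1)):
                    for u in range(max(0, x - radius), min(width, x + radius + 1)):
                        best_prev = max(best_prev, prev_acc[v][u])
            acc[y][x] = scores[y][x] + best_prev
    return acc


def suppress(acc, threshold):
    """Zero out pixels whose accumulated score falls below `threshold`."""
    return [[a if a >= threshold else 0.0 for a in row] for row in acc]


def filter_stream(frames, radius=1, threshold=2.0):
    """Yield one filtered map per incoming frame, using past frames only."""
    acc = None
    for scores in frames:
        acc = update_accumulated(acc, scores, radius)
        yield suppress(acc, threshold)


# Toy 1x4 score maps: a detection (score 1.0) drifts right by one pixel per
# frame, while frame 1 also contains an isolated false alarm at x = 3.
frames = [
    [[1.0, -0.5, -0.5, -0.5]],
    [[-0.5, 1.0, -0.5, 1.0]],
    [[-0.5, -0.5, 1.0, -0.5]],
]
filtered = list(filter_stream(frames, radius=1, threshold=2.0))
```

In this toy run the moving detection accumulates evidence along its path and survives the threshold from the second frame on, while the isolated detection in the second frame has no supporting trajectory and is suppressed.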


Keywords (added by machine, not by the authors): Video Sequence, Salient Object, Pedestrian Detection, Video Event, Baseline Detector



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  3. Computer Science and Engineering Division, Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, USA
