Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams

  • Conference paper
  • First Online:
Advances in Visual Computing (ISVC 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9475)

Abstract

Video surveillance systems must process and manage a growing amount of data captured over a network of cameras for various recognition tasks. In order to limit human labour and error, this paper presents a spatio-temporal fusion approach that accurately combines information from Region of Interest (RoI) batches captured in a multi-camera surveillance scenario. Feature-level and score-level approaches are proposed for spatio-temporal fusion of information over frames, within a framework based on ensembles of GMM-UBM (Gaussian Mixture Model - Universal Background Model) classifiers. At the feature level, the features of the frames in a batch are combined and fed to the ensemble, whereas at the score level the outputs of the ensemble for individual frames are combined. Results indicate that feature-level fusion provides a higher level of accuracy in a very efficient way.
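
To make the distinction between the two fusion schemes concrete, the sketch below contrasts feature-level and score-level fusion of a batch of RoI frame features against a single Gaussian mixture model. This is an illustrative assumption only: the paper's framework uses ensembles of GMM-UBM models, whereas the plain scikit-learn GaussianMixture, the synthetic features, the mean-pooling of features, and the averaging of per-frame log-likelihoods used here are all stand-ins chosen for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical sketch: feature-level vs. score-level fusion of one RoI batch
# against a single individual's model. A GaussianMixture stands in for the
# paper's GMM-UBM ensemble; features are synthetic.

rng = np.random.default_rng(0)
enrol_feats = rng.normal(0.0, 1.0, size=(200, 16))  # enrolment features (assumed)
batch_feats = rng.normal(0.1, 1.0, size=(30, 16))   # one RoI batch: 30 frames x 16-dim features

model = GaussianMixture(n_components=4, covariance_type='diag', random_state=0)
model.fit(enrol_feats)

# Feature-level fusion: combine the frame features of the batch first
# (here, by averaging), then score the fused feature vector once.
fused_feature = batch_feats.mean(axis=0, keepdims=True)
feature_level_score = model.score(fused_feature)     # log-likelihood of the fused vector

# Score-level fusion: score every frame individually, then combine
# the per-frame scores (here, by averaging).
per_frame_scores = model.score_samples(batch_feats)  # log-likelihood per frame
score_level_score = per_frame_scores.mean()

print(f"feature-level fused score: {feature_level_score:.3f}")
print(f"score-level fused score:   {score_level_score:.3f}")
```

Under this sketch, feature-level fusion evaluates the model once per batch, while score-level fusion evaluates it once per frame, which is consistent with the efficiency advantage the abstract reports for feature-level fusion.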

Author information

Corresponding author

Correspondence to Samaneh Khoshrou.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Khoshrou, S., Cardoso, J.S., Granger, E., Teixeira, L.F. (2015). Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science, vol 9475. Springer, Cham. https://doi.org/10.1007/978-3-319-27863-6_47

  • DOI: https://doi.org/10.1007/978-3-319-27863-6_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27862-9

  • Online ISBN: 978-3-319-27863-6

  • eBook Packages: Computer Science, Computer Science (R0)
