
Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams

  • Conference paper
  • First Online:
Advances in Visual Computing (ISVC 2015)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9475))


Abstract

Video surveillance systems must process and manage a growing amount of data captured over a network of cameras for various recognition tasks. To limit human labour and error, this paper presents a spatio-temporal fusion approach that accurately combines information from Region of Interest (RoI) batches captured in a multi-camera surveillance scenario. Feature-level and score-level approaches are proposed for spatio-temporal fusion of information across frames, within a framework based on ensembles of Gaussian Mixture Models with Universal Background Models (GMM-UBM). At the feature level, the features from all frames in a batch are combined and fed to the ensemble, whereas at the score level the ensemble outputs for individual frames are combined. Results indicate that feature-level fusion provides higher accuracy in a very efficient way.
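The feature-level versus score-level distinction described above can be illustrated with a small sketch. The Python example below is not the authors' implementation: it assumes synthetic RoI descriptors, uses scikit-learn's GaussianMixture as a simplified stand-in for the GMM-UBM ensemble (without the usual MAP adaptation of client models from the UBM), and assumes averaging as the feature-fusion rule. It only shows the two places where information from a batch of frames can be combined.

```python
# Minimal sketch (assumptions noted above): feature-level vs. score-level fusion
# of a batch of RoI frame descriptors under a simplified GMM-UBM-style model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
d = 16  # descriptor dimension per RoI frame (illustrative)

# Universal Background Model: a GMM fit on pooled descriptors from many streams.
background = rng.normal(size=(2000, d))
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# One GMM per enrolled object of interest (stand-in for the ensemble's client models).
class_models = {}
for label, centre in (("person_A", 1.0), ("person_B", -1.0)):
    samples = rng.normal(loc=centre, size=(300, d))
    gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
    gmm.fit(samples)
    class_models[label] = gmm

# A batch of RoI descriptors from consecutive frames of one stream (assumed person_A).
batch = rng.normal(loc=1.0, size=(25, d))

def score_level_fusion(batch, models, ubm):
    """Score each frame against every model, then combine the per-frame
    log-likelihood ratios (here by averaging over the batch)."""
    scores = {}
    for label, gmm in models.items():
        llr = gmm.score_samples(batch) - ubm.score_samples(batch)  # one value per frame
        scores[label] = llr.mean()
    return scores

def feature_level_fusion(batch, models, ubm):
    """Combine the frame descriptors first (a single averaged descriptor per
    batch, an assumed fusion rule), then score the fused descriptor once."""
    fused = batch.mean(axis=0, keepdims=True)
    scores = {}
    for label, gmm in models.items():
        llr = gmm.score_samples(fused) - ubm.score_samples(fused)
        scores[label] = llr.item()
    return scores

print("score-level  :", score_level_fusion(batch, class_models, ubm))
print("feature-level:", feature_level_fusion(batch, class_models, ubm))
```

In both cases the highest log-likelihood ratio indicates the best-matching model; the difference is only whether frames are merged before or after scoring.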



Author information

Correspondence to Samaneh Khoshrou.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Khoshrou, S., Cardoso, J.S., Granger, E., Teixeira, L.F. (2015). Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science(), vol 9475. Springer, Cham. https://doi.org/10.1007/978-3-319-27863-6_47


  • DOI: https://doi.org/10.1007/978-3-319-27863-6_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27862-9

  • Online ISBN: 978-3-319-27863-6

  • eBook Packages: Computer Science, Computer Science (R0)
