Abstract
Video surveillance systems must process and manage a growing amount of data captured over a network of cameras for various recognition tasks. To limit human labour and error, this paper presents a spatio-temporal fusion approach that accurately combines information from batches of Regions of Interest (RoIs) captured in a multi-camera surveillance scenario. Feature-level and score-level approaches are proposed for spatio-temporal fusion of information over frames, within a framework based on ensembles of GMM-UBM (Gaussian Mixture Model - Universal Background Model) classifiers. At the feature level, the features extracted from a batch of frames are combined and fed to the ensemble, whereas at the score level the ensemble outputs for the individual frames are combined. Results indicate that feature-level fusion provides higher accuracy while remaining very efficient.
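To make the contrast between the two fusion strategies concrete, below is a minimal sketch, not the authors' implementation, of feature-level versus score-level fusion over a batch of RoI feature vectors using a single GMM-UBM log-likelihood-ratio scorer. The synthetic data, the mean pooling of features, and the mean combination of scores are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's pipeline):
# feature-level vs. score-level fusion over a batch of RoI feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: a "universal background" set and one enrolled individual.
ubm_data = rng.normal(0.0, 1.0, size=(500, 16))
target_data = rng.normal(0.5, 1.0, size=(200, 16))

# GMM-UBM pair (in practice the target model would be adapted from the UBM).
ubm = GaussianMixture(n_components=4, random_state=0).fit(ubm_data)
target = GaussianMixture(n_components=4, random_state=0).fit(target_data)

def llr(x):
    """Log-likelihood-ratio score for feature vectors x of shape (n, d)."""
    return target.score_samples(x) - ubm.score_samples(x)

# A batch of RoI features from consecutive frames of one tracked person.
batch = rng.normal(0.5, 1.0, size=(10, 16))

# Feature-level fusion: pool the frame features first, then score once.
feature_level_score = llr(batch.mean(axis=0, keepdims=True))[0]

# Score-level fusion: score every frame, then combine the per-frame scores.
score_level_score = llr(batch).mean()

print(f"feature-level fused score: {feature_level_score:.3f}")
print(f"score-level fused score:   {score_level_score:.3f}")
```

Feature-level fusion scores the batch with a single classifier call, which is why it tends to be cheaper than scoring every frame and then combining the outputs.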
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Khoshrou, S., Cardoso, J.S., Granger, E., Teixeira, L.F. (2015). Spatio-Temporal Fusion for Learning of Regions of Interests Over Multiple Video Streams. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science(), vol 9475. Springer, Cham. https://doi.org/10.1007/978-3-319-27863-6_47
DOI: https://doi.org/10.1007/978-3-319-27863-6_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27862-9
Online ISBN: 978-3-319-27863-6
eBook Packages: Computer Science, Computer Science (R0)