1 Introduction

The advent of low-cost RGBD sensors such as the Microsoft Kinect or the Asus Xtion Pro is changing the computer vision world, as they are being successfully used in several applications and research areas. Many of these applications, such as gaming or human-computer interaction systems, rely on the efficiency of learning a scene background model for detecting and tracking moving objects, to be further processed and analyzed. Depth data are particularly attractive and suitable for applications based on moving object detection, since they are not affected by several problems typical of color-based imagery. However, depth data suffer from other problems, such as depth camouflage or noisy depth measurements, which limit the efficiency of depth-only background modeling approaches. The complementary nature of the synchronized color and depth information acquired by RGBD sensors poses new challenges and design opportunities: new strategies are required that explore the effectiveness of combining depth- and color-based features, or their joint incorporation into well-known moving object detection and tracking frameworks.

In order to evaluate and compare scene background modelling methods for moving object detection on RGBD videos, we assembled and made available the SBM-RGBD dataset. It provides all the facilities (data, ground truths, and evaluation scripts) needed for the SBM-RGBD Challenge, organized in conjunction with the Workshop on Background Learning for Detection and Tracking from RGBD Videos, 2017. The dataset and the results of the SBM-RGBD Challenge, described in the following sections, will remain available after the competition as a reference for future methods.

2 Video Categories

The SBM-RGBD dataset provides a wide set of synchronized color and depth sequences acquired by the Microsoft Kinect. The dataset consists of 33 videos (about 15000 frames) representative of typical indoor visual data captured in video surveillance and smart environment scenarios, selected to cover a wide range of scene background modeling challenges for moving object detection. The videos come from our personal collections as well as from existing public datasets, including the GSM dataset, described in Moyá-Alcover et al. [13], MULTIVISION, described in Fernandez-Sanchez et al. [5], the Princeton Tracking Benchmark, described by Song and Xiao [14], the RGB-D object detection dataset, described by Camplani and Salgado [3], and the UR Fall Detection Dataset, described by Kwolek and Kepski [7].

The videos have 640 × 480 spatial resolution and their length varies from 70 to 1400 frames. Depth images are recorded at either 16 or 8 bits. They are already synchronized and registered with the corresponding color images by projecting the depth map onto the color image, providing a color-depth pixel correspondence. For each sequence, pixels that have no color-depth correspondence (due to the difference between the color and depth camera centers) are marked in black in a binary Region-of-Interest (ROI) image (see Fig. 2-(c)) and are excluded from the evaluation (see Sect. 4).
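To make the use of the ROI concrete, the following minimal Python sketch shows how a valid-pixel mask can be built and applied; the file names and the use of OpenCV are our own illustrative assumptions, and only the black-means-excluded convention comes from the dataset:

```python
import cv2
import numpy as np

# Hypothetical file names: the actual naming convention is defined by the dataset release.
color = cv2.imread("frame_000123_color.png")                         # 8-bit, 3-channel color image
depth = cv2.imread("frame_000123_depth.png", cv2.IMREAD_UNCHANGED)   # 8- or 16-bit depth image
roi   = cv2.imread("ROI.png", cv2.IMREAD_GRAYSCALE)                  # binary Region-of-Interest image

# Black ROI pixels have no color-depth correspondence and must be ignored,
# both when modeling the background and when evaluating the detection masks.
valid = roi > 0
print("pixels used for evaluation:", int(valid.sum()), "out of", valid.size)
```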

The videos span seven categories, selected to include diverse scene background modelling challenges for moving object detection. These well-known challenges can relate only to the RGB channels (RGB), only to the depth channel (D), or to all channels (RGB+D):

  1. Bootstrapping (RGB+D): Videos including foreground objects in all their frames. The challenge is to learn a model of the scene background (to be adopted for background subtraction) even when the usual assumption of having a set of training frames free of foreground objects fails.

    This category includes five videos; in most of them, some scene regions are always occupied by foreground people, so the background is never revealed there.

  2. Color Camouflage (RGB): Videos including foreground objects whose color is very close to that of the background, making a correct segmentation based only on color difficult. This category consists of four videos where foreground objects are moved in front of a similarly colored background (e.g., a white box in front of other white boxes, or a piece of furniture on wheels moving in front of other furniture of the same color).

  3. Depth Camouflage (D): Videos including foreground objects very close in depth to the background. In these cases the sensor reports the same depth values for foreground and background, making a correct segmentation based only on depth difficult. The category consists of four videos where people move their hands or other objects very close to the background.

  4. Illumination Changes (RGB): Videos containing strong and mild illumination changes. The challenge here is to adapt the color background model to illumination changes in order to achieve an accurate foreground detection. Four videos are included in this category, where the illumination varies because the light source is covered or the lighting is unstable during acquisition.

  5. Intermittent Motion (RGB+D): Videos with scenarios known for causing ghosting artifacts in the detected motion, i.e., abandoned or removed foreground objects. The challenge here is to detect foreground objects even if they stop moving (abandoned object) or if they were initially stationary and then start moving (removed object). This category consists of six videos including abandoned and removed objects. Two videos are obtained by reversing the original temporal order of the frames (so that an object that is abandoned in the original sequence appears as removed in the reversed sequence).

  6. Out of Sensor Range (D): Videos including foreground or background objects that are too close to or too far from the sensor. In these cases the sensor is unable to measure depth, due to its minimum and maximum depth specifications, resulting in invalid depth values. Five videos are included in this category, where many invalid depth values are due to foreground objects whose distance from the sensor is outside the admissible sensor range.

  7. Shadows (RGB+D): Videos showing shadows caused by foreground objects, which block the active light emitted by the sensor from reaching the background. This casts shadows onto the background that can be mistaken for moving objects. RGBD sensors exhibit two different types of shadows: visible-light shadows in the RGB channels and IR shadows in the depth channel. The category consists of five videos with shadows of varying strength.

Examples of videos from all the categories are reported in Fig. 1.

Fig. 1. Examples of videos from all the categories: (a) Bootstrapping, (b) ColorCamouflage, (c) DepthCamouflage, (d) IlluminationChanges, (e) IntermittentMotion, (f) OutOfRange, (g) Shadows.

3 Ground Truths

To enable a precise quantitative comparison of algorithms for moving object detection from RGBD videos, all the videos come with pixel-wise ground-truth foreground segmentations. A foreground region is intended as anything that does not belong to the background, including abandoned objects and still persons, but excluding light reflections, shadows, etc. The ground-truth images, some of which were created using the GroundTruther software kindly made available by the organizers of changedetection.net, contain four labels (see Fig. 2-(d)), namely:

  • 0: Background

  • 85: Outside ROI

  • 170: Unknown motion

  • 255: Foreground

Areas around moving objects are labeled as unknown motion, due to semi-transparency and motion blur that do not allow a precise foreground/background classification. These areas, like those outside the ROI, are therefore excluded from the evaluation.
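As an illustration, a possible way to turn a ground-truth frame into the sets of pixels actually used for evaluation is sketched below; the file name is hypothetical, while the label values are those listed above:

```python
import cv2
import numpy as np

gt = cv2.imread("gt_000123.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

evaluated = ~np.isin(gt, (85, 170))      # drop "outside ROI" and "unknown motion" pixels
positives = (gt == 255) & evaluated      # ground-truth foreground
negatives = (gt == 0) & evaluated        # ground-truth background
```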

Fig. 2. Sequence ChairBox: (a) color and (b) depth images; (c) ROI; (d) ground truth.

While our evaluation is carried out on all the ground truths for all the videos, only a subset of them is made publicly available for testing, in order to reduce the possibility of over-tuning method parameters.

4 Metrics

The SBM-RGBD dataset also comes with tools to compute performance metrics for moving object detection from RGBD videos, and thus to identify algorithms that are robust across the various challenges. Let TP, FP, FN, and TN indicate, for each video, the total number of True Positive, False Positive, False Negative, and True Negative pixels, respectively. The seven metrics, widely adopted in the literature for evaluating the results of moving object detection (e.g., [6]), are

  1. Recall

    $$\begin{aligned} Rec=\frac{TP}{TP + FN} \end{aligned}$$
  2. Specificity

    $$\begin{aligned} Sp=\frac{TN}{TN + FP} \end{aligned}$$
  3. False Positive Rate

    $$\begin{aligned} FPR=\frac{FP}{FP + TN} \end{aligned}$$
  4. False Negative Rate

    $$\begin{aligned} FNR=\frac{FN}{TP + FN} \end{aligned}$$
  5. Percentage of Wrong Classifications

    $$\begin{aligned} PWC=100 * \frac{FN + FP}{TP + FN + FP + TN} \end{aligned}$$
  6. Precision

    $$\begin{aligned} Prec=\frac{TP}{TP + FP} \end{aligned}$$
  7. F-Measure

    $$\begin{aligned} F_1=\frac{2 * Prec * Rec}{Prec + Rec} \end{aligned}$$

The Matlab scripts to compute all performance metrics have been adapted from the scripts available at changedetection.net.
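For readers who prefer Python, an informal numpy sketch of the same computation is given below; it is not the official evaluation code and, consistently with the experiments reported in Sect. 5, it sets undefined ratios to zero:

```python
import numpy as np

def compute_metrics(pred, gt):
    """pred: binary foreground mask; gt: ground-truth labels (0, 85, 170, 255)."""
    pred = pred.astype(bool)
    evaluated = ~np.isin(gt, (85, 170))            # ignore non-ROI and unknown-motion pixels
    pos = (gt == 255) & evaluated
    neg = (gt == 0) & evaluated
    tp, fn = np.sum(pred & pos), np.sum(~pred & pos)
    fp, tn = np.sum(pred & neg), np.sum(~pred & neg)
    div = lambda a, b: a / b if b > 0 else 0.0     # undefined values are set to zero
    rec, prec = div(tp, tp + fn), div(tp, tp + fp)
    return {
        "Recall": rec,
        "Specificity": div(tn, tn + fp),
        "FPR": div(fp, fp + tn),
        "FNR": div(fn, tp + fn),
        "PWC": 100.0 * div(fn + fp, tp + fn + fp + tn),
        "Precision": prec,
        "F-Measure": div(2 * prec * rec, prec + rec),
    }
```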

5 Experimental Results

Several authors submitted their results to the SBM-RGBD Challenge, and some of them provided a description of their method: RGBD-SOBS and RGB-SOBS [11], SCAD [12], and cwisardH+ [4]. Therefore, our experimental analysis is mainly devoted to assessing to what extent the different background modelling challenges introduced in Sect. 2 pose difficulties for these background subtraction methods.

In Table 1, we report the average results on the whole dataset achieved by all submitted methods (as of July 4th, 2017), while in Tables 2 and 3 we report their average results for each challenge category.

Table 1. Average results on the whole SBM-RGBD dataset.
Table 2. Average results for each category of the SBM-RGBD dataset (Part 1).
Table 3. Average results for each category of the SBM-RGBD dataset (Part 2).

Bootstrapping can be a problem, especially for selective background subtraction methods (e.g., [9]), i.e., those that update the background model using only background information. Indeed, once a foreground object is erroneously included into the background model (e.g., due to inappropriate background initialization or to inaccurate segmentation of foreground objects), it will hardly be removed from the model, continuing to produce false negatives. The problem is even harder if some parts of the background are never shown during the sequence, as happens in most of the videos of the Bootstrapping category. In these cases, even the best performing background initialization methods [1] fail, as illustrated in Fig. 3, and only alternative techniques (e.g., inpainting) can be adopted to recover the missing data [10]. Nonetheless, depth information seems to be beneficial for addressing this challenge, as reported in Table 2, where accurate results are achieved by most of the methods that exploit depth information.
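For reference, the per-pixel temporal median used as a baseline in Fig. 3-(a) can be sketched in a few lines of Python (a rough initialization baseline, not one of the evaluated methods):

```python
import numpy as np

def median_background(frames):
    """frames: sequence of images of identical shape (color or depth)."""
    stack = np.stack(frames, axis=0)
    # The per-pixel temporal median recovers the background only if each pixel
    # shows the true background in at least half of the frames, which is exactly
    # what fails in most Bootstrapping videos, where some regions are always occupied.
    return np.median(stack, axis=0).astype(stack.dtype)
```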

Fig. 3. Background image for sequence adl24cam0 (where the center area of the room is always covered by the man) computed using: (a) temporal median filter and (b) LabGen [8].

As expected, all the methods that exploit depth information achieve high accuracy in case of color camouflage. An evident example of the benefit of depth information for this category is given by the F-measure achieved by the RGBD-SOBS method, which doubles the value achieved by the same method without depth (RGB-SOBS). A similar reasoning applies to the illumination changes challenge. However, we point out that, in this case, the analysis should be based on Specificity, FPR, FNR, and PWC, rather than on the other three metrics. Indeed, two of the four videos of this category contain no foreground objects throughout their whole duration, the rationale being to verify that no false positives are produced under varying illumination conditions. This leads to ground truths with no positive cases and, consequently, to undefined values of Precision, Recall, and F-measure (in the experiments, these undefined values are set to zero).

Depth can also be beneficial for detecting and properly handling cases of intermittent motion. Indeed, foreground objects can easily be identified based on their depth, which is lower than that of the background, even when they remain stationary for long time periods. Methods that explicitly exploit this characteristic (e.g., RGBD-SOBS and SCAD) succeed in handling cases of removed and abandoned objects, achieving high accuracy.
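The underlying idea can be sketched as follows (an illustrative rule of ours, not the actual update logic of RGBD-SOBS or SCAD): a pixel whose measured depth is clearly smaller than the modelled background depth is kept as foreground, no matter how long it has been static.

```python
import numpy as np

def depth_foreground(depth, bg_depth, margin=50, invalid=0):
    """Flag pixels that are closer to the sensor than the background model.

    depth, bg_depth: depth maps in the same units (e.g., millimetres);
    margin: tolerance against sensor noise (purely illustrative value);
    invalid: value reported by the sensor for missing measurements.
    """
    valid = (depth != invalid) & (bg_depth != invalid)
    return valid & (depth < bg_depth - margin)
```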

Overall, shadows do not seem to pose a strong challenge to most of the methods. Depth shadows due to moving objects cause some undefined depth values, generally close to the object contours, but these can be handled based on motion. Color shadows can be handled either by exploiting depth information, which is insensitive to this challenge, or, when only color information is taken into account, through color shadow detection techniques (e.g., as in RGB-SOBS and SCAD). They remain a challenge, however, when only grey-level intensity is considered (e.g., as in SRPCA).
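As an example of a color-only shadow test, a classical brightness/chromaticity heuristic is sketched below (a generic technique given for illustration, not the specific detectors used by RGB-SOBS or SCAD): a pixel is treated as shadow rather than foreground when it is darker than the background but keeps a similar chromaticity.

```python
import numpy as np

def is_shadow(frame, background, low=0.4, high=0.9, tol=0.05):
    """frame, background: float RGB images scaled to [0, 1]; all thresholds are illustrative."""
    eps = 1e-6
    b_f = frame.sum(axis=2) + eps          # per-pixel brightness of the current frame
    b_b = background.sum(axis=2) + eps     # per-pixel brightness of the background model
    ratio = b_f / b_b                      # shadows attenuate brightness (low < ratio < high)
    chroma_diff = np.abs(frame / b_f[..., None] - background / b_b[..., None]).sum(axis=2)
    return (ratio > low) & (ratio < high) & (chroma_diff < tol)
```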

Out of range and Depth camouflage are among the most challenging issues, at least when color information is disregarded or not properly combined with depth. Indeed, even though the accuracy of most of the methods is moderately high, several false negatives are produced, as shown in Fig. 4 for depth camouflage.

Fig. 4. Sequence DCamSeq2 (DepthCamouflage): (a) image no. 534, corresponding (b) depth image, and (c) ground truth; segmentation masks achieved by: (d) RGBD-SOBS, (e) RGB-SOBS, (f) SRPCA, (g) AvgM-D, (h) Kim, (i) SCAD, (j) CwisardH+.

6 Conclusions and Perspectives

The paper describes a novel benchmarking framework that we set up and made publicly available in order to evaluate and compare scene background modeling methods for moving object detection on RGBD videos. The SBM-RGBD dataset is the largest RGBD video collection ever made available for this specific purpose. Its 33 videos span seven categories, selected to include diverse scene background modeling challenges for moving object detection. Seven evaluation metrics, chosen among the most widely used, are adopted to evaluate the results against a wide set of pixel-wise ground truths. A preliminary analysis of the results achieved by several methods investigates to what extent the various background modeling challenges pose difficulties for background subtraction methods that exploit color and depth information. The proposed framework will serve as a reference for future methods aiming at overcoming these challenges.