
1 Introduction

Detecting objects carried by people provides a basis for smart camera surveillance systems that aim to detect suspicious events such as exchanging bags, abandoning objects, or theft. However, the problem of detecting carried objects (CO) has not yet received the attention it deserves, mainly because of the inherent complexity of the task. This is a challenging problem because people can carry a variety of objects such as a handbag, a musical instrument, or even an unusual/dangerous item like an improvised explosive device. The difficulty is particularly pronounced when objects are small or partially visible.

Despite extensive work on object detection, little has been done to detect COs. A successful approach such as the deformable part model (DPM) [8] is not directly applicable to CO detection since COs may not be easily represented as a single deformable model or a collection of deformable parts. In addition, COs do not usually appear as regions enclosed by contours or as compact regions with a distinct gray level or colour. This makes them difficult to segment (Fig. 1 illustrates this problem). A few works exploit appearance-based object detection approaches to detect COs. These approaches are mostly limited to the recognition of specific objects. Other approaches [4, 13, 14] use motion information of human gait to detect COs. The motion of an average unencumbered walking person is modeled, and detections whose motion does not fit the model are selected as COs. These approaches usually assume that COs are sufficiently large to distort the spatio-temporal structure.

Fig. 1. Three examples of persons with COs and their maximal response for a contour detector and corresponding segmentation by [1]. (Color figure online)

To develop a more generic CO detector, prior information about the human silhouette is used to help better discriminate between a person's region and a CO. To detect irregular parts in a human silhouette, some researchers [6, 11, 17] generate a generic model of a normal human silhouette and then subtract it from a segmented foreground. The main assumption in these approaches is that COs alter a normal silhouette. This assumption limits these approaches to COs that protrude significantly from the normal silhouette; COs located inside it are missed. Moreover, these approaches are highly dependent on a precise segmentation of the foreground. Therefore, they usually cannot distinguish between COs and different types of clothes or imperfections of the segmented foreground if they all cause protrusions.

In this paper, we present a framework (sketched in Fig. 2) named Ensemble of Contour Exemplars (ECE) that combines high-level information from an exemplar-based person identification method with the low-level information of segmented regions to detect COs. A person's contour hypothesis learned from an ensemble of human exemplars is used to discriminate the person's contours from other contours in the image. We then use low-level cues such as color and texture to assign a region to each contour that does not belong to the person's contours. Each region is considered a candidate CO and is scored based on high-level information from the foreground and the person's contour hypothesis. Then, a non-maximum suppression method is applied to each region to suppress any region that is not the maximum response in its neighborhood.

Contributions: Our two main contributions are: (1) generating a person's contour hypothesis combined with low-level information cues to detect COs. Analyzing the irregularity of a person's contours instead of human silhouettes enables our method to detect COs that are too small to alter the normal human silhouette as well as those contained inside it; and (2) no prior knowledge of CO shape, location, or motion is assumed. Having no assumption on the motion of the person enables our method to be applied on any single frame where a person appears instead of relying on short video sequences of a tracked person.

Fig. 2. An overview of our system (ECE).

2 Related Work

Detecting COs can be formulated as an object detection problem. Object detection is often conducted by object proposal generation followed by classification. Zheng et al. [19] detected COs using contextual information extracted from a polar geometric structure. Extracted features are fed into a Support Vector Machine (SVM) classifier to detect two types of luggage (suitcases and bags). Considering only the appearance of COs leads to numerous false detections corresponding to the head, hands, feet, or just noise. Therefore, most works have focused on incorporating prior information about humans to facilitate the detection of COs.

Branca et al. [3] detected pedestrians as well as two types of COs using an SVM classifier and wavelet features. When a pedestrian is localized in a frame, a sliding window with different sizes is applied around the pedestrian to find the CO. Instead of a pre-trained model for COs, Tavanai et al. [16] utilized geometric criteria (convexity and elongation) among contours to find COs in the non-person region. A person's region is obtained by applying a person detector to obtain a bounding box, followed by a color-based segmentation method. By assuming that COs protrude from a window where a person is likely to occur, the two largest segments obtained from the color-based segmentation are considered as regions belonging to the person. Then, under the assumption that only a carry event is occurring, a set of detections by geometric shape models is refined by incorporating the spatial relationships of probable COs with respect to the walking person.

Pedestrian motion can be modeled as consisting of two components: a periodic motion for a person's limbs and a uniform motion corresponding to the head and torso. Under the assumption that COs are held steadily, their motion can also be formulated as a uniform motion. This information helps a CO detector to search only regions with uniform motion. The main idea of [13] is that the uniform motion of people carrying objects does not fit the average motion profile of unencumbered people. Pixels of moving objects whose motion does not fit the pre-trained motion model of people without COs are grouped as carried objects. In the method of Dondera et al. [7], CO candidates are generated from protrusion, color contrast and occlusion boundary cues. Regions protruding from a person's body are obtained by a method similar to [13] to remove limbs, followed by a template of an unencumbered pedestrian (urn-shaped model) to remove the head and torso. A segmentation-based color contrast detector and an occlusion boundary based moving blob detector are applied to detect other candidate COs. Each candidate region is characterized by its shape and its relation to the human silhouette (e.g. the relative distance of the centroid of the person's silhouette to the object center) and classified with an SVM as a CO or a non-CO.

The majority of works on CO detection have combined human motion cues with prior information about the human silhouette to detect irregular parts in the human body, such as the presence of COs. Chayanurak et al. [4] detected a CO using the time series of limb motion. In their work, a star skeleton represents the human shape. Each limb of the star is analyzed through the time series of normalized limb positions. Limbs that are motionless or that move with the overall human body motion are detected as limbs related to COs. Haritaoglu et al. [9] detected COs from a short video sequence of a pedestrian (typically lasting a few seconds) by assuming that the unencumbered human shape is symmetric about its body axis. Asymmetric parts are grouped into connected components as candidate CO regions. Asymmetric regions that belong to body parts are discriminated by periodicity analysis. The work of Damen et al. [6] is based on creating a temporal template of a moving person and subtracting an exemplar temporal template (ETT) from it. The ETT is generated offline from a 3D model of a walking person and is matched against the tracked person's temporal template. Regions protruding from the ETT are considered likely to be COs if they are at expected CO locations. Prior information about CO location is learned from the occurrence of COs in the ground truth temporal templates. This information and the protrusion cues are combined into a Markov Random Field (MRF) framework to segment COs. Tzanidou et al. [17] follow the steps of [6] to detect COs but utilize color temporal templates instead.

In this work, we use prior information about the human body to build a normal human model. However, the main difference is that our method relies on the person’s contours instead of his silhouette to detect irregularities with respect to the normal human model. We show that our human model can efficiently be used to find the regions that belong to COs.

3 Our Approach

The goal of our approach is to have a fully automatic system to detect COs in any frame where a person appears in the camera field of view. Using only one frame to detect COs makes the algorithm robust to events such as handing over luggage or a change in the person's direction. To detect COs, we build on two sources of information. The first is the output of the person's contour hypothesis generator. The second is the output of a bottom-up object segmentation. Our contribution is to combine this information to discriminate between COs and other objects (person, background).

3.1 Building Human Models

To build human contour models and to detect COs, we first need to detect the moving regions corresponding to a person and the COs in a video. To accomplish this task, the DPM person detector [8] is applied on each frame. The intuition behind this is to find a person's location as well as to obtain a rough estimate of his height and width for further scale analysis. Since COs can protrude from the obtained person's bounding box, the foreground extracted by a foreground extractor is used to find a second bounding box that bounds both the person and the COs. The largest connected component of the extracted foreground that significantly overlaps with the person's bounding box is selected as our moving object target. In the rest of the paper, we will use the term moving object to refer to the person and the CO.
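For illustration, the following minimal Python sketch (not part of our implementation) selects the largest foreground connected component overlapping the detected person's box; the overlap threshold min_overlap and the use of OpenCV connected components are assumptions.

```python
import numpy as np
import cv2

def select_moving_object(fg_mask, person_box, min_overlap=0.5):
    """Pick the largest connected foreground component whose bounding box
    significantly overlaps the DPM person box; its own bounding box then
    covers both the person and any CO.
    fg_mask: binary HxW array from any foreground extractor (e.g. PAWCS).
    person_box: (x, y, w, h) from the person detector."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        fg_mask.astype(np.uint8), connectivity=8)
    x, y, w, h = person_box
    best_label, best_area = None, 0
    for lab in range(1, n):                                # label 0 is the background
        lx, ly, lw, lh, area = stats[lab]
        ix = max(0, min(x + w, lx + lw) - max(x, lx))      # bbox intersection width
        iy = max(0, min(y + h, ly + lh) - max(y, ly))      # bbox intersection height
        if ix * iy >= min_overlap * w * h and area > best_area:
            best_label, best_area = lab, area
    if best_label is None:
        return None
    mask = labels == best_label
    lx, ly, lw, lh, _ = stats[best_label]
    return mask, (lx, ly, lw, lh)                          # moving-object mask and box
```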

Learning an Ensemble of Contour Exemplars. The output of a person detector is a window where a person is likely to occur, so this information is too coarse to discriminate the person's contours from other object contours. To obtain a class-specific contour detection, we follow [18] to generate a hypothesis mask for the person's contours. Our aim is to learn the contours of humans dressed in various clothes with different standing or walking poses by building a codebook of local shapes. We label our training images into 8 classes corresponding to 8 possible walking directions of a person. Each class includes persons with different types of clothes and different walking poses. A foreground mask for each image is extracted by a foreground extractor. COs are removed manually if the foreground contains COs. Figure 3 shows an example of exemplars in the 8 categories.

Fig. 3. Exemplars in different directions and poses. From top to bottom: exemplars, foreground masks, and contour exemplars, respectively.

Given a training image, a person is detected as described in the previous paragraph and is scaled so that his height and width match a pre-defined size. A foreground mask corresponding to the detected person is extracted, and the contours inside the mask are extracted by the method of [1]. The obtained contours are well localized since the method uses multiple cues such as brightness, color and texture. However, this information is not adequate to discriminate among the contours of the person, the COs and the background. Using the information of the contours, the foreground and the person's bounding box, we build a codebook of local shapes. The foreground mask is sampled uniformly with sampling interval sm, and for each sample, the shape context (SC) feature is extracted from the contours inside the foreground mask.
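For reference, a minimal log-polar shape-context descriptor [2] for one sample over the contour pixels can be sketched as follows; the bin counts and maximum radius are illustrative assumptions, not the values used in our experiments.

```python
import numpy as np

def shape_context(sample_xy, contour_xy, n_r=5, n_theta=12, r_max=60.0):
    """Log-polar shape-context histogram of contour points around a sample.
    sample_xy: (x, y) of the sample; contour_xy: (N, 2) contour pixel coords.
    n_r, n_theta and r_max are illustrative bin settings."""
    d = np.asarray(contour_xy, dtype=float) - np.asarray(sample_xy, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    keep = (r > 1e-6) & (r < r_max)
    r, theta = r[keep], np.arctan2(d[keep, 1], d[keep, 0])
    r_edges = np.logspace(0, np.log10(r_max), n_r + 1)       # log-spaced radial bins
    r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)                       # accumulate point counts
    return hist.ravel() / max(hist.sum(), 1)                 # L1-normalised descriptor
```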

Each codebook entry \( ce_i=(s_i^{ce}, d_i^{ce}, k_i, m_i) \) records four types of information for a sample i on the segmented foreground: \(s_i^{ce}\) is a Shape Context (SC) [2] feature, \(d_i^{ce}\) is the relative distance of the sample to the center of the person's bounding box, \(k_i\) is the class identifier of the exemplar that the i-th sample belongs to, and \(m_i\) is a patch of the foreground mask centered at sample i.

Using the relative distance of each sample to the centroid of the person, redundant codebook entries can be removed. To this end, a codebook entry is removed if another entry has a similar SC feature and their relative distances to the person's centroid, \(d_i^{ce}\) and \(d_j^{ce}\), are close enough to each other. The closeness of \(d_i^{ce}\) and \(d_j^{ce}\) is calculated as:

$$\begin{aligned} D_{ij}=exp({-||d_i^ {ce}-d_j^ {ce}||}) \end{aligned}$$
(1)
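A minimal sketch of the codebook entry structure and of this pruning step is given below; the two similarity thresholds are illustrative assumptions, as the exact values are not specified above.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CodebookEntry:
    s: np.ndarray   # shape-context feature s_i^ce
    d: np.ndarray   # offset of the sample to the person-box centre, d_i^ce
    k: int          # view-class id of the exemplar (1..8)
    m: np.ndarray   # foreground-mask patch centred on the sample

def prune_codebook(entries, sc_thresh=0.9, dist_thresh=0.9):
    """Drop an entry when some kept entry has a similar SC feature and
    D_ij = exp(-||d_i - d_j||) (Eq. 1) exceeds dist_thresh.
    Thresholds are illustrative, not taken from the paper."""
    kept = []
    for e in entries:
        redundant = False
        for f in kept:
            sc_sim = np.exp(-np.linalg.norm(e.s - f.s))
            d_ij = np.exp(-np.linalg.norm(e.d - f.d))       # Eq. 1
            if sc_sim > sc_thresh and d_ij > dist_thresh:
                redundant = True
                break
        if not redundant:
            kept.append(e)
    return kept
```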

3.2 Carried Object Detection

Given a test video frame, a moving object mv corresponding to a person and his CO is detected as explained previously. The extracted moving object is scaled based on its size, as obtained from the person detector. Then, the foreground is sampled uniformly with sampling interval sm and an SC feature is extracted for each sample. With the rough estimate of the person's location given by the person detector, the relative distance of each sample to the person's centroid is obtained. Therefore, each sample \(t_i\) of the foreground in the test frame can be expressed by its SC feature and its relative position to the person's center as \(t_i=(s_i^{mv},d_i^{mv})\). Using this information, a hypothesis mask for the person is generated by classifying the person into one of the eight classes and then generating a hypothesis based on the obtained class, as described below. Our intuition behind the person's view classification is that a person's contours in one view can show similar characteristics to the contours of a CO in another view. Therefore, to detect COs, each person's contour should be compared with contour exemplars of the same viewing direction.

Person's View Classification. Each \(t_i\) is compared with a codebook entry only if their relative distances to the person's centroid, \(d_i^{mv}\) and \(d_j^{ce}\), are close enough to each other. The probability of matching sample \(t_i\) at location \(d_i^{mv}\) to a set of codebook entries \(ce_j\) is defined by Eq. 2.

$$\begin{aligned} \begin{aligned} P(t_i|d_i^ {mv})=\sum _{j}{exp(-{||s_j^ {ce}-s_i^ {mv}||}) P(d_i^ {mv}|d_j^ {ce}, \varSigma )},\\ \text {Where:}\quad P(d_i^ {mv}|d_j^ {ce}, \varSigma )=\dfrac{1}{2\pi \sqrt{| \varSigma |}}exp(-\dfrac{1}{2}{(d_i^ {mv}-d_j^ {ce})}^T \varSigma ^{-1}(d_i^ {mv}-d_j^ {ce})) \end{aligned} \end{aligned}$$
(2)

where the \(2\times 2\) covariance matrix \(\varSigma\) is diagonal with \({\varSigma_{11}}<{\varSigma_{22}}\) to diminish the effect of errors in the estimation of the person's height by DPM. Note that all moving objects in the test and training images are scaled so that the person's height and width match a pre-defined size. Therefore, each \(s_i^{mv}\) is compared with the training samples located in the same area as \(d_i^{mv}\). If a match is found, the corresponding codebook entry casts a vote for the class it belongs to. The class with the maximum number of votes is selected as the person's view class.
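A minimal voting sketch following Eq. 2, reusing the CodebookEntry structure sketched above; the diagonal covariance values are illustrative assumptions.

```python
import numpy as np

def classify_view(test_samples, codebook, sigma=(4.0, 16.0)):
    """Vote for the person's view class using the matching term of Eq. 2.
    Each test sample t_i = (s_mv, d_mv) finds its best-matching codebook
    entry; that entry casts one vote for its view class k.
    sigma holds the diagonal of the covariance (Sigma_11 < Sigma_22)."""
    votes = np.zeros(9)                                   # view classes 1..8
    norm = 2 * np.pi * np.sqrt(sigma[0] * sigma[1])       # Gaussian normalisation of Eq. 2
    for s_mv, d_mv in test_samples:
        best_score, best_class = 0.0, None
        for e in codebook:                                # CodebookEntry objects (see above)
            diff = np.asarray(d_mv) - e.d
            p_loc = np.exp(-0.5 * (diff[0] ** 2 / sigma[0]
                                   + diff[1] ** 2 / sigma[1])) / norm
            score = np.exp(-np.linalg.norm(e.s - s_mv)) * p_loc
            if score > best_score:
                best_score, best_class = score, e.k
        if best_class is not None:
            votes[best_class] += 1
    return int(np.argmax(votes))                          # person's view class
```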

Hypothesis Generation. Now, we can build a hypothesis mask of the person's contours by backtracking the matching results of the person's view class. From all codebook entries of the selected view \(ce_{k}\) that are matched to a \(t_i\), we choose the one with the maximum matching score and select its foreground patch \(m_j\) as the hypothesis mask for \(t_i\). The probability of the patch \(Patch_i\) assigned to sample \(t_i\) is calculated by Eq. 3.

$$\begin{aligned} \begin{aligned} P(Patch_i|t_i)= \max _{j}{exp(-{||s_j^ {ce_k}-s_i^ {mv}||}) P(d_i^ {mv}|d_j^ {ce_k}, \varSigma )m_j } \end{aligned} \end{aligned}$$
(3)

We only keep the patches with probability higher than 0.8 to build the hypothesis mask. Figure 4 shows two examples of hypothesis masks for a person's contours, for which the probability of each patch is between 0.8 and 1. With the information of the hypothesis mask H, we can now analyze the contours that do not fall inside H as candidate CO contours. To determine which candidate CO contours belong to each of the three categories (CO, person, background), the three following steps are applied to the candidate contours.
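A hedged sketch of this patch-stamping step (Eq. 3) follows. Here the Gaussian location term is used without its normalisation constant so that the score lies in [0, 1] and the 0.8 cut-off can be applied directly; this scaling, the covariance values and the data layout are assumptions for illustration.

```python
import numpy as np

def build_hypothesis_mask(test_samples, codebook_view, frame_shape,
                          prob_thresh=0.8, sigma=(4.0, 16.0)):
    """For each test sample, take the exemplar mask patch m_j of its
    best-matching entry in the selected view class and stamp it at the
    sample's pixel position, weighted by the matching probability (Eq. 3).
    test_samples: iterable of ((s_mv, d_mv), (cx, cy)) pairs."""
    H = np.zeros(frame_shape, dtype=float)
    for (s_mv, d_mv), (cx, cy) in test_samples:
        best_p, best_patch = 0.0, None
        for e in codebook_view:
            diff = np.asarray(d_mv) - e.d
            p_loc = np.exp(-0.5 * (diff[0] ** 2 / sigma[0] + diff[1] ** 2 / sigma[1]))
            p = np.exp(-np.linalg.norm(e.s - s_mv)) * p_loc   # unnormalised score in [0, 1]
            if p > best_p:
                best_p, best_patch = p, e.m
        if best_p > prob_thresh and best_patch is not None:
            ph, pw = best_patch.shape
            y0, x0 = max(cy - ph // 2, 0), max(cx - pw // 2, 0)
            y1, x1 = min(y0 + ph, frame_shape[0]), min(x0 + pw, frame_shape[1])
            patch = best_patch[:y1 - y0, :x1 - x0]
            H[y0:y1, x0:x1] = np.maximum(H[y0:y1, x0:x1], best_p * patch)
    return H                                   # per-pixel person-contour probability
```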

Fig. 4. Two examples of hypothesis mask for a person's contours (Hypothesis mask is shown by white blocks). Gray value expresses the probability.

Fig. 5. An example of generating seed points. From left to right, original image, hypothesis mask and generated seed points.

Step 1: Seed Points Generation. In this step, the geometric information of a contour is used to obtain a rough estimate of the local shape of the object the contour belongs to. To accomplish this task, probable CO contours are split at junction points. Each obtained contour is characterized by its curvature and the distance between its endpoints. We compute the curvature of a contour line by dividing its arc length by the distance between its endpoints. Only high curvature contours are kept as more informative contours for further analysis. We use points located between a contour and the line joining its endpoints as seeds of the region to which the contour can be assigned. To this end, each open contour is closed by connecting its two endpoints. Then, the enclosed area is uniformly sampled to generate the seeds. Figure 5 shows the remaining contours obtained by subtracting the hypothesis mask H and the associated seed points.
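A minimal sketch of this step for a single contour fragment, assuming an illustrative curvature threshold and sampling step:

```python
import numpy as np
import cv2

def seed_points(contour_xy, img_shape, curv_thresh=1.2, step=4):
    """Keep a contour fragment only if its curvature (arc length divided by
    endpoint distance) is high, close it with the chord joining its
    endpoints, and sample the enclosed area uniformly.
    curv_thresh and step are assumptions for illustration."""
    pts = np.asarray(contour_xy, dtype=np.int32)               # ordered (x, y) points
    arc_len = np.sum(np.hypot(*np.diff(pts, axis=0).T))
    chord = np.hypot(*(pts[-1] - pts[0]).astype(float))
    if chord < 1e-6 or arc_len / chord < curv_thresh:
        return np.empty((0, 2), dtype=int)                     # not curved enough
    mask = np.zeros(img_shape, dtype=np.uint8)
    cv2.fillPoly(mask, [pts.reshape(-1, 1, 2)], 1)             # fillPoly closes the polygon with the chord
    ys, xs = np.nonzero(mask)
    keep = (ys % step == 0) & (xs % step == 0)                 # uniform sub-sampling of the enclosed area
    return np.stack([xs[keep], ys[keep]], axis=1)              # seed (x, y) coordinates
```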

Step 2: Assigning a Region to a Set of Seed Points. We formulate the problem of assigning a region \(R_j\) to the i-th candidate CO contour as an image segmentation problem. Here, we are looking for an image segment that has sufficient overlap with our pre-computed seed points. To this end, we apply the biased normalized cut (BNC) of [10] to each object candidate. BNC starts by computing the K smallest eigenvectors of the normalized graph Laplacian \(\mathscr{L}_G\), where the edge weights \(w_{ij}\) of the graph are obtained from the contour cue of Sect. 3.1. Eigenvectors that are well correlated with the seed vector se built from the seed points are up-weighted using the following equation:

$$\begin{aligned} w_i \leftarrow \frac{u_i^T D_G se}{\lambda _i-\gamma }, \text{ for } i=2,...,K \end{aligned}$$
(4)

where \(u_1, u_2, ..., u_K\) are the eigenvectors of the graph Laplacian \(\mathscr{L}_G\) corresponding to the K smallest eigenvalues \(\lambda_1, \lambda_2, ..., \lambda_K\), \(D_G\) denotes the diagonal degree matrix of graph G, se is a seed vector and \(\gamma\) controls the amount of correlation. The BNC for each set of seed points \(se_j\) is the combination of the eigenvectors weighted by the pre-computed weights \(w_i\). Figure 6 shows the result of applying BNC with different seed points. The result of BNC for each set of seed points \(se_j\) is thresholded to segment region \(R_j\).
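A minimal sketch of the eigenvector re-weighting of Eq. 4, assuming a precomputed sparse affinity matrix W and treating the number of eigenvectors K as a free parameter:

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import eigsh

def biased_ncut(W, seed, K=16, gamma=0.0):
    """Biased normalized cut sketch in the spirit of [10]. W is a sparse,
    symmetric pixel-affinity matrix built from the contour cue; seed is a
    0/1 seed vector se. K, gamma and the affinity construction are
    assumptions for illustration."""
    d = np.asarray(W.sum(axis=1)).ravel()
    d_isqrt = diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = identity(W.shape[0]) - d_isqrt @ W @ d_isqrt       # normalized Laplacian
    vals, vecs = eigsh(L_sym, k=K, which='SA')                 # K smallest eigenpairs
    U = vecs / np.sqrt(np.maximum(d, 1e-12))[:, None]          # generalized eigenvectors u_i
    x = np.zeros(W.shape[0])
    for i in range(1, K):                                      # skip the constant u_1
        w_i = (U[:, i] @ (d * seed)) / (vals[i] - gamma)       # Eq. 4
        x += w_i * U[:, i]
    return x                                                   # threshold to obtain region R_j
```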

Fig. 6. Output of Biased Normalized cut for three sets of seed points with correlation parameter \(\gamma =0\).

Step 3: Non Maximal Suppression (NMS). For each region \(R_i\), a score \(V_i\) (calculated in Eq. 5) is obtained based on the overlap ratios of the region with both the complement of the hypothesis mask H and the foreground mask M.

$$\begin{aligned} \begin{aligned} V_i=(1+w)\frac{R_i\cap (1-H)}{R_i}+\frac{R_i\cap M}{R_i},\\ \text {where:}\quad w=\sum \limits _{k \in (R_i\cap H)}^n(1-P(Patch_k|t_k))/n \end{aligned} \end{aligned}$$
(5)

The overlap ratio of the region with the complement of the hypothesis mask is weighted by \(1+w\), where w is the average of \(1-P(Patch_k|t_k)\) (calculated in Eq. 3) over the samples in the intersection area \(R_i\cap H\). If the region score \(V_i\) is lower than a pre-defined threshold T, the region is rejected. Then an NMS method is applied to each region. In case of overlapping regions, only the one with the highest score \(V_i\) is accepted as a CO. The procedure to detect COs from the regions is formulated as follows:

figure a
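A minimal sketch of this scoring-and-suppression procedure (Eq. 5 followed by NMS), assuming boolean region and foreground masks, a per-pixel probability map from Eq. 3, and illustrative values for T and the suppression overlap:

```python
import numpy as np

def score_region(R, H_mask, M_fg, P_patch):
    """Eq. 5: score a candidate region R (boolean mask) by its overlap with
    the complement of the person-contour hypothesis H and with the
    foreground M, the first term weighted by (1 + w)."""
    area = max(int(R.sum()), 1)
    inside_H = R & H_mask
    n = max(int(inside_H.sum()), 1)
    w = np.sum(1.0 - P_patch[inside_H]) / n           # average (1 - P(Patch|t)) over R ∩ H
    return (1 + w) * np.sum(R & ~H_mask) / area + np.sum(R & M_fg) / area

def detect_carried_objects(regions, H_mask, M_fg, P_patch, T=1.0, nms_overlap=0.3):
    """Keep regions scoring above T, then greedily suppress overlapping
    regions, retaining the highest-scoring one. T and nms_overlap are
    illustrative values, not the paper's."""
    scored = [(score_region(R, H_mask, M_fg, P_patch), R) for R in regions]
    scored = sorted([sr for sr in scored if sr[0] > T], key=lambda sr: -sr[0])
    kept = []
    for v, R in scored:
        overlaps = any(np.sum(R & K) / min(R.sum(), K.sum()) > nms_overlap
                       for _, K in kept)
        if not overlaps:
            kept.append((v, R))
    return [R for _, R in kept]                       # accepted CO regions
```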

4 Experimental Evaluation

The images for the training set are manually gathered from three different sources: the PETS 2006, i-Lids AVSS and INRIA pedestrian [5] datasets. The INRIA dataset is composed of still images and is only used for training to complement the frames from PETS and i-Lids. This way, we are able to keep more sequences of PETS and i-Lids for testing. In each image, a person is detected by DPM and its foreground is extracted automatically for the PETS and i-Lids datasets, and manually for the INRIA dataset. Since our method is not too sensitive to the extracted foreground, we can use any foreground extractor in both the testing and training steps. Here, we use the PAWCS foreground extractor [15] for both the PETS and i-Lids datasets. COs are removed manually from the obtained foreground. Then each person is labeled as one of 8 classes according to the 8 possible viewpoints. For each class, an average of 15 persons (exemplars) are selected. Around 15 additional exemplars are obtained by horizontally flipping the previously selected ones.

We evaluate our algorithm on two publicly available datasets: PETS 2006 and i-Lids AVSS. For each dataset, COs are annotated with a ground truth bounding box. A detection is evaluated using the intersection over union (IOU) criterion. That is, if the overlap between the bounding box of the detected object \(b_d\) and that of the ground truth \(b_{gt}\), computed by Eq. 6, exceeds a threshold k, the detection is considered a true positive (TP). Otherwise, it is considered a false positive (FP). Source code for CO detection and annotations for the i-Lids dataset are available at https://sites.google.com/site/cosdetector/home.

$$\begin{aligned} overlap(b_d,b_{gt})=\frac{b_d\cap b_{gt}}{b_d\cup b_{gt}} \end{aligned}$$
(6)
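For reference, a direct sketch of the IOU criterion of Eq. 6 on axis-aligned boxes given as (x, y, w, h):

```python
def iou(box_d, box_gt):
    """Intersection-over-union (Eq. 6) for boxes given as (x, y, w, h)."""
    xd, yd, wd, hd = box_d
    xg, yg, wg, hg = box_gt
    iw = max(0, min(xd + wd, xg + wg) - max(xd, xg))   # intersection width
    ih = max(0, min(yd + hd, yg + hg) - max(yd, yg))   # intersection height
    inter = iw * ih
    union = wd * hd + wg * hg - inter
    return inter / union if union > 0 else 0.0

# a detection counts as a true positive when iou(b_d, b_gt) exceeds k (e.g. k = 0.15)
```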

4.1 PETS 2006

PETS 2006 contains 7 scenarios of varying difficulty filmed from multiple cameras. We selected 7 sequences of PETS 2006 that use the third camera. Eighty-three ground-truth bounding boxes of COs are provided online by Damen et al. [6] for 75 individuals among 106 pedestrians. Individuals that are not in the set provided by [6] are used in the training set. Since [6] relies on short video sequences of a tracked person to detect COs, a tracked clip for each person is also provided. We detect moving objects on the first frame of each of the 75 short video sequences as described in Sect. 3.1, and our CO detector is applied on the obtained moving object. Figure 7 shows the results of our method on the PETS dataset. Our algorithm can detect a variety of COs successfully. However, some body parts are detected, since they are not modeled by the exemplars.

Fig. 7. Successes and failures of our approach on PETS 2006. First row: results after applying NMS on the segmented regions. Second row: bounding boxes (BB). Green BBs are TP detections, red and yellow BBs are FP detections. Yellow BBs are multiple detections of the same CO. (a, e, h) failures because of a poor person model for body parts, (b) failure because a clothing pattern is detected as an irregularity, (f) the object is split since its edges are wrongly classified as the person's contours, (g) the object is split in two regions because the small bag on the larger luggage is not detected. (Color figure online)

To compare with the methods of [6, 16], we use the results presented in their papers with an overlap threshold \(k=0.15\), as in [6]. This threshold is much lower than the value typically used in object detection (0.5), since [6] only detects the parts of the object that protrude from the person's body. The comparison (see Table 1) shows that we achieve a higher detection rate and a slightly better FP rate compared to [6]. Comparing our method to [6, 16] in terms of F1 score, we can see that our method outperforms them by about 10 %. It should be noted that both [6, 16] use the whole sequence to detect COs while we only use the first frame of each sequence and still obtain better results.

Table 1. Comparison using PETS 2006 with a 0.15 overlap threshold.

Using an overlap threshold of 0.15 may not reflect the real performance of a CO detector, since a method can detect large parts of a person's body as a CO and still score well because the required overlap for a good detection is so small. For thoroughness and to give a better idea of the performance of our method, we plot the precision and recall of our algorithm as the overlap threshold is varied in Fig. 8.

Fig. 8. Precision and recall plots as a function of the overlap threshold on PETS 2006.

We also explore the effect of foreground extraction on detection performance. Figure 9 shows the results of our method with two different foreground extractors: one based on simple thresholding of the optical flow of [12], and one based on background subtraction with PAWCS [15]. The results show that our algorithm is not too sensitive to the extracted foreground. This robustness comes from the fact that we assign a region to each contour and analyze the region by its amount of overlap with the extracted foreground. Although extracting the foreground with [12] slightly improved our results on PETS, this does not hold in general. With [12], some parts of the foreground where abrupt movements exist, such as a person's limbs, are missing. These errors happen to be beneficial in some scenarios by reducing the number of false positives, but they increase the number of false negatives in other cases.

Fig. 9. Comparison of precision with two different foreground extractors.

4.2 i-Lids AVSS

Since all parameters (SC size, sm, T) depend only on the person's scale and all detected pedestrians are scaled to a pre-defined window size (as described in Sect. 3.1), our algorithm can be tested on other datasets with the same parameters used for PETS. i-Lids AVSS 2007 consists of both indoor and outdoor surveillance videos. We use three videos recorded at a train station. Fifty-nine individuals among 88 are selected for the test, and their 68 COs are manually annotated. Individuals that are not in the test set are used for the training set. COs in this dataset are varied and include document holders, handbags, briefcases, and trolleys. Again, we compared our method with the state-of-the-art method of Damen et al. [6], who provide their code online. To apply [6] on the i-Lids dataset, we prepared short video sequences of our selected individuals to create the spatio-temporal templates. Furthermore, in each frame, the person is detected manually and its foreground is obtained using the PAWCS method. Since [6] is sensitive to the extracted foreground, we only apply PAWCS to obtain a more accurate foreground mask. The viewing direction of each person is selected manually, as calibration data are not provided with this dataset. Detected COs on the temporal templates are projected onto the first frame of the sequence.

Figure 10 shows the results of our method (ECE) compared with [6]. It can be seen that our method detects COs more successfully, and the boundaries of the COs are better delimited. Figure 10(a–b) shows the ability of our algorithm to detect objects with little protrusion or contained inside the person's body area. Figure 10(c, g) shows failure cases resulting from a poor person model for the person's clothes and body parts, respectively. Figure 10(d, e) shows two false negative (FN) cases, as both objects are identified as part of the person's clothes.

Fig. 10. Successes and failures of our approach (right) compared with [6] (left) on i-Lids AVSS. Bottom rows: detected bounding boxes (BB); top rows: segmented objects. Green BBs are TP detections and red BBs are FP detections. (Color figure online)

Table 2 shows the results of our method and [6] on the i-Lids dataset with overlap threshold \(k=0.15\). Although we achieve better results compared to [6], as discussed previously, \(k=0.15\) is too low to show the real performance of the system. As shown in Fig. 10, a large detected part of a person's body that contains a CO is counted as a TP with \(k=0.15\). To give the complete picture, we plot the precision and recall of our algorithm and of [6] at different overlap thresholds (Fig. 11). Figure 11 is consistent with the results of Fig. 10, as it shows that our algorithm achieves better performance at all overlap thresholds.

Table 2. Comparison of [6] with the proposed method over i-Lids AVSS.
Fig. 11. Precision and recall plots as a function of the overlap threshold on i-Lids.

5 Conclusion

We presented a framework for detecting COs in surveillance videos that integrates both local and global shape cues. Several models of a normal person's contours are learned to build an ensemble of contour exemplars of humans. Irregularities with respect to the normal human model are detected as COs. Our experiments indicate that learning the human model from contours makes the system more robust to factors that may give rise to irregularities, such as clothing, than methods that model humans based on silhouettes [6]. Using biased normalized cut to segment objects, combined with the high-level information of the human model, provides a rough estimate of the CO shape. Our method obtains a better estimate of the CO shape than [6], which can be useful for further analysis such as recognition of the object type.