1 Introduction

1.1 Related Work

Calibrating a network of cameras is typically carried out by finding corresponding points between views. Finding such correspondences often fails when the cameras have very different viewpoints, since objects and background do not look similar across these views. Previous approaches to this problem utilized points on the convex hull of the silhouette of a moving foreground object.

Sinha and Pollefeys [1] used silhouettes to calibrate a network of cameras, assuming a single moving silhouette in a video. Each RANSAC iteration selects a different frame and samples two pairs of corresponding tangent lines to the convex hull of the silhouette [2]; the intersection of the two tangent lines in each image proposes an epipole.

Ben-Artzi et al. [3] proposed an efficient way to accelerate Sinha’s method. Generalizing the concept of Motion Barcodes [4] to lines, they proposed using the best pair of matching tangent lines between the two views from each frame. The quality of line correspondence was determined by the normalized cross correlation of the Motion Barcodes of the lines.

Both methods above [1, 3] fail when there are multiple moving objects in the scene, as they are based on the convex hull of all the moving objects in the image. In the example shown in Fig. 1, objects that appear only in one of the cameras have a destructive effect on the convex hull. Our current paper presents an approach that does not use the convex hull, and can be used with videos having multiple moving objects.

Fig. 1. Using the convex hull of moving objects fails in scenes with multiple objects. In this case the convex hull (red polygon) is very different in these two corresponding views, as different objects are visible from the cameras. (Color figure online)

In other related work, Meingast et al. [5] computed essential matrices between each pair of cameras from image trajectories of moving objects, using the image centroids of the objects as corresponding points. However, since for most objects the centroids in different views do not represent the same 3D point, this computation is error-prone.

Other methods assumed that the objects move on a plane [6], or that the objects are people walking on a plane, for both single-camera calibration [7, 8] and two-camera calibration [9].

1.2 Motion Barcodes of Lines

We address the case of synchronized stationary cameras viewing a scene with moving objects. Following background subtraction [10] we obtain a binary video, where “0” represents static background and “1” moving objects.

Fig. 2. A motion barcode b of a line l is a vector in \(\{0,1\}^N\). The value of \(b_l(i)\) is “1” when a moving object intersects the line in frame i (black entries) and “0” otherwise (white entries).

Given a video of N binary frames, the Motion Barcode of a given image line l [3] is a binary vector \(b_l\) in \(\{0,1\}^N\), where \(b_l(i)=1\) iff the silhouette of a foreground object (a pixel with value 1) is incident to line l in the \(i^{th}\) frame. An example of a Motion Barcode is shown in Fig. 2.

The case of a moving object seen by two cameras is illustrated in Fig. 3. If the object intersects the epipolar plane \(\pi \) at frame i and does not intersect it at frame j, the Motion Barcodes of both lines l and \(l'\) will be 1 at frame i and 0 at frame j. Corresponding epipolar lines therefore have highly correlated Motion Barcodes.

Fig. 3. Illustration of a scene with a moving object viewed by two video cameras. The lines l and \(l'\) are corresponding epipolar lines, and \(\pi \) is the 3D epipolar plane that projects to l and \(l'\). At time \(t=1\) the object does not intersect the plane \(\pi \), and thus does not intersect l or \(l'\) in the video. At times \(t=2,3\) the object intersects the plane \(\pi \), so its projections onto the cameras do intersect the epipolar lines l and \(l'\). The Motion Barcodes of both l and \(l'\) are (0, 1, 1).

1.3 Similarity Score Between Two Motion Barcodes

It was suggested in [4] that a good similarity measure between two Motion Barcodes b and \(b'\) is their normalized cross correlation:

$$\begin{aligned} corr(b,b')=\sum _{i=1}^{N}{\frac{(b(i)-mean(b))\cdot (b'(i)-mean(b'))}{{||b-mean(b) ||}_2 {||b'-mean(b')||}_2}} \end{aligned}$$
(1)
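For concreteness, a minimal Python sketch of Eq. 1 follows; the function name and the guard against constant barcodes are our own additions.

```python
import numpy as np

def barcode_correlation(b, b_prime):
    """Normalized cross correlation of two motion barcodes (Eq. 1).

    b, b_prime: binary vectors of length N (one entry per frame).
    """
    b = np.asarray(b, dtype=float)
    b_prime = np.asarray(b_prime, dtype=float)
    bc = b - b.mean()                       # center each barcode
    bpc = b_prime - b_prime.mean()
    denom = np.linalg.norm(bc) * np.linalg.norm(bpc)
    if denom == 0:                          # constant barcode: correlation undefined
        return 0.0
    return float(np.dot(bc, bpc) / denom)
```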

1.4 Overall Structure of the Paper

Our camera calibration approach includes two steps. The first step, described in Sect. 2, finds candidates for corresponding epipolar lines between two cameras. The second step, Sect. 3, describes how a fundamental matrix between these two cameras is computed from those candidates. Section 4 presents our experiments.

2 Corresponding Epipolar Lines Candidates

Given two synchronized videos recorded by a pair of stationary cameras, A and B, we want to compute their Fundamental Matrix F, which satisfies for each pair of corresponding points \(x \in A\) and \(x' \in B\):

$$\begin{aligned} {x'}^TFx=0. \end{aligned}$$
(2)

The matrix F maps each point \(x \in A\) to an epipolar line \(l'=Fx\) in B, such that the corresponding point \(x'\) lies on \(l'\). Likewise, any point in image B that lies on \(l'\), including \(x'\), is mapped to a line \(l=F^{T}x'\) such that x lies on l. l and \(l'\) are corresponding epipolar lines. F can be computed from point correspondences or from epipolar line correspondences [11].

In previous methods [1, 3] the convex hull of the silhouette of a moving object was used to search for corresponding epipolar lines.

Our proposed process to find candidates for corresponding epipolar lines does not use the silhouette of a moving object, and can therefore also be applied in scenes with multiple moving objects.

Given a video, lines are selected by sampling pairs of points on the image border and connecting them. For each line the Motion Barcode is computed, and we continue only with informative lines, i.e. lines having enough zeros and ones in their motion barcode.
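A minimal sketch of this sampling step is given below. The frame layout, the sampling density along each line, and the informativeness thresholds are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def sample_border_line(h, w, rng):
    """Sample a line by picking two random points on the image border."""
    def border_point():
        side = rng.integers(4)
        t = rng.random()
        if side == 0: return (0.0, t * (w - 1))        # top edge
        if side == 1: return (h - 1.0, t * (w - 1))    # bottom edge
        if side == 2: return (t * (h - 1), 0.0)        # left edge
        return (t * (h - 1), w - 1.0)                  # right edge
    return border_point(), border_point()

def motion_barcode(frames, p0, p1, samples=500):
    """frames: (N, h, w) binary array. Returns b in {0,1}^N for line p0-p1."""
    rs = np.linspace(p0[0], p1[0], samples).round().astype(int)
    cs = np.linspace(p0[1], p1[1], samples).round().astype(int)
    # b(i) = 1 iff some foreground pixel lies on the line in frame i
    return (frames[:, rs, cs] > 0).any(axis=1).astype(np.uint8)

def is_informative(b, lo=0.05, hi=0.95):
    """Keep lines whose barcode has enough zeros and ones (thresholds assumed)."""
    return lo < b.mean() < hi
```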

Given two cameras A and B, Motion Barcodes are generated for all selected lines in each camera, resulting in \(n_1\) vectors in \(\{0,1\}^N\) for Camera A and \(n_2\) vectors in \(\{0,1\}^N\) for Camera B, where N is the number of frames in the video.

The \(n_1 \times n_2\) correlation matrix between the barcodes of the lines selected from Camera A and those selected from Camera B is computed using Eq. 1. The goal is to find corresponding line pairs between Camera A and Camera B. 1,000 line pairs are selected using the correlation matrix as follows (a sketch of this selection follows the list; for visual results see Fig. 4).

  • If the correlation of a pair of lines is in the mutual top 3 of each other, i.e. top 3 in both row and column, it is considered a candidate.

  • The 1,000 candidate pairs with the highest correlations are taken as corresponding epipolar lines candidates.
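The selection above could be sketched as follows, with C the \(n_1 \times n_2\) correlation matrix from Eq. 1; tie handling and data layout are our own assumptions.

```python
import numpy as np

def mutual_top_k_pairs(C, k=3, n_pairs=1000):
    """Select candidate line pairs from the n1 x n2 barcode correlation matrix C.

    A pair (i, j) is a candidate if C[i, j] is among the top-k entries of
    both row i and column j; the n_pairs highest-correlation candidates win.
    """
    n1, n2 = C.shape
    top_rows = np.argsort(C, axis=1)[:, -k:]   # top-k column indices per row
    top_cols = np.argsort(C, axis=0)[-k:, :]   # top-k row indices per column
    candidates = []
    for i in range(n1):
        for j in top_rows[i]:
            if i in top_cols[:, j]:            # mutual top-k test
                candidates.append((C[i, j], i, j))
    candidates.sort(reverse=True)              # highest correlation first
    return [(i, j) for _, i, j in candidates[:n_pairs]]
```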

Fig. 4. An example of the effect of filtering candidate pairs using motion barcode similarity; most datasets behave very similarly. (a) The initial candidate lines in Camera A are randomly distributed. (b) Candidate lines in Camera A having a motion barcode similar to some line in Camera B, as described in Sect. 2: most mismatches were removed, and correct epipolar lines dominate. (c) Same as (b), showing the lines in Camera B.

3 Fundamental Matrix from Corresponding Lines

Given a set of candidate corresponding pairs of epipolar lines between cameras A and B, our goal is to find the fundamental matrix F between the cameras.

Experimentally, about half of the 1,000 candidates for corresponding epipolar lines described in Sect. 2 are correct. As not all candidates are true correspondences, the algorithm proceeds with a RANSAC approach.

In each RANSAC trial, two pairs of candidate corresponding epipolar lines are selected. This gives two candidate epipolar lines in each camera, and the epipole candidate for each camera is the intersection of its two lines. Next, an additional pair of corresponding epipolar lines is found from lines incident to these epipoles. The homography H between corresponding epipolar lines is computed from these three pairs of epipolar lines. This is described in detail in Sect. 3.1.

The proposed homography H is given a consistency score based on the number of candidate pairs it maps successfully, as described in Sect. 3.2.

Given the homography H and the epipole \(e'\) in B, the fundamental matrix F is [11]:

$$\begin{aligned} F=[e']_x H^{-T} \end{aligned}$$
(3)
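In code, Eq. 3 is a one-liner once the cross-product matrix is available. A sketch, with numpy conventions assumed:

```python
import numpy as np

def skew(e):
    """Cross-product matrix [e]_x, so that skew(e) @ v == np.cross(e, v)."""
    e = np.asarray(e, dtype=float)
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

def fundamental_from_homography(H, e_prime):
    """Eq. 3: F = [e']_x H^{-T}, where H is the epipolar line homography
    and e' is the epipole in camera B (homogeneous coordinates)."""
    return skew(e_prime) @ np.linalg.inv(H).T
```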

3.1 Computing the Epipolar Line Homography

We compute the Epipolar Line Homography using RANSAC. We sample pairs of corresponding epipolar line candidates with a probability proportional to the correlation of their Motion Barcodes as in Eq. 1. Given two sampled pairs \((l_1, l_1')\) and \((l_2, l_2')\), corresponding epipole candidates are computed by \(e = l_1 \times l_2\) in Camera A and \(e' = l_1' \times l_2'\) in Camera B. Given e and \(e'\), the distances of each of the 1,000 candidate pairs from these epipoles are computed. A third pair of corresponding epipolar line candidates, \((l_3,l_3')\), is selected based on this distance:

$$\begin{aligned} (l_3,l_3')=\mathop {\arg \min }\limits _{(l_i,l_i')\in \{candidates\}\setminus \{(l_1,l_1'),(l_2,l_2')\}} \left( d(l_i,e)+d(l_i',e') \right) \end{aligned}$$
(4)

The homography H between the epipolar pencils is calculated by the homography DLT algorithm [11], using the 3 proposed pairs of corresponding epipolar lines.
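A sketch of the epipole computation and the selection in Eq. 4, with lines represented as homogeneous 3-vectors; the data layout is an assumption, and a finite epipole is assumed.

```python
import numpy as np

def epipole(l1, l2):
    """Intersection of two lines in homogeneous coordinates: e = l1 x l2."""
    return np.cross(l1, l2)

def point_line_distance(l, e):
    """Distance of point e (homogeneous, finite) from line l = (a, b, c)."""
    p = e[:2] / e[2]
    return abs(l[0] * p[0] + l[1] * p[1] + l[2]) / np.hypot(l[0], l[1])

def third_pair(candidates, used, e, e_prime):
    """Eq. 4: the remaining candidate pair closest to both epipoles.

    candidates: list of (l, l') line pairs; used: indices of the two
    already-sampled pairs, excluded from the search.
    """
    best, best_d = None, np.inf
    for idx, (l, lp) in enumerate(candidates):
        if idx in used:
            continue
        d = point_line_distance(l, e) + point_line_distance(lp, e_prime)
        if d < best_d:
            best, best_d = idx, d
    return best
```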

3.2 Consistency Measure of Proposed Homography

Given the homography H, a consistency measure over all epipolar line candidates is calculated. For each candidate pair \((l,l')\), we measure the similarity between \(l'\) and \(\tilde{l'}= H l\); perfect consistency would give \(l' \cong \tilde{l'}\).

Each candidate line l in A is transformed to B using the homography H giving \(\tilde{l'}= H l\). We measure the similarity in B between \(l'\) and \(\tilde{l'}\) as the area between the lines (illustrated in Fig. 5).

The candidate pair \((l, l')\) is considered an inlier relative to the homography H if the area between \(l'\) and \(\tilde{l'}\) is smaller than a predefined threshold. In the experiments in Sect. 4 this threshold was taken to be 3 pixels times the width of the image. The consistency score of H is the number of inliers among all candidate lines.
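The area measure could be approximated as below. This sketch assumes both lines are far from vertical; a robust implementation would integrate along each line's dominant image axis.

```python
import numpy as np

def area_between_lines(l1, l2, h, w):
    """Approximate image area enclosed between two lines (see Fig. 5).

    Lines are homogeneous 3-vectors (a, b, c) with ax + by + c = 0;
    assumes b != 0 for both lines (non-vertical).
    """
    xs = np.arange(w)
    def y_at(l):
        a, b, c = l
        return -(a * xs + c) / b
    y1 = np.clip(y_at(l1), 0, h - 1)   # keep only the area inside the image
    y2 = np.clip(y_at(l2), 0, h - 1)
    return float(np.abs(y1 - y2).sum())

def is_inlier(l_prime, l_tilde, h, w, thresh_px=3):
    """Inlier test of Sect. 3.2: area below 3 pixels times the image width."""
    return area_between_lines(l_prime, l_tilde, h, w) < thresh_px * w
```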

Fig. 5. Illustration of our distance measure between two lines. The distance measure between line 1 (green line) and line 2 (blue line) is the image area enclosed between the two lines (yellow area). (Color figure online)

4 Experiments

We tested our method on both synthetic and real video sequences. We created two synthetic datasets, cubes and thin cubes, using the Model Renderer developed by Assif and Hassner and used in [12]. Each dataset contains multiple views of a synthetic scene with flying cubes; the thin cubes dataset has smaller cubes. Background subtraction is done automatically by the tool. As a real dataset we used PETS2009 [13], with [10] for background subtraction. All datasets have 800 synchronized video frames, recorded by multiple cameras.

These datasets cannot be calibrated by matching image features (e.g. SIFT), since there is no overlapping background between views. They cannot be calibrated by [1, 3], since they have multiple objects, causing problems with the convex hull. The cubes datasets cannot be calibrated by [6–9], since the assumption of planar motion does not hold.

The approach described in Sect. 2 was applied to each pair of cameras from each dataset. Initial lines were generated by uniformly sampling two points on the image border of Cameras A and B, where every two sampled points define a line passing through the image.

4.1 Consistency of the Epipolar Line Pairs Candidates

Using the algorithm from Sect. 2, 1,000 candidate pairs of corresponding epipolar lines were generated.

The simplest distance measure between a candidate line and a true epipolar line is the distance of the candidate line from the true epipole. But this distance does not take into account the distance of the epipole from the image, and is inappropriate for epipoles that are far from the image. Instead, we measure the image area between the candidate line and a true epipolar line: the epipolar line going through the midpoint of the candidate line. This distance measure is illustrated in Fig. 5.

If this area is smaller than 3 pixels times the image length, the candidate line is considered a true positive. We call a pair of corresponding epipolar lines true if both of its lines are true positives. For each pair of cameras in each dataset we measured the rate of true positives among all 1,000 candidates, after removing the lines of motion (Sect. 4.2). The average true-positive rates per dataset are: thin cubes \(67.8\,\%\), cubes \(71.67\,\%\), and PETS2009 \(37.81\,\%\).

4.2 Multiple Objects Moving Along the Same Straight Path

In some cases, e.g. busy roads, many objects in the scene may move along the same straight line. The projections of those lines onto the video frames are straight lines. This may result in a high correlation between two non-epipolar lines, as both will have similar Motion Barcodes. To overcome this problem, we create a motion heat map by summing all binary motion frames of each video into a single image. Candidate epipolar line pairs, where both lines substantially overlap lines with heavy traffic in their corresponding images, are removed from consideration. See Fig. 6.
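A possible sketch of this filtering step, using a standard Hough transform on the thresholded heat map; the threshold values are assumptions, not taken from the paper.

```python
import numpy as np
import cv2

def detect_motion_paths(frames, rel_thresh=0.2, hough_votes=200):
    """Find straight lines with heavy traffic (the Sect. 4.2 heat map).

    frames: (N, h, w) binary motion masks. Returns (rho, theta) per line;
    candidate epipolar lines close to these paths would then be discarded.
    """
    heat = frames.sum(axis=0).astype(np.float32)          # motion heat map
    busy = np.uint8(heat > rel_thresh * heat.max()) * 255  # heavy-traffic pixels
    lines = cv2.HoughLines(busy, 1, np.pi / 180, hough_votes)
    return [] if lines is None else [tuple(l[0]) for l in lines]
```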

Fig. 6. Detection of motion along lines for Camera 1 of the PETS2009 dataset. (a) Heat map obtained by summing the segmented frames over time. (b) Line detection on the heat map gives motion along lines. Lines close to these lines will not be used as epipolar lines.

4.3 Finding Fundamental Matrices

After completing the procedure in Sect. 2, we have 1,000 candidates for corresponding epipolar lines for each pair of cameras. Erroneous candidate lines, created by multiple objects moving in straight lines, are filtered out using the procedure in Sect. 4.2.

Given the candidates for corresponding epipolar lines, we perform up to 10,000 RANSAC iterations to compute the fundamental matrix according to Sect. 3. The quality of a generated fundamental matrix is determined by the number of inliers among all candidate pairs (see Sect. 3.2). We used an inlier threshold of 3 pixels times the image length for all datasets.

We evaluated our method on the 3 datasets, which together contain 37 pairs of cameras. Except for one camera pair in the PETS2009 dataset, all fundamental matrices were found with high accuracy. For that one pair the fundamental matrix could not be recovered: all people in the scene move along a single straight line, which happens to project onto corresponding epipolar lines in the two images. Lines close to this epipolar line do obtain high barcode correlation, but since the objects barely cross any other epipolar line, there is no pencil of corresponding true epipolar lines; essentially only one epipolar line pair exists, which is not enough to determine the fundamental matrix.

For each resulting F we measured its accuracy against the ground truth \(F_{truth}\) using the Symmetric Epipolar Distance [11]: ground-truth corresponding points are generated with \(F_{truth}\), and the symmetric epipolar error of the resulting F is measured on those points. Table 1 shows, for each dataset, the average symmetric epipolar distance over all camera pairs and the number of camera pairs that converged.

Table 1. Average symmetric epipolar distances for each dataset
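The symmetric epipolar distance used above can be sketched as follows; this is one common form of the measure [11], with homogeneous coordinates assumed.

```python
import numpy as np

def symmetric_epipolar_distance(F, x, x_prime):
    """Symmetric epipolar distance of a correspondence (x, x'):
    the sum of each point's distance to its epipolar line.

    x, x_prime: homogeneous 3-vectors (finite points assumed).
    """
    def dist(p, l):                       # point-to-line distance
        return abs(p @ l) / (abs(p[2]) * np.hypot(l[0], l[1]))
    l_prime = F @ x                       # epipolar line of x in image B
    l = F.T @ x_prime                     # epipolar line of x' in image A
    return dist(x_prime, l_prime) + dist(x, l)
```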

5 Conclusions

A method has been presented to calibrate two cameras having very different viewpoints. The method consists of the following steps:

  • Motion Barcodes of lines are used to efficiently find candidate pairs of corresponding epipolar lines.

  • Using a RANSAC process, three corresponding pairs of epipolar lines are selected, and the fundamental matrix is computed.

  • A method to evaluate the quality of the fundamental matrix has also been proposed.

In our experiments the proposed method was very accurate.

This method can be applied in cases where other methods fail, such as two cameras with very different viewpoints observing a scene with multiple moving objects: (i) point matching fails because appearance can differ greatly between very different viewpoints; (ii) silhouette methods work across very different viewpoints, but only when the scene contains a single moving object.