1 Introduction

A line matching method comprises a set of algorithms that put segments in correspondence across different images showing a common environment, common elements, or common regions of interest. A 3D reconstruction, abstraction, or spatial sketch based on line correspondences is an estimate of the position of singular primitives captured in several images, relative to the position of the camera that captured them. The proposed approach belongs to the group of line matching methods aimed at 3D reconstruction from pictures of human-made objects, buildings, urban structures, industrial elements, or computer-generated models.

The vast majority of current approaches to feature matching are based on 2D appearance. Point-based algorithms are the most common, including steerable filters [3], moment invariants [12], SIFT [9], and, more recently, KAZE [1]. Structure-from-Motion (SfM) pipelines triangulate the feature points in correspondence [4] to generate initial estimates of the location of the points in 3D space.

The logical evolution of environment abstraction from multiple views is to incorporate line segments. This addition to SfM provides geometric information independently of the described points. Besides, coplanar line primitives can be intersected to reveal further information about the observed scene.

Our proposed line matching algorithm is intended to be applied together with 3D line-based abstraction. It takes advantage of a segment detection method using the Gaussian scale-space, an iterative voting algorithm running on groups of lines with the same structural distribution [10], and the robustness of an outlier rejection algorithm that uses 3D structures to discriminate potential outliers. The main hypothesis is that an outlier detection algorithm based on coplanar line intersections can improve the result of line matching algorithms. The inputs of the segment matching method are the images and the intrinsic parameters of the camera; the output is the set of matched lines among the images.

1.1 Related Work Based on Lines

A first group of proposed solutions is based exclusively on 2D observations in the images. MLSD [15] encodes a SIFT-like description of the different regions of a line into a description matrix. The method first avoids overlapping regions within the detections. To put each segment in correspondence with segments in other images, the algorithm compares the mean and standard deviation of the gradient of the pixel values stored in the columns of the description matrix. LBD [17] improved on MLSD by adding geometric constraints and a topological outlier filter. The method LJL [7] picks matching candidates in a different way: it first finds detected lines that touch each other at one of their endpoints, referred to as junctions. Secondly, it compares the intensity changes along the segments, the angle formed by the pair of segments, and the neighboring line junctions. Finally, single lines observed close to a junction are grouped and become matching candidates for the segments detected close to the counterpart junction in other images. The method in [16] uses appearance and structure for hypothesis generation, and adds the SIFT feature point descriptor for outlier detection and removal. Another approach to pairwise image matching draws a convex hull around groups of close segments and exploits affine invariants in the hull [10]: the ratios of the areas of triangles drawn inside the hull are compared. This method has been improved for the present paper.

Another group of methods employs homography constraints to obtain the correspondences. The method LJL [7] evolved into VJ [8] by adding homography constraints from the intersections of the extensions of closely located pairs of line segments. The method LPI [2] shows how to combine line detection with a feature point detector and descriptor. It exploits line-point affine invariants analogously to the methods of the first group, and takes advantage of projective homography invariants by using four feature points beside the line. A recent evolution of this method is CLPI [5], which constructs the line-point projective invariant on the intersections of coplanar lines. The drawback of both invariants is that all the points and the line must be assumed coplanar, whereas lines often lie on the boundary between two planar surfaces.

The proposed outlier removal extension employs line-based SfM to group lines according to their coplanarity. The method is described in Sect. 2. Section 3 presents a quantitative comparison with several other state-of-the-art line matching methods. Section 4 draws the conclusions.

Fig. 1. Process of line matching

2 Description of the Outlier Detection Method

The line matching algorithm employed is the same as the one used in [10]. A matching outlier is a line whose counterpart in the other image does not correspond to the same human-perceived segment. The purpose of the outlier detection algorithm is to double-check the final set of line correspondences by testing geometric relations among their mutual intersections. A line that crosses several intersections in a different order than its counterpart is likely to be a matching outlier. The output of the algorithm is a set of line correspondences flagged as untrustworthy. The integration of the outlier detection into the line matching method is depicted in Fig. 1, where the outlier detection is shown in the lowest section of the figure.
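The ordering criterion can be illustrated with a minimal sketch. It assumes that each intersection carries an identifier shared by both images, and that a reversed order is acceptable because the two detections of a line may have swapped endpoints. The function names and the data layout are hypothetical and are not taken from the implementation of the method.

```python
def order_along_line(p0, p1, points):
    """Return the ids in `points` sorted by their projection onto the line p0 -> p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]

    def projection(item):
        _, (x, y) = item
        return (x - p0[0]) * dx + (y - p0[1]) * dy

    return [pid for pid, _ in sorted(points.items(), key=projection)]


def same_order(seg_i, seg_j, pts_i, pts_j):
    """True if the shared intersections appear in the same order along both segments.

    seg_i, seg_j: ((x0, y0), (x1, y1)) for a line and its counterpart.
    pts_i, pts_j: {intersection_id: (x, y)} in each image (hypothetical layout).
    A fully reversed order is accepted, since endpoint order is arbitrary.
    """
    order_i = order_along_line(seg_i[0], seg_i[1], pts_i)
    order_j = order_along_line(seg_j[0], seg_j[1], pts_j)
    return order_i == order_j or order_i == order_j[::-1]
```

A correspondence for which `same_order` returns `False` would be a candidate outlier under this simplified reading of the criterion.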

Matched segments are extended to intersect their neighbors within the image boundaries. An intersection and its counterpart in the other image are stored if the intersection lies closer to each originating segment than twice the length of the shorter of the two, and the inner angle formed by the intersecting lines is greater than \(\pi /6\). These requirements are imposed because the location of an intersection carries uncertainty only along the directions of the two crossing lines.
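The acceptance test for an intersection can be sketched as follows. This is an illustration under stated assumptions: the distance from the intersection to a segment is taken as the distance to the closest point of the segment, the inner angle is the acute angle between the two lines, and all function names are hypothetical.

```python
import math


def intersect_lines(a0, a1, b0, b1):
    """Intersection of the infinite lines through a0-a1 and b0-b1, or None if parallel."""
    dax, day = a1[0] - a0[0], a1[1] - a0[1]
    dbx, dby = b1[0] - b0[0], b1[1] - b0[1]
    den = dax * dby - day * dbx
    if abs(den) < 1e-12:
        return None
    s = ((b0[0] - a0[0]) * dby - (b0[1] - a0[1]) * dbx) / den
    return (a0[0] + s * dax, a0[1] + s * day)


def dist_to_segment(p, s0, s1):
    """Distance from point p to the closest point of segment s0-s1."""
    dx, dy = s1[0] - s0[0], s1[1] - s0[1]
    t = ((p[0] - s0[0]) * dx + (p[1] - s0[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (s0[0] + t * dx), p[1] - (s0[1] + t * dy))


def valid_intersection(seg_a, seg_b, width, height):
    """Return the intersection if it satisfies the three criteria, else None."""
    p = intersect_lines(seg_a[0], seg_a[1], seg_b[0], seg_b[1])
    if p is None or not (0 <= p[0] < width and 0 <= p[1] < height):
        return None  # parallel lines or intersection outside the image
    len_a, len_b = math.dist(*seg_a), math.dist(*seg_b)
    max_dist = 2.0 * min(len_a, len_b)
    if dist_to_segment(p, *seg_a) >= max_dist or dist_to_segment(p, *seg_b) >= max_dist:
        return None  # too far from one of the originating segments
    dot = ((seg_a[1][0] - seg_a[0][0]) * (seg_b[1][0] - seg_b[0][0])
           + (seg_a[1][1] - seg_a[0][1]) * (seg_b[1][1] - seg_b[0][1]))
    angle = math.acos(min(1.0, abs(dot) / (len_a * len_b)))
    return p if angle > math.pi / 6 else None
```

The angle threshold rejects nearly parallel pairs, for which a small error in either line direction moves the intersection a long way.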

The camera matrix \(\varvec{K}\) is given in this problem, whereas the camera poses \(\mathcal {P}=\{\varvec{P}^i,\varvec{P}^j\}\) are unknown. We have chosen to estimate them with an SfM pipeline based on KAZE [1] features, though the segment endpoints could be used for the same task. The Essential Matrix \(\mathbf E \) is estimated with the Five-Point Algorithm [13]. Given \(\mathbf E =\varvec{R}[\varvec{t}]_\times \) and the set of 3D points \(\mathcal {Y}\), the relative rotation and translation between the first pair of cameras, \(\varvec{P}^j=\varvec{K}[\varvec{R}|\varvec{t}]\), are estimated by applying the cheirality check [13] and discarding the triangulated points of \(\mathcal {Y}\) that are not in front of the cameras.
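The cheirality check can be sketched in a few lines. The sketch assumes that the first camera is placed at the origin, \(\varvec{P}^i=\varvec{K}[\varvec{I}|\varvec{0}]\), that 3D points are expressed in its frame, and that each candidate pose comes with the points triangulated under that pose. Triangulation and the decomposition of \(\mathbf E \) are outside its scope, and the function names are hypothetical.

```python
def depth_in_second_camera(R, t, X):
    """Z coordinate of the 3D point X after applying the relative pose [R|t]."""
    return R[2][0] * X[0] + R[2][1] * X[1] + R[2][2] * X[2] + t[2]


def in_front_of_both(R, t, X):
    """Cheirality test: positive depth in the first (origin) and second camera."""
    return X[2] > 0 and depth_in_second_camera(R, t, X) > 0


def select_pose(candidates):
    """Pick the pose with the most points in front of both cameras.

    candidates: list of (R, t, points), where `points` were triangulated
    under that pose hypothesis. Returns (R, t, kept_points), where
    kept_points excludes the points that fail the cheirality test.
    """
    best = None
    for R, t, points in candidates:
        kept = [X for X in points if in_front_of_both(R, t, X)]
        if best is None or len(kept) > len(best[2]):
            best = (R, t, kept)
    return best
```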

The goal of the final stage is to divide the set of intersections and their counterparts in the other image, \(\mathcal {I}=\{\mathcal {I}^i,\mathcal {I}^j\}\), into groups according to their coplanarity: \(\mathcal {I}^i=\{\mathcal {I}_1^i,\mathcal {I}_2^i,\cdots ,\mathcal {I}_V^i\}\), where V is the total number of planes fitted to 10 or more 3D estimates of the intersections. The points obtained by intersecting lines are projected into space from the cameras, and the resulting 3D points are then fitted to planes. RANSAC is employed to generate hypothetical groups of these 3D points. At least 10 points are required to form a valid plane, and the 3D points that are not assigned to any plane after 100 iterations are discarded for the rest of the algorithm. Each fitted plane is therefore associated with a group of known corresponding intersections. Within each group, a k-Nearest-Neighbors (kNN) algorithm searches for neighboring coplanar intersections, as shown in the main loop of Algorithm 1. The relative 2D positions of the intersections within these groups are checked and compared in both images. The output is a subset of \(\mathcal {Y}\) comprising the intersections that are most likely to be outliers. Any matched segment that originates four or more suspicious intersections is quarantined, as written in the last condition of Algorithm 1.
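The grouping of 3D intersections into planes can be sketched with a sequential RANSAC. This is one possible reading of the description above, not the implementation of the method: planes are extracted one at a time, each search uses 100 random three-point samples, and the inlier tolerance `tol` and the fixed random seed are assumptions introduced for the illustration.

```python
import random


def plane_from_points(p0, p1, p2):
    """Unit normal n and offset d of the plane n.x + d = 0, or None if degenerate."""
    u = [p1[k] - p0[k] for k in range(3)]
    v = [p2[k] - p0[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None  # collinear sample
    n = [c / norm for c in n]
    return n, -sum(n[k] * p0[k] for k in range(3))


def group_by_planes(points, tol=0.01, min_points=10, iterations=100, seed=0):
    """Split point indices into coplanar groups; return (groups, discarded)."""
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    groups = []
    while len(remaining) >= min_points:
        best = []
        for _ in range(iterations):
            plane = plane_from_points(*(points[i] for i in rng.sample(remaining, 3)))
            if plane is None:
                continue
            n, d = plane
            inliers = [i for i in remaining
                       if abs(sum(n[k] * points[i][k] for k in range(3)) + d) <= tol]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_points:
            break  # no further valid plane; the rest is discarded
        groups.append(best)
        taken = set(best)
        remaining = [i for i in remaining if i not in taken]
    return groups, remaining
```

Each returned group would then be searched for neighboring intersections with kNN, and the indices in the second return value play the role of the points discarded for the rest of the algorithm.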

Algorithm 1.

3 Experimental Results

The goal of this section is to evaluate quantitatively the proposed method, including the outlier rejection algorithm, on public datasets of image pairs, together with other state-of-the-art line matching methods. The datasets were selected to strike a fair balance between scenes with and without texture, and they include camera translation, moderate global rotations, and changes in illumination conditions. The dataset “Castle” comprises pictures {0, 1} of the dataset [11]. It features a viewpoint change with a camera rotation, revealing repetitive structures that can be tricky to identify. It was chosen to evaluate the structural cohesion of the line neighborhoods. The remaining datasets were obtained from [6]. The pairs “Low Texture” and “Textureless corridor” portray a complicated classical building interior with few, long observed segments, and were selected to evaluate the resilience of the method to the absence of texture information. “Outdoor light” and “Leuven” are included to test changes of illumination in two different scenarios. Finally, “Drawer” combines a change of light exposure with a viewpoint change, and the scene features repetitive, similar line patterns. The proposed method is compared quantitatively against the state-of-the-art methods discussed in the introduction of this paper: CLPI [5], LBD [17], and LJL [7]. The implementations of all the methods were provided by their respective authors, and their applicability is restricted to pairs of views. They were executed on a notebook with an Intel i7 3720QM quad-core processor and 16 GB of DDR3 memory. The average results are shown in Table 1.

The Ground Truth evaluation of the methods follows an approach similar to [14]: a line is marked as a correct match if it is located 5 pixels or less from the human-perceived line in the orthogonal direction, and if its direction differs from that of the Ground Truth match of the counterpart by less than \(5^\circ \). It should be noted that, for short atomic segments, it is difficult to assess whether the angle is correct with respect to the Ground Truth. Although the ratio of matching inliers with respect to the Ground Truth gives meaningful information about the performance of each method, it is not possible to extract a single global score to compare the methods as a whole, because some perform better in specific scenarios and the characteristics of the extracted corresponding line structures vary from one method to another.
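The acceptance criterion can be written as a short predicate. The sketch assumes that the orthogonal distance is measured from both endpoints of the matched segment to the infinite Ground Truth line, and that line directions are compared without orientation; the source does not specify these details, and the function names are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Orthogonal distance from point p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / math.hypot(dx, dy)


def direction_difference_deg(seg, ref):
    """Smallest angle in degrees between two undirected line directions."""
    ang_seg = math.atan2(seg[1][1] - seg[0][1], seg[1][0] - seg[0][0])
    ang_ref = math.atan2(ref[1][1] - ref[0][1], ref[1][0] - ref[0][0])
    diff = abs(ang_seg - ang_ref) % math.pi
    return math.degrees(min(diff, math.pi - diff))


def is_correct_match(seg, gt, max_dist=5.0, max_angle_deg=5.0):
    """True if seg lies within max_dist pixels and max_angle_deg of the line gt."""
    close = all(perpendicular_distance(p, gt[0], gt[1]) <= max_dist for p in seg)
    return close and direction_difference_deg(seg, gt) < max_angle_deg
```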

The results show that the proposed method extracts line correspondences featuring longer, less fragmented lines than the competing methods. In addition, these segments are more similar in length to their respective counterparts than those returned by the other methods. Full-length, non-fragmented matched lines are beneficial when the method is applied to line-based 3D reconstruction from three or more images. Therefore, the average segment length has been extracted from all the results and is shown in Table 1, together with the total number of pixels covered by the matched segments in one image. Another measure that is crucial for the success of spatial reconstructions is the similarity between features in correspondence. This measure is meaningful when the image datasets do not feature a global zoom transformation, as is the case for the ones included in this study. It is computed as the absolute value of the difference between the lengths of the lines in correspondence, divided by the length of the longer segment. A result scores better when both segments in correspondence are of similar length. This measure penalises correspondences of short atomic segments, as they return poor geometric information about the scene. The best average matched segment length is obtained by the proposed method, with an average of 105.1 pixels. The other methods follow at a distance, with average lengths of less than two thirds of the length obtained by the presented method. The proposed method also returns the best average dissimilarity score, 0.14. This is a clear advantage over the other methods in this mixed comparison, as the second best is CLPI [5] with an average dissimilarity score of 0.23. The last column of Table 1 shows the processing time in seconds of each method on the evaluated datasets. The highest processing times were those of LJL [7].
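The length dissimilarity score defined above is straightforward to compute. The following sketch uses hypothetical function names and averages the score over a list of matched segment pairs; lower values indicate segments of more similar length.

```python
import math


def length_dissimilarity(seg_a, seg_b):
    """|len(a) - len(b)| divided by the length of the longer segment, in [0, 1)."""
    len_a, len_b = math.dist(*seg_a), math.dist(*seg_b)
    return abs(len_a - len_b) / max(len_a, len_b)


def average_dissimilarity(matches):
    """Mean dissimilarity over a non-empty list of (seg_a, seg_b) pairs."""
    return sum(length_dissimilarity(a, b) for a, b in matches) / len(matches)
```

For instance, a segment of 100 pixels matched to one of 86 pixels scores 14/100 = 0.14.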

The main measure in the comparison is the number of correct correspondences. The overall score was computed as the ratio of inliers. The method CLPI [5] failed to return any correspondence for the pairs of images featuring low texture and repetitive patterns. The method LJL [7] ranked second, underperforming only on the images in which the segments are more isolated. LBD [17], on the other hand, performed poorly in almost all the scenarios, showing a weak grasp of structural cohesion. The goal of the proposed matching method is to obtain a result that serves as input for a 3D reconstruction pipeline based on straight lines. It is therefore critical that every human-perceived line segment is represented by a single complete entity. The most severe fragmentation was observed in the results of LBD [17] and LJL [7].

Table 1. Quantitative comparison. Line matching accuracy and processing times.

The candidate outlier detection algorithm, based on coplanar line intersections, has been evaluated separately. Two examples of matching outlier detection are shown in Fig. 2. The first row of pictures shows the matching results. The second row shows the suspicious line intersections. Lines crossing these points are flagged as possible outliers. The flagged segments are marked in the third row of pictures.

In the left-hand example, the outlier detection algorithm extracts the most noticeable segment correspondence outlier, which is surrounded by lines visible in both images. There are other outliers in the figure, but their structural context does not provide the minimum number of neighboring intersections. In the right-hand example, of the 13 line matches flagged as suspicious, 9 are real outliers and only 4 are actually correct matches.

The outlier detection algorithm is only applicable when there is a noticeable change of viewpoint between the two images. Results have therefore been extracted for the two datasets featuring a perspective change. These results, shown in Table 2, support the validity of the proposed outlier detection method. Adding the outlier detection improves the results on both datasets without increasing the processing times.

Fig. 2. Examples of the search for outliers using coplanar neighbor intersections.

Table 2. Average line matching accuracy and processing times from the results.

4 Conclusions

The present paper proposes a method for multi-view matching of straight segments. The search for line counterparts starts by describing individual segments by their appearance properties and creating geometric relations among groups of segments. The method has been compared quantitatively with three state-of-the-art methods on public image datasets. The chosen images feature different man-made scenarios, including low texture, high texture with complex structures, changes of illumination, global rotations, camera viewpoint change, and a change of scale. The presented results show that the proposed method outperforms the competing methods in segment matching inlier ratio, returns longer and more structurally meaningful straight segment correspondences, and features noticeably higher similarity between the length of each segment and that of its counterpart.

An outlier detection algorithm has been proposed to complement the method. It is rooted in the hypothesis that geometric relations between coplanar line intersections reveal inconsistencies in the resulting sets of correspondences. It has proved advantageous, reducing the ratio of outliers in two datasets. Future work might extend the line matching to three or more views. In that case, the outlier detection would evolve to employ 3D planes obtained from advanced SfM-based algorithms.