
1 Introduction

The motion averaging problem, also called “multiple rotation averaging” when dealing with 3D rotations or “pose-graph inference” when applied to camera poses, has been studied for more than fifteen years [6, 14, 16–19, 24, 31, 32] and is still a very active area of research [4, 5, 7, 8, 10, 12, 20, 29]. This generic problem arises in a large number of applications, such as video mosaicking [6, 24], reconstruction of 3D scenes [10, 27] or visual SLAM [13, 14], where only the considered group of transformations changes: \(SE{\left( 3\right) }\) for 3D Euclidean motions, \(SL{\left( 3\right) }\) for homographies, \(Sim{\left( 3\right) }\) for 3D similarities. In all these applications, the task consists in estimating absolute transformations, between a “world” coordinate system and local coordinate systems, given noisy measurements corresponding to relative transformations between pairs of local coordinate systems.

The noisy relative transformation measurements are usually obtained by processing a video stream, coming from an RGB or RGB-D camera, with two different modules:

  • a visual odometry module that continuously computes the transformation between the current and the previous local coordinate system of the camera;

  • a loop closure module that detects when the camera comes back in a previously visited area and computes a relative transformation.

The odometry measurements and loop closure measurements are essentially of the same nature. However, in practice, the loop closure module might produce erroneous measurements because of perceptual aliasing (two different places can look very similar), while the visual odometry module usually produces outlier-free measurements.

Since the input data is a video stream, most applications require an online estimation of the absolute transformations. However, the majority of state-of-the-art approaches do not take that constraint into account in their design. They usually devise a batch algorithm which is applied each time a new measurement is received, using a generic optimization toolbox such as GTSAM [11], g2o [21] or Google Ceres Solver [2]. These toolboxes are highly optimized and able to provide an online estimation with a reasonable computational time for small or medium-sized problems; nevertheless, their computational time becomes prohibitive for large scale problems (see [12]).

Fig. 1. Results for monocular visual SLAM (\(Sim{\left( 3\right) }\)) on sequence KITTI 13. The ground truth is not available for that sequence; thus, we report the best result obtained using a Lidar [33].

The purpose of this paper is to present a novel approach specifically designed to operate online on large scale problems.

Requirements and Contributions. Besides being as accurate as possible, an algorithm dedicated to online motion averaging for large scale problems should also satisfy the following specifications:

  1. Computational efficiency: As was recently pointed out in [12], minimizing a criterion involving all the past measurements each time a new measurement is received is not suitable for the problem we consider. In order to achieve a low computational time, it is compulsory to perform filtering, i.e. to process the measurements one by one;

  2. Memory efficiency: Nevertheless, filtering alone is not sufficient to obtain an efficient algorithm. For instance, applying a Kalman filter, as proposed in [5], requires maintaining a covariance matrix whose size grows quadratically with the number of absolute transformations. Hence, such a filter becomes impractical for large scale problems. One way to obtain a filter able to deal with large scale problems is to continuously approximate the posterior distribution of the estimated transformations such that the number of parameters of that distribution grows at most linearly with the number of estimated transformations;

  3. Robustness: Finally, dealing with large scale problems increases the risk of perceptual aliasing and consequently the number of wrong loop closures. Hence, our approach should also be able to detect and remove wrong loop closures.

As we will see, taking the constraints described above into account will lead us to consider mathematical tools, such as variational Bayesian approximations, that have not yet been applied to motion averaging. Using these tools, we show that it is possible to obtain a highly efficient and robust online motion averaging algorithm that significantly outperforms the state-of-the-art algorithm of [12] (see Fig. 1).

Outline of the paper: The rest of the paper is organized as follows: The mathematical notations and models are presented in Sect. 2. In Sect. 3 we discuss work related to our novel approach. Section 4 deals with the specific case of motion averaging from odometry measurements and a single loop closure. Based on the analysis performed in Sect. 4, we derive a novel motion averaging algorithm in Sect. 5 which is evaluated experimentally in Sect. 6. Finally, a conclusion and future work directions are provided in Sect. 7.

2 Models and Notations

Let us now introduce the notations and mathematical models that are used throughout the paper.

2.1 Lie Group Notations

The theory we develop in the paper can be applied to any matrix Lie group (typically \(SE{\left( 3\right) }\), \(SL{\left( 3\right) }\), \(Sim{\left( 3\right) }\), etc.), which turns out to be very convenient in practice since it allows us to apply our algorithm to various applications (see Sect. 6). For a detailed description of Lie groups the reader is referred to [9]. Throughout the paper, we will use the following notations: \(G\subset \mathbb {R}^{n\times n}\) is a matrix Lie group of intrinsic dimension p (i.e. \(p=6\) if \(G=SE{\left( 3\right) }\subset \mathbb {R}^{4\times 4}\), \(p=8\) if \(G=SL{\left( 3\right) }\subset \mathbb {R}^{3\times 3}\), etc.); \(\text {exp}_{G}^{\wedge }\left( \cdot \right) :\mathbb {R}^{p}\rightarrow G\) and \(\text {log}_{G}^{\vee }\left( \cdot \right) :G\rightarrow \mathbb {R}^{p}\) correspond to the exponential and logarithm maps of G respectively; \(T_{ij}\in G\) is a matrix representing the transformation from the coordinate system j to the coordinate system i, thus in our notations \(T_{ij}T_{jk}=T_{ik}\) and \(T_{ij}^{-1}=T_{ji}\).

Another important operator that we will employ is the adjoint representation of G, \(\text {Ad}_{G}\left( \cdot \right) :G\rightarrow \mathbb {R}^{p\times p}\), which allows us to transport an element \(\delta _{ij}\in \mathbb {R}^{p}\), acting initially on \(T_{ij}\) through left multiplication, onto the right side of \(T_{ij}\) such that \(\text {exp}_{G}^{\wedge }\left( \delta _{ij}\right) T_{ij}=T_{ij}\text {exp}_{G}^{\wedge }\left( \text {Ad}_{G}\left( T_{ji}\right) \delta _{ij}\right) .\) Finally, we introduce the notation for a Gaussian distribution on G:

$$\begin{aligned} \mathcal {N}_{G}\left( T_{ij};\overline{T}_{ij},P_{ij}\right) \propto e^{-\frac{1}{2}\left\| \text {log}_{G}^{\vee }\left( T_{ij}\overline{T}_{ij}^{-1}\right) \right\| _{P_{ij}}^{2}}\Longleftrightarrow \begin{array}{c} T_{ij}=\text {exp}_{G}^{\wedge }\left( \epsilon _{ij}\right) \overline{T}_{ij}\\ \text {where }\epsilon _{ij}\sim \mathcal {N}_{\mathbb {R}^{p}}\left( \epsilon _{ij};\varvec{0},P_{ij}\right) \end{array}\text {} \end{aligned}$$
(1)

where \(\left\| \cdot \right\| _{\cdot }^{2}\) stands for the squared Mahalanobis distance while \(\overline{T}_{ij}\) and \(P_{ij}\) are the mean and the covariance of the random variable \(T_{ij}\) respectively.
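To make these operators concrete, here is a minimal numerical sketch for \(G=SE{\left( 3\right) }\) (ours, not from the paper; the function names and the \(\left( \rho ,\phi \right) \) ordering of the tangent vector are assumptions), covering \(\text {exp}_{G}^{\wedge }\), \(\text {log}_{G}^{\vee }\), \(\text {Ad}_{G}\) and sampling from the Gaussian on G of Eq. (1):

```python
import numpy as np

def hat(phi):
    """3x3 skew-symmetric matrix such that hat(phi) @ v = np.cross(phi, v)."""
    return np.array([[0., -phi[2], phi[1]],
                     [phi[2], 0., -phi[0]],
                     [-phi[1], phi[0], 0.]])

def exp_se3(xi):
    """exp_G^wedge: R^6 -> SE(3), with xi = (rho, phi)."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    Phi = hat(phi)
    if theta < 1e-8:  # small-angle truncation of the series
        R, V = np.eye(3) + Phi, np.eye(3) + 0.5 * Phi
    else:
        R = (np.eye(3) + np.sin(theta) / theta * Phi
             + (1 - np.cos(theta)) / theta ** 2 * Phi @ Phi)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta ** 2 * Phi
             + (theta - np.sin(theta)) / theta ** 3 * Phi @ Phi)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def log_se3(T):
    """log_G^vee: SE(3) -> R^6, inverse of exp_se3."""
    R, t = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1., 1.))
    if theta < 1e-8:
        return np.concatenate([t, np.zeros(3)])
    Phi = theta / (2 * np.sin(theta)) * (R - R.T)   # Phi = hat(phi)
    phi = np.array([Phi[2, 1], Phi[0, 2], Phi[1, 0]])
    V_inv = (np.eye(3) - 0.5 * Phi
             + (1 / theta ** 2
                - (1 + np.cos(theta)) / (2 * theta * np.sin(theta))) * Phi @ Phi)
    return np.concatenate([V_inv @ t, phi])

def adjoint_se3(T):
    """Ad_G: SE(3) -> R^{6x6}, consistent with the (rho, phi) ordering."""
    R, t = T[:3, :3], T[:3, 3]
    Ad = np.zeros((6, 6))
    Ad[:3, :3], Ad[3:, 3:], Ad[:3, 3:] = R, R, hat(t) @ R
    return Ad

def sample_gaussian_on_G(T_mean, P, rng):
    """Draw T = exp_G^wedge(eps) T_mean with eps ~ N(0, P), as in Eq. (1)."""
    eps = rng.multivariate_normal(np.zeros(6), P)
    return exp_se3(eps) @ T_mean
```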

2.2 Measurement Models

In order to tackle the motion averaging problem, two different parametrizations of the absolute transformations are commonly used: the relative parametrization and the absolute parametrization. Of course, different parametrizations lead to different measurement models and consequently to algorithms having different computational complexities and posterior distributions having different shapes. In this paper, we employ the relative parametrization. This choice is motivated in Sect. 4. Here we simply introduce the notations and the measurement models for this parametrization.

Table 1. Odometry measurement model and loop closure measurement model using a relative parametrization of the absolute transformations

$$\begin{aligned} p\left( Z_{n\left( n+1\right) }|T_{n\left( n+1\right) }\right) =\mathcal {N}_{G}\left( Z_{n\left( n+1\right) };T_{n\left( n+1\right) },{\varSigma }_{n\left( n+1\right) }\right) \end{aligned}$$
(2)

$$\begin{aligned} p\left( Z_{mn}|\left\{ T_{i\left( i+1\right) }\right\} _{i=m,\,...\,,n-1}\right) =\mathcal {N}_{G}\left( Z_{mn};\prod _{i=m}^{n-1}T_{i\left( i+1\right) },{\varSigma }_{mn}\right) \end{aligned}$$
(3)

Let us first define our notations for the measurements. An odometry measurement, which we denote \(Z_{n\left( n+1\right) }\in G\), is a noisy transformation between two temporally consecutive local coordinate systems. A loop closure measurement, which we denote \(Z_{mn}\in G\) where \(n\ne m+1\), is a noisy transformation between two temporally nonconsecutive local coordinate systems. Moreover, in this work we assume the noises on the measurements to be mutually independent.

The relative parametrization consists in estimating transformations of the form \(T_{\left( k-1\right) k}\) where k is the local coordinate system of the camera at time instant k. Thus, at time instant k, the set of estimated transformations is \(\left\{ T_{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-1}\). Let us note that the absolute transformation \(T_{1k}\) can be obtained simply by composing the estimated relative transformations, i.e. \(T_{1k}=T_{12}T_{23}\cdots T_{\left( k-1\right) k}=\prod _{i=1}^{k-1}T_{i\left( i+1\right) }\) (see the sketch below). The likelihoods for an odometry measurement and a loop closure measurement are assumed to be Gaussian and are given in Table 1, Eq. (2) and Eq. (3), respectively.
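As a small illustration (ours, not from the paper), the composition above in code:

```python
import numpy as np

def absolute_from_relative(T_rel):
    """Given [T_12, T_23, ..., T_(k-1)k], return [T_11, T_12, ..., T_1k]
    by chaining: T_1(i+1) = T_1i @ T_i(i+1), with T_11 the identity."""
    poses = [np.eye(T_rel[0].shape[0])]
    for T in T_rel:
        poses.append(poses[-1] @ T)
    return poses
```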

3 Related Work

In this section, we describe the most recent state-of-the-art approaches and how they are related to the novel method we propose in this paper.

The current workhorse for motion averaging is the Gauss-Newton (GN) algorithm. In fact, this algorithm has been employed for more than a decade to tackle the motion averaging problem (the seminal work of [17] already proposed to use it). In this context, both relative and absolute parametrizations of the absolute transformations have been employed.

The most widely used is the absolute parametrization [13]. The main reason why people tend to use this parametrization is that it leads to solving, at each iteration of the GN, a sparse linear system. Even if the size of this linear system is proportional to the number of absolute transformations, its sparsity is usually exploited in solvers such as g2o, resulting in an algorithm with a reasonable computational time for small or medium-sized problems. Using this formalism, several algorithms have been recently proposed to perform robust motion averaging (i.e. when loop closures contain erroneous measurements): [1, 8] proposed re-weighted schemes; [10, 28] introduced auxiliary variables that in fact correspond to using a robust kernel, as was recently shown in [30] in the context of bundle adjustment; [22] proposed a consensus-based algorithm which optimizes clusters of loop closures with a GN and checks their consistency with statistical tests. None of these approaches fulfills our first two requirements (computational efficiency and memory efficiency). However, in order to demonstrate the ability of our novel algorithm to detect wrong loop closures, we will compare its results against Dynamic Covariance Scaling (DCS) [1].

The relative parametrization was initially used in [14] for the specific case of planar motions and was then extended to general matrix Lie groups in [25]. Each iteration of these algorithms corresponds to a GN step, even if it is not presented as such. Employing the relative parametrization leads to solving, at each iteration of the GN, a dense linear system whose size is proportional to the number of loop closure measurements. Consequently, the approach proposed in [25] is highly efficient when the number of loop closures is small but impractical for large scale problems. At first sight, the relative parametrization does not seem very attractive for our problem since we are mostly interested in large scale problems. However, we will see that, using this parametrization, the posterior distribution of the relative transformations has a specific shape that can be approximated with few parameters.

To the best of our knowledge, the most closely related approaches to the one we propose in this paper are the filters proposed in [5, 12]. The algorithm proposed in [5] uses a Kalman filter to estimate absolute transformations using an absolute parametrization and validation gating to detect wrong loop closures. However, their approximation of the posterior distribution is a multivariate Gaussian distribution whose covariance matrix grows quadratically with the number of absolute transformations. Consequently, this filter is impractical for large scale problems. On the contrary, the authors of [12] use a relative parametrization and propose a novel closed-form way to process each loop closure using the concept of trajectory bending. This leads to a highly efficient filter which does not explicitly try to approximate the posterior distribution of the relative transformations but estimates the uncertainty of each transformation with a single scalar. Consequently, this filter also fulfills our “memory efficiency” requirement. However, this approach assumes that the loop closure measurements do not contain outliers.

Contrary to these methods, in this paper, we propose a novel filter based on a variational Bayesian approximation of the posterior distribution of the relative transformations, which allows us to fulfill all three requirements (see Table 2) while being almost as accurate as a batch approach.

Table 2. Comparison of state-of-the-art approaches dedicated to motion averaging

4 The Case of a Single Loop

In this paper, we are interested in designing a Bayesian filter which, by definition, has to process the measurements sequentially in order to approximate the posterior distribution of the estimated transformations. However, as we have already seen, two parametrizations of the absolute transformations are possible. In this section, we motivate our choice of employing the relative parametrization on the simpler problem of motion averaging from odometry measurements and a single loop closure (see Fig. 2(a)).

In fact, we consider a loop of length \(N_{L}\), where we are given \(N_{L}-1\) odometry measurements \(\left\{ Z_{i\left( i+1\right) }\right\} _{i=1,\,...\,,N_{L}-1}\) and a single loop closure \(Z_{1N_{L}}\) between the local coordinate systems 1 and \(N_{L}\).

Using the likelihoods defined in Eqs. (2) and (3), we wish to minimize the following criterion w.r.t the relative transformations \(\left\{ T_{i\left( i+1\right) }\right\} _{i=1,\,...\,,N_{L}-1}\):

$$\begin{aligned} \sum _{i=1}^{N_{L}-1}\left\| \text {log}_{G}^{\vee }\left( T_{i\left( i+1\right) }Z_{i\left( i+1\right) }^{-1}\right) \right\| _{{\varSigma }_{i\left( i+1\right) }}^{2}+\left\| \text {log}_{G}^{\vee }\left( Z_{1N_{L}}\left( \prod _{i=1}^{N_{L}-1}T_{i\left( i+1\right) }\right) ^{-1}\right) \right\| _{{\varSigma }_{1N_{L}}}^{2} \end{aligned}$$
(4)

One way to minimize this criterion is to apply a Gauss-Newton algorithm where the relative transformations are jointly refined iteratively as follows (the superscript stands for the iteration):

$$\begin{aligned} T_{i\left( i+1\right) }^{\left( l\right) }=\text {exp}_{G}^{\wedge }\left( \delta _{i\left( i+1\right) }^{\left( l/l-1\right) }\right) T_{i\left( i+1\right) }^{\left( l-1\right) }\quad \text {for}\quad i=1\,...\,N_{L}-1. \end{aligned}$$
(5)

The increments \(\left\{ \delta _{i\left( i+1\right) }^{\left( l/l-1\right) }\right\} _{i=1,\,...\,,N_{L}-1}\) are obtained at each iteration by solving the following (dense) linear system of size \(p\left( N_{L}-1\right) \):

$$\begin{aligned} \left[ \begin{array}{c} \delta _{12}^{\left( l/l-1\right) }\\ \vdots \\ \delta _{\left( N_{L}-1\right) N_{L}}^{\left( l/l-1\right) } \end{array}\right] =\left( \left( J_{rel}^{\left( l\right) }\right) ^{T}\,{{\varLambda }}\, J_{rel}^{\left( l\right) }\right) ^{-1}\left( J_{rel}^{\left( l\right) }\right) ^{T}\,{{\varLambda }}\,\left[ \begin{array}{c} r_{12}^{\left( l-1\right) }\\ \vdots \\ r_{\left( N_{L}-1\right) N_{L}}^{\left( l-1\right) }\\ r_{1N_{L}}^{\left( l-1\right) } \end{array}\right] \end{aligned}$$
(6)

where \(J_{rel}^{\left( l\right) }\) is the Jacobian matrix of the system (see Fig. 2(b)), \({\varLambda }\) is a block diagonal matrix concatenating the inverses of the covariance matrices of the measurements, \({r_{1N_{L}}^{\left( l-1\right) } = \text {log}_{G}^{\vee }\!\left( Z_{1N_{L}}\!\left( \prod _{i=1}^{N_{L}-1}T_{i\left( i+1\right) }^{\left( l-1\right) }\right) ^{-1}\!\right) }\) and \({r_{i\left( i+1\right) }^{\left( l-1\right) } = \text {log}_{G}^{\vee }\!\left( \! T_{i\left( i+1\right) }^{\left( l-1\right) }\! Z_{i\left( i+1\right) }^{-1}\right) }\).

Fig. 2. Illustration of the motion averaging problem on \(SE{\left( 3\right) }\) for a single loop. Using an absolute parametrization, the inverse pseudo-Hessian exhibits very strong correlations between the absolute transformations. On the contrary, using a relative parametrization, the inverse pseudo-Hessian has very small correlations (not null but close to zero) between the relative transformations, motivating our variational Bayesian approximation of the posterior distribution which assumes independent relative transformations (see text for details). (a) Illustration of a perfect loop of length 10, where a cone represents a camera pose (camera 1 is black, camera 10 is blue). The (noiseless) odometry measurements are plotted as solid blue lines while the (noiseless) loop closure measurement is shown as a dashed red line. (b) Jacobian, pseudo-Hessian and inverse pseudo-Hessian for absolute and relative parametrizations (only the magnitude of the coefficients is shown). (Color figure online)

At first sight, the relative parametrization does not seem very interesting compared to the absolute parametrization since it requires solving a dense linear system (in Fig. 2(b) the pseudo-Hessian \(J_{rel}^{T}{\varLambda } J_{rel}\) is completely dense) instead of a sparse one in the absolute parametrization case (in Fig. 2(b) the pseudo-Hessian \(J_{abs}^{T}{\varLambda } J_{abs}\) is extremely sparse). However, as proven in the supplementary material, by initializing \(T_{i\left( i+1\right) }^{\left( 0\right) }=Z_{i\left( i+1\right) }\) for \(i=1\,...\,N_{L}-1\), using the Woodbury formula and exploiting the structure of the problem, it is possible to show that

$$\begin{aligned} \left[ \begin{array}{c} \delta _{12}^{\left( l/l-1\right) }\\ \vdots \\ \delta _{\left( N_{L}-1\right) N_{L}}^{\left( l/l-1\right) } \end{array}\right] ={\varSigma }_{odo}\left( J_{LC}^{\left( l\right) }\right) ^{T}\left( J_{LC}^{\left( l\right) }\,{\varSigma }_{odo}\,\left( J_{LC}^{\left( l\right) }\right) ^{T}+{\varSigma }_{1N_{L}}\right) ^{-1}r_{1N_{L}}^{\left( l-1\right) } \end{aligned}$$
(7)

where \(J_{LC}^{\left( l\right) }=\left[ \begin{array}{ccc} J_{11}^{\left( l\right) }&\cdots&J_{1\left( N_{L}-1\right) }^{\left( l\right) }\end{array}\right] \) is the Jacobian of the loop closure error, \(J_{1n}^{\left( l\right) }\simeq \text {Ad}_{G}\left( \prod _{i=1}^{n-1}T_{i\left( i+1\right) }^{\left( l-1\right) }\right) \) and \({\varSigma }_{odo}\) is the block diagonal matrix concatenating the covariance matrices of the odometry measurements. In this case, only a linear system of size p (i.e. independent of the length of the loop) has to be solved, making the algorithm highly efficient to close a single loop (in practice, \(p=6\) for \(G=SE{\left( 3\right) }\), \(p=8\) for \(G=SL{\left( 3\right) }\), etc.). Moreover, the inverse of the pseudo-Hessian \(\left( J_{rel}^{T}\,{{\varLambda }}\, J_{rel}\right) ^{-1}\) (see Fig. 2(b)), which represents (once the algorithm has reached convergence) the covariance matrix of the posterior distribution under a linear approximation, exhibits very small correlations between the transformations. Therefore, a block diagonal approximation of that covariance matrix seems to be a reasonable approximation that would allow us to derive a filter able to deal with large scale problems very efficiently. On the contrary, when using the absolute parametrization, the (pseudo-)inverse of the pseudo-Hessian \(\left( J_{abs}^{T}\,{{\varLambda }}\, J_{abs}\right) ^{\dagger }\) exhibits very strong correlations, making any approximation of that matrix difficult.
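To spell out the step behind Eq. (7): at the first iteration, the initialization \(T_{i\left( i+1\right) }^{\left( 0\right) }=Z_{i\left( i+1\right) }\) makes the odometry residuals vanish, so the normal equations of (6) reduce to \(\left( {\varSigma }_{odo}^{-1}+\left( J_{LC}\right) ^{T}{\varSigma }_{1N_{L}}^{-1}J_{LC}\right) \delta =\left( J_{LC}\right) ^{T}{\varSigma }_{1N_{L}}^{-1}r_{1N_{L}}\); the matrix-inversion (Woodbury) lemma \(\left( A^{-1}+U^{T}C^{-1}U\right) ^{-1}U^{T}C^{-1}=AU^{T}\left( UAU^{T}+C\right) ^{-1}\), applied with \(A={\varSigma }_{odo}\), \(U=J_{LC}\) and \(C={\varSigma }_{1N_{L}}\), then turns the large system into the \(p\times p\) one of Eq. (7). The supplementary material shows how the structure of the problem keeps subsequent iterations in the same form.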

From this point of view, the relative parametrization seems to be much more attractive than the absolute parametrization, at least for online inference.

Consequently, when designing our filter, we will employ a relative parametrization. Loop closure measurements will be processed sequentially using this highly efficient GN, which only requires solving a linear system of size p at each iteration. After having processed a loop closure, the covariance matrix of the posterior distribution will be approximated with a block diagonal covariance matrix using a variational Bayesian approximation. All these steps are detailed in Sect. 5.

Let us note that, since our approach employs a GN to process loop closures sequentially, it is optimal for any problem (i.e. any matrix Lie group G with anisotropic noises on the measurements) containing loops that do not interact with each other. On the contrary, for the same problems, COP-SLAM [12] is only optimal when the noise is isotropic and the logarithm map of G is related to a bi-invariant metric, which in practice is usually not true, except for \(SO{\left( 3\right) }\).
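To make this single-loop correction concrete, here is a self-contained sketch (ours, not the paper's implementation) specialized to \(G=SO{\left( 3\right) }\), where \(p=3\), \(\text {Ad}_{G}\left( R\right) =R\) and the exp/log maps are the Rodrigues formulas; it performs one closed-form GN step from the odometry initialization, i.e. it only solves the \(p\times p\) innovation system of Eq. (7):

```python
import numpy as np

def hat(w):
    return np.array([[0., -w[2], w[1]], [w[2], 0., -w[0]], [-w[1], w[0], 0.]])

def exp_so3(w):
    th = np.linalg.norm(w)
    if th < 1e-8:
        return np.eye(3) + hat(w)
    W = hat(w)
    return np.eye(3) + np.sin(th) / th * W + (1 - np.cos(th)) / th ** 2 * W @ W

def log_so3(R):
    th = np.arccos(np.clip((np.trace(R) - 1) / 2, -1., 1.))
    if th < 1e-8:
        return np.zeros(3)
    W = th / (2 * np.sin(th)) * (R - R.T)
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def close_single_loop(Z_odo, Sigma_odo, Z_lc, Sigma_lc):
    """One GN step distributing the loop-closure residual over the relative
    rotations, weighted by their covariances (cf. Eq. (7))."""
    T = [Z.copy() for Z in Z_odo]       # T^(0) = Z  =>  zero odometry residuals
    partials, prod = [], np.eye(3)
    for Ti in T:                        # J_1n ~ Ad(prod_{i<n} T_i) = partials[n]
        partials.append(prod.copy())
        prod = prod @ Ti
    r = log_so3(Z_lc @ prod.T)          # r_1N = log(Z_1N (prod_i T_i)^-1)
    S = Sigma_lc.copy()                 # p x p innovation covariance
    for J, Sig in zip(partials, Sigma_odo):
        S += J @ Sig @ J.T
    gain = np.linalg.solve(S, r)
    for n, (J, Sig) in enumerate(zip(partials, Sigma_odo)):
        T[n] = exp_so3(Sig @ J.T @ gain) @ T[n]   # delta_n = Sigma_n J_n^T S^-1 r
    return T
```

Note how each relative rotation is corrected proportionally to its own uncertainty: a transformation with a small covariance \({\varSigma }_{i\left( i+1\right) }\) is barely modified by the loop closure.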

5 Online Variational Bayesian Motion Averaging

In the previous section, we showed that the relative parametrization is appealing for the online motion averaging problem. We now derive our novel filter using this parametrization.

5.1 Estimated State

At time instant \(k-1\) (where \(k>2\)), the estimated state consists of all the relative transformations \(\mathcal {X}_{k-1}=\left\{ T_{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-2}\). More specifically, at time instant \(k-1\), the posterior distribution of the state is assumed to have the following factorized form:

$$\begin{aligned} p\left( \mathcal {X}_{k-1}|\mathcal {D}_{odo,k-1},\!\mathcal {D}_{LC,k-1}\right) = Q_{k-1}\!\left( \mathcal {X}_{k-1}\right) = \prod _{i=1}^{k-2}\!\mathcal {N}_{G}\!\left( T_{i\left( i+1\right) };\overline{T}_{i\left( i+1\right) },P_{i\left( i+1\right) }\right) \!\! \end{aligned}$$
(8)

where \(\mathcal {D}_{odo,k-1}=\left\{ Z_{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-2}\) and \(\mathcal {D}_{LC,k-1}=\left\{ Z_{ij}\right\} _{1\le i<j-1<k-1}\).

5.2 Processing of a New Odometry Measurement

At time instant k, when the new odometry measurement \(Z_{\left( k-1\right) k}\) (with known covariance \({\varSigma }_{\left( k-1\right) k}\)) is available, the estimated state simply augments, i.e. \(\mathcal {X}_{k}=\left\{ T_{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-1}\). Consequently, the posterior distribution of the state remains factorized and has the following form:

$$\begin{aligned} p\left( \mathcal {X}_{k}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1}\right) =Q_{k}^{odo}\left( \mathcal {X}_{k}\right) =\prod _{i=1}^{k-1}\mathcal {N}_{G}\left( T_{i\left( i+1\right) };\overline{T}_{i\left( i+1\right) },P_{i\left( i+1\right) }\right) \end{aligned}$$
(9)

where \(\mathcal {D}_{odo,k}=\left\{ Z_{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-1}\), \(\overline{T}_{\left( k-1\right) k}=Z_{\left( k-1\right) k}\) and \(P_{\left( k-1\right) k}={\varSigma }_{\left( k-1\right) k}\).
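In code, this factorized posterior is simply a list of per-transformation means and covariances, so memory grows linearly with k, and an odometry update is an append; a hypothetical sketch (names are ours):

```python
import numpy as np

class RelativePoseFilter:
    """Factorized posterior of Eq. (8): one (mean, covariance) pair per
    relative transformation, i.e. O(k) parameters at time instant k."""

    def __init__(self):
        self.means = []   # T_bar_i(i+1): matrices on G
        self.covs = []    # P_i(i+1): p x p covariance matrices

    def process_odometry(self, Z, Sigma):
        """Eq. (9): a new odometry measurement simply augments the state."""
        self.means.append(Z.copy())
        self.covs.append(Sigma.copy())
```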

5.3 Processing of a New Loop Closure Measurement

At time instant k, after having received the odometry measurement \(Z_{\left( k-1\right) k}\), a new loop closure measurement \(Z_{lk}\) (with known covariance \({\varSigma }_{lk}\)) may be available (where \(l<k\)). In fact, multiple loop closures may be available; however, in order to keep the notations uncluttered, we only describe how to deal with one loop closure. In practice, the processing is applied sequentially to each loop closure, as described in the pseudo-code presented in the supplementary material.

When a new loop closure measurement \(Z_{lk}\) is available, we would like to take into account the information coming from that observation in order to refine our current estimate of the state. However, the observation model Eq. 3 creates dependencies between all the relative transformations involved in the loop, and, therefore, the posterior distribution \(p\left( \mathcal {X}_{k}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \) is not factorized anymore. Thus, the number of parameters required to describe that non-factorized posterior distribution becomes huge (typically in \(O\left( k^{2}\right) \) using a linear approximation, see [5]), especially for large scale problems.

In order for our filter to be able to operate online on large scale problems, we propose to approximate that posterior distribution with a factorized distribution, whose number of parameters will be in \(O\left( k\right) \). Such an approximation is motivated by our analysis of the single loop case (see Fig. 2). One way to find a factorized distribution “similar” to the true posterior distribution is to minimize the Kullback-Leibler divergence

$$\begin{aligned} KL\left( Q_{VB}\left( \mathcal {X}_{k}\right) \,||\,p\left( \mathcal {X}_{k}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \right) =\int Q_{VB}\left( \mathcal {X}_{k}\right) \ln \left( \frac{Q_{VB}\left( \mathcal {X}_{k}\right) }{p\left( \mathcal {X}_{k}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) }\right) d\mathcal {X}_{k}. \end{aligned}$$
(10)

This approach is usually called “variational Bayesian approximation” in the literature [3] and sometimes “Structured Mean Field” since we do not assume a fully factorized distribution but only assume the relative transformations to be mutually independent.

Variational Bayesian Approximation. Minimizing the KL divergence in (10) w.r.t \(Q_{VB}\left( \mathcal {X}_{k}\right) \) corresponds to maximizing the lower bound

$$\begin{aligned} \mathcal {L}\left( Q_{VB}\left( \mathcal {X}_{k}\right) \right) =\int Q_{VB}\left( \mathcal {X}_{k}\right) \ln \left( \frac{p\left( \mathcal {X}_{k},\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) }{Q_{VB}\left( \mathcal {X}_{k}\right) }\right) d\mathcal {X}_{k}. \end{aligned}$$
(11)

In our case, the log-joint distribution of all the random variables has the form:

$$\begin{aligned}&\text {ln}\left( p\left( \mathcal {X}_{k},\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \right) \nonumber \\&=-\frac{1}{2}\left\| \text {log}_{G}^{\vee }\left( Z_{lk}\left( \prod _{i=l}^{k-1}T_{i\left( i+1\right) }\right) ^{-1}\right) \right\| _{{\varSigma }_{lk}}^{2}-\frac{1}{2}\sum _{i=1}^{k-1}\left\| \text {log}_{G}^{\vee }\left( T_{i\left( i+1\right) }\overline{T}_{i\left( i+1\right) }^{-1}\right) \right\| _{P_{i\left( i+1\right) }}^{2}+\text {cst} \end{aligned}$$
(12)

However, because of the curvature of the Lie group, the terms inside the norms in (12) are not linear in the transformations. One way to apply a variational Bayesian approach in this case is to linearize the terms inside the norms.

At this point, we find it convenient to define the variables involved in the linearization step:

$$\begin{aligned} T_{i\left( i+1\right) }=\text {exp}_{G}^{\wedge }\left( \epsilon _{i\left( i+1\right) }\right) \breve{T}_{i\left( i+1\right) }\quad \text {for}\quad i=1\,...\,\,k-1 \end{aligned}$$
(13)

where \(\breve{T}_{i\left( i+1\right) }\) is the (fixed) linearization point for the relative transformation \(T_{i\left( i+1\right) }\) and \(\epsilon _{i\left( i+1\right) }\) is a random variable. This linearization point is quite important and its value is discussed at the end of the section. After linearization, the log-joint distribution becomes:

$$\begin{aligned}&\text {ln}\left( p\left( \left\{ \epsilon _{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-1},\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \right) \nonumber \\&=\,{-}\frac{1}{2}\left\| r_{lk}-\sum _{i=l}^{k-1}J_{li}\epsilon _{i\left( i+1\right) }\right\| _{{\varSigma }_{lk}}^{2}-\frac{1}{2}\sum _{i=1}^{k-1}\left\| r_{i\left( i+1\right) }+\epsilon _{i\left( i+1\right) }\right\| _{P_{i\left( i+1\right) }}^{2}+\text {cst} \end{aligned}$$
(14)

where we approximated the Jacobian of \(\text {log}_{G}^{\vee }\) by the identity, \(J_{li}\simeq \text {Ad}_{G}\left( \prod _{j=l}^{i-1}\breve{T}_{j\left( j+1\right) }\right) \) for \(i>l\), \(J_{ll}\simeq Id\), \(r_{lk}=\text {log}_{G}^{\vee }\left( Z_{lk}\left( \prod _{i=l}^{k-1}\breve{T}_{i\left( i+1\right) }\right) ^{-1}\right) \) and \(r_{i\left( i+1\right) }=\text {log}_{G}^{\vee }\left( \breve{T}_{i\left( i+1\right) }\overline{T}_{i\left( i+1\right) }^{-1}\right) \).

Given the log-joint distribution (14), the objective is now to maximize the lower bound (11) where \(\mathcal {X}_{k}\) is now replaced with \(\left\{ \epsilon _{i\left( i+1\right) }\right\} _{i=1,\,...\,,k-1}\) because of the linearization step. Here:

$$\begin{aligned} Q_{VB}\left( \left\{ \epsilon _{i\left( i+1\right) }\right\} _{i=1,\,...\, ,k-1}\right) =\prod _{i=1}^{k-1}q_{VB}\left( \epsilon _{i\left( i+1\right) }\right) . \end{aligned}$$
(15)

In fact, it is possible to show that (see [3], p. 446), for each variable \(\epsilon _{i\left( i+1\right) }\), the best approximated distribution is given by the following expression:

$$\begin{aligned} \!\!&\!\text {ln}\left( q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) \right) \nonumber \\ \!\!&\!=\mathbb {E}_{Q_{VB}\backslash q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) }\!\left[ \text {ln}\left( p\left( \left\{ \epsilon _{i\left( i+1\right) }\right\} _{i=1,\,...\, ,k-1},\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \right) \right] \!+\!\text {cst} \end{aligned}$$
(16)

where \(\mathbb {E}_{Q_{VB}\backslash q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) }\) stands for the conditional expectation w.r.t all the variables except \(\epsilon _{i\left( i+1\right) }\). Thus, from (16) and (14), we obtain

$$\begin{aligned} \text {ln}\left( q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) \right) =-\frac{1}{2}\left( \epsilon _{i\left( i+1\right) }-\mu _{i\left( i+1\right) }\right) ^{T}{\varXi }_{i\left( i+1\right) }^{-1}\left( \epsilon _{i\left( i+1\right) }-\mu _{i\left( i+1\right) }\right) +\text {cst} \end{aligned}$$
(17)

where

$$\begin{aligned} {\varXi }_{i\left( i+1\right) }&=\left( J_{li}^{T}{\varSigma }_{lk}^{-1}J_{li}+P_{i\left( i+1\right) }^{-1}\right) ^{-1} \end{aligned}$$
(18)

and

$$\begin{aligned} \mu _{i\left( i+1\right) }= & {} {\varXi }_{i\left( i+1\right) }\left( J_{li}^{T}{\varSigma }_{lk}^{-1}e_{lk,i}-P_{i\left( i+1\right) }^{-1}r_{i\left( i+1\right) }\right) \end{aligned}$$
(19)

with \(e_{lk,i}=r_{lk}-\left( \sum _{j=l,j\ne i}^{k-1}J_{lj}\mu _{j\left( j+1\right) }\right) \). Therefore, for each random variable \(\epsilon _{i\left( i+1\right) }\) (\(i=1,\,...\,,k-1\)), the best approximated distribution is a Gaussian of the form:

$$\begin{aligned} q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) =\mathcal {N}_{\mathbb {R}^{p}}\left( \epsilon _{i\left( i+1\right) };\mu _{i\left( i+1\right) },{\varXi }_{i\left( i+1\right) }\right) \quad \text {for}\quad i=1,\,...\,,k-1 \end{aligned}$$
(20)

Let us note that if \(i<l\), i.e. if the relative transformation \(T_{i\left( i+1\right) }\) is not involved in the loop closure \(Z_{lk}\), then

$$\begin{aligned} q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) =\mathcal {N}_{\mathbb {R}^{p}}\left( \epsilon _{i\left( i+1\right) };\varvec{0},{\varXi }_{i\left( i+1\right) }=P_{i\left( i+1\right) }\right) \quad \text {for}\quad i<l \end{aligned}$$
(21)

making our algorithm very efficient since a loop closure will only modify the relative transformations involved in that loop.

In theory, in order to obtain the values of \(\left\{ \mu _{i\left( i+1\right) }\right\} _{i=l,\,...\,,k-1}\) we should cycle through (19) for each relative transformation involved in the loop until convergence. However, if the linearization step (see Eq. (13)) is performed around the maximizer of (12), then \(\mu _{i\left( i+1\right) }=\varvec{0}\) for \(i=l,\,...\,,k-1\). Thus in practice, for each new loop closure measurement, we first apply the Gauss-Newton algorithm described in Sect. 4 in order to find the maximizer of (12) very efficiently. Then we only have to compute the covariances \({\varXi }_{i\left( i+1\right) }\) (see Eq. (18)) for \(i=l,\,...\,,k-1\).

Finally, for each relative transformation, \(q_{VB}^{*}\left( \epsilon _{i\left( i+1\right) }\right) \) is a Gaussian with zero mean. Therefore, from Eq. (13), one can see that (up to a linear approximation) \(q_{VB}^{*}\left( T_{i\left( i+1\right) }\right) \) is a Gaussian distribution on a Lie group (see Eq. (1)) of the form \(\mathcal {N}_{G}\left( T_{i\left( i+1\right) };\breve{T}_{i\left( i+1\right) },{\varXi }_{i\left( i+1\right) }\right) \). Consequently, after having processed a new loop closure, our factorized approximation of the posterior has the following form:

$$\begin{aligned} p\left( \mathcal {X}_{k}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1},Z_{lk}\right) \approx Q_{VB}\left( \mathcal {X}_{k}\right) = \prod _{i=1}^{k-1}\mathcal {N}_{G}\!\left( T_{i\left( i+1\right) };\breve{T}_{i\left( i+1\right) },{\varXi }_{i\left( i+1\right) }\right) \! \end{aligned}$$
(22)
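A sketch (ours) of this covariance update: once the GN of Sect. 4 has produced the linearization points \(\breve{T}_{i\left( i+1\right) }\) (so that \(\mu _{i\left( i+1\right) }=\varvec{0}\)), each \({\varXi }_{i\left( i+1\right) }\) follows directly from Eq. (18); here adj stands for the adjoint map of the group at hand (e.g. adjoint_se3 from the earlier sketch):

```python
import numpy as np

def vb_covariances(T_breve, P, Sigma_lk, adj):
    """Eq. (18) for the transformations i = l..k-1 inside the loop.
    T_breve: linearization points; P: prior covariances P_i(i+1)."""
    Lambda_lk = np.linalg.inv(Sigma_lk)
    prod = np.eye(T_breve[0].shape[0])
    Xi = []
    for T, P_i in zip(T_breve, P):
        J = adj(prod)                    # J_li = Ad(prod_{j=l}^{i-1} T_breve_j)
        Xi.append(np.linalg.inv(J.T @ Lambda_lk @ J + np.linalg.inv(P_i)))
        prod = prod @ T
    return Xi
```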

Detection of Outlier Loop Closures Through Validation Gating. So far, we have proposed an efficient way to process a new loop closure measurement, assuming it follows the generative model (3). However, in practice, two different places that are perceived as identical usually produce a wrong loop closure. Consequently, detecting and removing these wrong loop closure measurements is crucial in order to perform motion averaging, especially for large scale problems where wrong loop closures are very likely to occur.

Since we continuously maintain an approximation of the posterior distribution, it is possible to detect wrong loop closure measurements through validation gating [26]. This approach consists in first computing the mean \(\overline{Z}_{lk}\) and covariance \(\overline{{\varSigma }}_{lk}\) parameters of the following distribution:

$$\begin{aligned} p\left( Z_{lk}|\mathcal {D}_{odo,k},\mathcal {D}_{LC,k-1}\right) =\int p\left( Z_{lk}|\mathcal {X}_{k}\right) Q_{k}^{odo}\left( \mathcal {X}_{k}\right) d\mathcal {X}_{k}\approx \mathcal {N}_{G}\left( Z_{lk};\overline{Z}_{lk},\overline{{\varSigma }}_{lk}\right) \end{aligned}$$
(23)

and then testing w.r.t a threshold t whether or not the received measurement is likely to be an inlier:

$$\begin{aligned} \left\| \text {log}_{G}^{\vee }\left( Z_{lk}\overline{Z}_{lk}^{-1}\right) \right\| _{\overline{{\varSigma }}_{lk}}^{2}<t \end{aligned}$$
(24)

In theory, t should be based on the p-value of the Chi-squared distribution. However, as we will see in the experiments, such a theoretical value is sometimes too restrictive, especially when processing real data, where the covariances of the odometry and loop closure measurements are not very accurate and the assumption of mutually independent noises might be violated.
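A possible implementation of this gate (ours; in particular, computing \(\overline{{\varSigma }}_{lk}\) as the measurement noise plus the propagated state uncertainty is our reading of the linearization of Sect. 5.3):

```python
import numpy as np
from scipy.stats import chi2

def gate_loop_closure(Z_lk, Sigma_lk, means, covs, log_g, adj, p_value=0.001):
    """Eqs. (23)-(24). means/covs: posterior (T_bar, P) of the relative
    transformations i = l..k-1; log_g/adj: group log and adjoint maps."""
    # Predicted measurement: Z_bar = product of the posterior means.
    Z_bar = means[0].copy()
    for T_bar in means[1:]:
        Z_bar = Z_bar @ T_bar
    # Predicted covariance: measurement noise + propagated state uncertainty.
    S = Sigma_lk.copy()
    prod = np.eye(Z_bar.shape[0])
    for T_bar, P in zip(means, covs):
        J = adj(prod)
        S += J @ P @ J.T
        prod = prod @ T_bar
    # Squared Mahalanobis distance of Eq. (24), gated by a chi-square quantile.
    r = log_g(Z_lk @ np.linalg.inv(Z_bar))
    d2 = r @ np.linalg.solve(S, r)
    t = chi2.ppf(1 - p_value, df=r.size)   # Sect. 6 uses a looser t on real data
    return d2 < t
```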

6 Experiments

We now evaluate experimentally, both on synthetic and real datasets, our novel online variational Bayesian motion averaging algorithm (a pseudo-code is provided in the supplementary material) against the state-of-the-art algorithms LG-IEKF [5], COP-SLAM [12] and DCS [1] (which uses g2o). To do so, we first compare both the accuracy and the computational time of these approaches on datasets which do not contain wrong loop closures, since COP-SLAM is not able to detect and remove wrong loop closures. The robustness of the different approaches is evaluated separately on datasets specifically dedicated to this task (see [23]). We finally present qualitative results on monocular visual SLAM and video mosaicking applications. In all these experiments, when dealing with synthetic datasets, the threshold t of the validation gating stage of our algorithm has been set to the \(\chi ^{2}\) value with p degrees of freedom given by a p-value of 0.001. Otherwise, when dealing with real data, we empirically set \(t=900\), which is much higher than the theoretical \(\chi ^{2}\) value since the covariances of the odometry and loop closure measurements are usually not very accurate in this case and the assumption of mutually independent noises might be violated.

6.1 Evaluation of the Accuracy and the Computational Time

In this experiment, we consider a binocular 6D SLAM application (Lie group \(SE{\left( 3\right) }\)) and use one synthetic sequence (Sphere) and two real sequences (originally from the KITTI dataset [15]) provided by the authors of [12]. The results for this experiment are given in Table 3, where we report both the Root Mean Squared Error (RMSE) for the absolute positions and the computational time for our approach, COP-SLAM, LG-IEKF and g2o. We provide the computational time both in C++ and Matlab because only a Matlab implementation of LG-IEKF is available.

Let us first note that we should not expect the RMSE of COP-SLAM and our approach to be as low as the RMSE of LG-IEKF and g2o, because the latter two do not try to summarize the past information with a small number of parameters at each time instant but keep all the past information (g2o keeps all the past measurements while LG-IEKF maintains a full covariance matrix). However, one can see that our approach remains very accurate. Indeed, on the KITTI 02 sequence, our approach even obtains the same RMSE as LG-IEKF. On the contrary, COP-SLAM obtains a much higher RMSE than our approach on every sequence. From the computational time point of view, as expected, both our approach and COP-SLAM are orders of magnitude faster than LG-IEKF and g2o. Moreover, our approach is only slightly slower than COP-SLAM, which is largely compensated by the fact that our approach has a much lower RMSE.

Table 3. Results for binocular 6D SLAM (\(SE{\left( 3\right) }\)): In terms of RMSE (for the position), our approach is much closer to the solutions of both g2o and LG-IEKF [5], compared to COP-SLAM [12]. In terms of computational time, our approach is orders of magnitude faster than both g2o and LG-IEKF, while being only slightly slower than COP-SLAM. Remark: for these experiments, wrong loop closures have been removed since COP-SLAM cannot cope with them. The robustness of our method w.r.t wrong loop closures is evaluated against LG-IEKF [5] and DCS [1] (which uses g2o) in Sect. 6.2.

6.2 Evaluation of the Robustness

In this experiment, we employ the dataset provided by the authors of [23], which allows us to evaluate the robustness of an approach to wrong loop closures on a planar visual SLAM application (Lie group \(SE{\left( 2\right) }\)). The results and details regarding this experiment are provided in the supplementary material. Our approach surprisingly achieved exactly the same precision and recall as both LG-IEKF and DCS. This is a remarkable result since these two algorithms are not designed to perform online large scale estimation and are consequently much slower than our approach (see Table 3).

6.3 Additional Experiments

In Fig. 1, we present results for monocular visual SLAM (Lie group \(Sim{\left( 3\right) }\)) on sequence 13 of the KITTI dataset. The details regarding this experiment are provided in the supplementary material. However, one can see that the trajectory estimated with our approach is visually much closer to the result of [33] (which employs a Lidar) than the trajectory estimated with COP-SLAM. Results on sequence 15 of the KITTI dataset as well as results for video mosaicking (Lie group \(Sim{\left( 3\right) }\)) are also provided as supplementary material.

7 Conclusion and Future Work

In this paper, we proposed a novel filter dedicated to online motion averaging for large scale problems. We have shown that using a relative parametrization of the absolute transformations produces a posterior distribution that can be efficiently approximated assuming independent relative transformations. Based on this representation, we demonstrated that it is possible to obtain an accurate, efficient and robust filter by employing a variational Bayesian approach.

The performance of our novel algorithm was extensively evaluated against the state-of-the-art algorithm COP-SLAM [12]. Our approach achieved a significantly lower RMSE than COP-SLAM while being only slightly slower.

Since COP-SLAM cannot detect wrong loop closures, we also compared the robustness of our filter against LG-IEKF [5] and DCS [1]. In this context, our approach surprisingly achieved the same robustness as these algorithms. This is a remarkable result since our approach is designed to perform online large scale estimation and, consequently, is orders of magnitude faster than both LG-IEKF and DCS.

As future work, we plan to exploit the high efficiency of our filter to build a multi-hypothesis filter. This would prevent failures, such as those described in the supplementary material, to which LG-IEKF, DCS and the approach presented in this paper are prone because they are forced to take a decision as soon as a loop closure measurement is available and cannot wait until new evidence is received.