1 Introduction

Robustly fitting a geometric model to noisy and outlier-contaminated data is a necessary capability in computer vision [1], owing to the imperfection of data acquisition systems and preprocessing algorithms (e.g., edge detection, keypoint detection and matching). Without robustness against outliers, the estimated geometric model will be biased, leading to failure of the overall pipeline.

In computer vision, robust fitting is typically performed under the framework of inlier set maximisation, a.k.a. consensus maximisation [2], where one seeks the model with the greatest number of inliers. For concreteness, say we wish to estimate the parameter vector \(\mathbf {x}\in \mathbb {R}^d\) that defines the linear relationship \(\mathbf {a}^T \mathbf {x}= b\) from a set of outlier-contaminated measurements \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\). The consensus maximisation formulation of this problem is as follows.

Problem 1

(MAXCON). Given input data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), where \(\mathbf {a}_i \in \mathbb {R}^d\) and \(b_i \in \mathbb {R}\), and an inlier threshold \( \epsilon \in \mathbb {R}_+\), find the \(\mathbf {x}\in \mathbb {R}^d\) that maximises

$$\begin{aligned} \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}) = \sum _{i=1}^N \mathbb {I}\left( |\mathbf {a}_i^T\mathbf {x}- b_i | \le \epsilon \right) , \end{aligned}$$
(1)

where \(\mathbb {I}\) returns 1 if its input predicate is true, and 0 otherwise.

The quantity \(| \mathbf {a}_i^T\mathbf {x}- b_i |\) is the residual of the i-th measurement with respect to \(\mathbf {x}\), and the value \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\) is the consensus of \(\mathbf {x}\) with respect to \(\mathcal {D}\). Intuitively, the consensus of \(\mathbf {x}\) is its number of inliers. For the robust estimate to fit the inlier structure well, the inlier threshold \(\epsilon \) must be set to an appropriate value; the large number of applications that employ the consensus maximisation framework indicates that this is usually not an obstacle.
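To make (1) concrete, the following minimal Python sketch evaluates the consensus of a candidate \(\mathbf {x}\); the function name and data layout are our own choices, not part of any reference implementation.

```python
import numpy as np

def consensus(x, A, b, eps):
    """Evaluate (1): count the measurements (a_i, b_i) whose residual
    |a_i^T x - b_i| is within the inlier threshold eps.
    A is an (N, d) array stacking the a_i^T; b is an (N,) array."""
    residuals = np.abs(A @ x - b)
    return int(np.sum(residuals <= eps))
```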

Developing algorithms for robust fitting, specifically for consensus maximisation, is an active research area in computer vision. Currently, the most popular algorithms belong to the class of randomised sampling techniques, i.e., RANSAC [2] and its variants [3, 4]. Unfortunately, such techniques do not provide certainty of finding satisfactory solutions, let alone optimal ones [5].

Increasingly, attention is given to constructing globally optimal algorithms for robust fitting, e.g., [6,7,8,9,10,11,12,13,14]. Such algorithms are able to deterministically calculate the best possible solution, i.e., the model with the highest achievable consensus. This mathematical guarantee is regarded as desirable, especially in comparison to the “rough” solutions provided by random sampling heuristics.

Recent progress in globally optimal algorithms for consensus maximisation seems to suggest that global solutions can be obtained efficiently or tractably [6,7,8,9,10,11,12,13,14]. Moreover, decent empirical performances have been reported. This raises hopes that good alternatives to the random sampling methods are now available. However, to what extent is the problem solved? Can we expect the global algorithms to perform well in general? Are there fundamental obstacles toward efficient robust fitting algorithms? What do we even mean by “efficient”?

1.1 Our Contributions and Their Implications

Our contributions are theoretical. We resolve the above ambiguities in the literature by proving the following computational hardness results. The implications of each result are also listed below.

[Boxed summary of results: MAXCON is NP-hard (Theorem 1), W[1]-hard in the dimension d (Theorem 3), and APX-hard, i.e., it admits no PTAS (Theorem 5), together with the implications of each result.]

As usual, the implications of the hardness results are subject to the standard complexity assumptions P \(\ne \) NP [15] and FPT \(\ne \) W[1] [16].

Our analysis indicates the “extreme” difficulty of consensus maximisation. MAXCON is not only intractable (by standard notions of intractability [15, 16]); the W[1]-hardness result also suggests that any globally optimal algorithm must scale exponentially in a function of d, i.e., as \(N^{f(d)}\). In fact, if a conjecture of Erickson et al. [17] holds, MAXCON cannot be solved faster than \(N^d\). Thus, the decent performances reported in [6,7,8,9,10,11,12,13,14] are unlikely to extend to the general cases encountered in practical settings, where \(N \ge 1000\) and \(d \ge 6\) are common. More pessimistically, APX-hardness shows that MAXCON cannot be approximated to arbitrary accuracy in polynomial time, in that there is no polynomial time approximation scheme (PTAS) [18] for MAXCON.

A slightly positive result is as follows.

[Boxed result: MAXCON is FPT (fixed-parameter tractable) in the number of outliers o and the dimension d (Theorem 4).]

This is achieved by applying a special case of the algorithm of Chin et al. [13] to MAXCON, which yields a runtime of \(\mathcal {O}(d^o\,\mathrm {poly}(N,d))\). However, this still scales exponentially in the number of outliers o, which can be large in practice (e.g., \(o \ge 100\)).

1.2 How Are Our Theoretical Results Useful?

First, our results clarify the ambiguities on the efficiency and solvability of consensus maximisation alluded to above. Second, our analysis shows how the effort scales with the different input size parameters, thus suggesting more cogent ways for researchers to test/compare algorithms. Third, since developing algorithms for consensus maximisation is an active topic, our hardness results encourage researchers to consider alternative paradigms of optimisation, e.g., deterministically convergent heuristic algorithms [19,20,21] or preprocessing techniques [22,23,24].

1.3 What About Non-linear Models?

Our results are based specifically on MAXCON, which is concerned with fitting linear models. In practice, computer vision applications require the fitting of non-linear geometric models (e.g., fundamental matrix, homography, rotation). While a case-by-case treatment is ideal, it is unlikely that non-linear consensus maximisation will be easier than linear consensus maximisation [25,26,27].

1.4 Why Not Employ Other Robust Statistical Procedures?

Our purpose here is not to benchmark or advocate particular robust criteria; rather, our primary aim is to establish the fundamental difficulty of consensus maximisation, which is widely used in computer vision. Moreover, it is unlikely that other robust criteria are easier to solve [28]. Although criteria based on differentiable robust loss functions (e.g., M-estimators) can be optimised to local optimality, it is unknown how far the local optima deviate from the global solution.

The rest of the paper is devoted to developing the above hardness results.

2 NP-hardness

The decision version of MAXCON is as follows.

Problem 2

(MAXCON-D). Given data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), an inlier threshold \(\epsilon \in \mathbb {R}_+\), and a number \(\psi \in \mathbb {N}_+\), does there exist \(\mathbf {x}\in \mathbb {R}^d\) such that \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}) \ge \psi \)?

Another well-known robust fitting paradigm is least median of squares (LMS), where we seek the vector \(\mathbf {x}\) that minimises the median of the residuals

$$\begin{aligned} \underset{\mathbf {x}\in \mathbb {R}^d}{\min } \quad \text {med}\,\left( |\mathbf {a}_1^T\mathbf {x}- b_1|, \dots , |\mathbf {a}_N^T\mathbf {x}- b_N| \right) . \end{aligned}$$
(2)

LMS can be generalised by minimising the k-th largest residual instead

$$\begin{aligned} \underset{\mathbf {x}\in \mathbb {R}^d}{\min } \quad \mathrm {kos}\left( |\mathbf {a}_1^T\mathbf {x}- b_1|, \dots , |\mathbf {a}_N^T\mathbf {x}- b_N| \right) , \end{aligned}$$
(3)

where the function \(\mathrm {kos}\) returns its k-th largest input value.
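As a small illustration (a sketch with names of our own choosing), the objectives of (2) and (3) can be evaluated for a candidate \(\mathbf {x}\) as follows; with \(k = \lceil N/2 \rceil \), \(\mathrm {kos}\) reduces to the LMS criterion.

```python
import numpy as np

def kos(x, A, b, k):
    """Evaluate the objective of (3): the k-th largest residual of x."""
    r = np.sort(np.abs(A @ x - b))  # residuals in ascending order
    return r[-k]                    # k-th largest

def lms(x, A, b):
    """Evaluate the LMS objective (2) as a special case of kos."""
    return kos(x, A, b, int(np.ceil(len(b) / 2)))
```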

Geometrically, LMS seeks the slab of smallest width that contains half of the points of \(\mathcal {D}\) in \(\mathbb {R}^{d+1}\). A slab in \(\mathbb {R}^{d+1}\) is defined by a normal vector \(\mathbf {x}\) and width w as

$$\begin{aligned} h_w(\mathbf {x}) = \left\{ (\mathbf {a},b) \in \mathbb {R}^{d+1} \; \left| \; |\mathbf {a}^T\mathbf {x}-b | \le \frac{1}{2}w \right. \right\} . \end{aligned}$$
(4)

Problem (3) thus seeks the thinnest slab that contains k of the points. The decision version of (3) is as follows.

Problem 3

(k-SLAB). Given data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), an integer k where \(1 \le k \le N\), and a number \(w^\prime \in \mathbb {R}_+\), does there exist \(\mathbf {x}\in \mathbb {R}^d\) such that k of the members of \(\mathcal {D}\) are contained in a slab \(h_w(\mathbf {x})\) with width \(w \le w^\prime \)?

k-SLAB has been proven to be NP-complete in [17].

Theorem 1

MAXCON-D is NP-complete.

Proof

MAXCON-D is clearly in NP, since the consensus of a candidate \(\mathbf {x}\) can be verified in polynomial time. For hardness, let \(\mathcal {D}\), k and \(w^\prime \) define an instance of k-SLAB. This can be reduced to an instance of MAXCON-D by simply reusing the same \(\mathcal {D}\), and setting \(\epsilon = \frac{1}{2}w^\prime \) and \(\psi = k\). If the answer to k-SLAB is positive, then there is an \(\mathbf {x}\) such that k points from \(\mathcal {D}\) lie within vertical distance \(\frac{1}{2}w^\prime \) of the hyperplane defined by \(\mathbf {x}\), hence \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\) is at least \(\psi \) and the answer to MAXCON-D is also positive. Conversely, if the answer to MAXCON-D is positive, then there is an \(\mathbf {x}\) such that \(\psi \) points have vertical distance at most \(\epsilon \) from the hyperplane defined by \(\mathbf {x}\), hence a slab centred on that hyperplane of width at most \(w^\prime \) encloses k of the points, and the answer to k-SLAB is also positive.\(\square \)

The NP-completeness of MAXCON-D implies the NP-hardness of the optimisation version MAXCON. See Sect. 1.1 for the implications of NP-hardness.

3 Parametrised Complexity

Parametrised complexity is a branch of algorithmics that investigates the inherent difficulty of problems with respect to structural parameters in the input [16]. In this section, we report several parametrised complexity results of MAXCON.

First, the consensus set \(\mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D})\) of \(\mathbf {x}\) is defined as

$$\begin{aligned} \mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D}) := \{ i \in \{1,\dots ,N \} \mid | \mathbf {a}^T_i \mathbf {x}- b_i | \le \epsilon \}. \end{aligned}$$
(5)

An equivalent definition of consensus (1) is thus

$$\begin{aligned} \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}) = |\mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D})|. \end{aligned}$$
(6)

Henceforth, we do not distinguish between the integer subset \(\mathcal {C}\subseteq \{1,\dots ,N \}\) that indexes a subset of \(\mathcal {D}\), and the actual data that are indexed by \(\mathcal {C}\).

3.1 XP in the Dimension

The following is the Chebyshev approximation problem [29, Chapter 2] defined on the input data indexed by \(\mathcal {C}\):

$$\begin{aligned} \underset{\mathbf {x}\in \mathbb {R}^d}{\min } \quad \max _{i \in \mathcal {C}}~|\mathbf {a}_i^T\mathbf {x}- b_i| \end{aligned}$$
(7)

Problem (7) has the linear programming (LP) formulation

$$\begin{aligned} \mathrm {LP}[\mathcal {C}]: \qquad \underset{\mathbf {x}\in \mathbb {R}^d,\, \gamma \ge 0}{\min } \;\; \gamma \qquad \text {s.t.} \quad -\gamma \le \mathbf {a}_i^T\mathbf {x}- b_i \le \gamma \;\; \forall i \in \mathcal {C}, \end{aligned}$$

which can be solved in polynomial time. We denote this LP, defined on the data indexed by \(\mathcal {C}\), by \(\mathrm {LP}[\mathcal {C}]\), and its minimisers by \((\hat{\mathbf {x}}, \hat{\gamma })\).
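For concreteness, \(\mathrm {LP}[\mathcal {C}]\) can be solved with an off-the-shelf LP solver; the following sketch (assuming scipy is available; function and variable names are ours) expands each absolute-value constraint into two linear inequalities.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_lp(A, b):
    """Solve LP[C]: minimise gamma subject to |a_i^T x - b_i| <= gamma.
    Returns the minimisers (x_hat, gamma_hat)."""
    n, d = A.shape
    c = np.r_[np.zeros(d), 1.0]                # objective: gamma only
    A_ub = np.block([[A, -np.ones((n, 1))],    #  a_i^T x - b_i <= gamma
                     [-A, -np.ones((n, 1))]])  # -(a_i^T x - b_i) <= gamma
    b_ub = np.r_[b, -b]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * d + [(0, None)])
    return res.x[:d], res.x[d]
```

Chebyshev approximation also has the following property.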

Lemma 1

There is a subset \(\mathcal {B}\) of \(\mathcal {C}\), where \(|\mathcal {B}| \le d+1\), such that, writing \(r_i(\mathbf {x}) := |\mathbf {a}_i^T\mathbf {x}- b_i|\) for the i-th residual,

$$\begin{aligned} \underset{\mathbf {x}\in \mathbb {R}^d}{\min } \quad \max _{i \in \mathcal {B}}~r_i(\mathbf {x}) = \underset{\mathbf {x}\in \mathbb {R}^d}{\min } \quad \max _{i \in \mathcal {C}}~r_i(\mathbf {x}) \end{aligned}$$
(8)

Proof

See [29, Sect. 2.3]. \(\square \)

We call \(\mathcal {B}\) a basis of \(\mathcal {C}\). Mathematically, \(\mathcal {B}\) is the set of active constraints of \(\mathrm {LP}[\mathcal {C}]\), hence bases can be computed easily. In fact, \(\mathrm {LP}[\mathcal {B}]\) and \(\mathrm {LP}[\mathcal {C}]\) have the same minimisers. Further, for any subset \(\mathcal {B}\) of size \(d+1\), a method by de la Vallée-Poussin can solve \(\mathrm {LP}[\mathcal {B}]\) analytically in time polynomial in d; see [29, Chapter 2] for details.

Let \(\mathbf {x}\) be an arbitrary candidate solution to MAXCON, and \((\hat{\mathbf {x}}, \hat{\gamma })\) be the minimisers to \(\mathrm {LP}[\mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D})]\), i.e., the Chebyshev approximation problem on the consensus set of \(\mathbf {x}\). The following property can be established.

Lemma 2

\(\mathrm {\Psi }_\epsilon (\hat{\mathbf {x}} \mid \mathcal {D}) \ge \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\).

Proof

By construction, \(\hat{\gamma } \le \epsilon \). Hence, if \((\mathbf {a}_i, b_i)\) is an inlier to \(\mathbf {x}\), i.e., \(|\mathbf {a}^T_i \mathbf {x}- b_i| \le \epsilon \), then \(|\mathbf {a}_i^T\hat{\mathbf {x}} -b_i | \le \hat{\gamma } \le \epsilon \), i.e., \((\mathbf {a}_i, b_i)\) is also an inlier to \(\hat{\mathbf {x}}\). Thus, the consensus of \(\hat{\mathbf {x}}\) is no smaller than the consensus of \(\mathbf {x}\). \(\square \)
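Lemma 2 can be exercised directly with the sketches above: refitting by Chebyshev approximation on the consensus set of any candidate never decreases the consensus. A hypothetical check, assuming some candidate x and data A, b, eps are in scope:

```python
# Refit on the consensus set of an arbitrary candidate x (Lemma 2).
C = np.abs(A @ x - b) <= eps                 # consensus set of x, cf. (5)
x_hat, gamma_hat = chebyshev_lp(A[C], b[C])  # minimisers of LP[C]
assert gamma_hat <= eps + 1e-8               # holds by construction
assert consensus(x_hat, A, b, eps) >= consensus(x, A, b, eps)
```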

Lemmas 1 and 2 suggest a rudimentary algorithm for consensus maximisation that attempts to find the basis of the maximum consensus set, as encapsulated in the proof of the following theorem.

Theorem 2

MAXCON is XP (slice-wise polynomial) in the dimension d.

Proof

Let \(\mathbf {x}^*\) be a witness to an instance of MAXCON-D with positive answer, i.e., \(\mathrm {\Psi }_\epsilon (\mathbf {x}^* \mid \mathcal {D}) \ge \psi \). Let \((\hat{\mathbf {x}}^*, \hat{\gamma }^*)\) be the minimisers to \(\mathrm {LP}[\mathcal {C}_\epsilon (\mathbf {x}^* \mid \mathcal {D})]\). By Lemma 2, \(\hat{\mathbf {x}}^*\) is also a positive witness to the instance. By Lemma 1, \(\hat{\mathbf {x}}^*\) can be found by enumerating all \((d+1)\)-subsets of \(\mathcal {D}\), and solving Chebyshev approximation (7) on each \((d+1)\)-subset. There are a total of \(\left( {\begin{array}{c}N\\ d+1\end{array}}\right) \) subsets to check; including the time to evaluate \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\) for each candidate, the runtime of this simple algorithm is \(\mathcal {O}(N^{d+2}\mathrm {poly}(d))\), which is polynomial in N for a fixed d. \(\square \)
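The rudimentary algorithm in the proof can be sketched as follows, reusing consensus() and chebyshev_lp() from above (illustrative, not optimised):

```python
from itertools import combinations

def maxcon_xp(A, b, eps):
    """The rudimentary algorithm of Theorem 2: Chebyshev-fit every
    (d+1)-subset (Lemmas 1 and 2) and keep the best candidate.
    Runtime O(N^(d+2) poly(d)), polynomial in N for fixed d."""
    N, d = A.shape
    best_x, best_psi = None, 0
    for S in combinations(range(N), d + 1):
        x_hat, _ = chebyshev_lp(A[list(S)], b[list(S)])
        psi = consensus(x_hat, A, b, eps)
        if psi > best_psi:
            best_x, best_psi = x_hat, psi
    return best_x, best_psi
```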

Theorem 2 shows that for a fixed dimension d, MAXCON can be solved in time polynomial in the number of measurements N (this is consistent with the results in [8, 12]). However, this does not imply that MAXCON is tractable (following the standard meaning of tractability in complexity theory [15, 16]). Moreover, in practical applications, d could be large (e.g., \(d \ge 5\)), thus the rudimentary algorithm above will not be efficient for large N.

3.2 W[1]-Hard in the Dimension

Can we remove d from the exponent of the runtime of a globally optimal algorithm? By establishing W[1]-hardness in the dimension, this section shows that this is not possible. Our proofs are inspired by, but extend quite significantly beyond, those of [30, Sect. 5]. First, the source problem is as follows.

Problem 4

(k-CLIQUE). Given an undirected graph \(G = (V, E)\) with vertex set V and edge set E, and a parameter \(k \in \mathbb {N}_+\), does there exist a clique in G with k vertices?

k-CLIQUE is W[1]-hard w.r.t. parameter k [31]. Here, we demonstrate an FPT reduction from k-CLIQUE to MAXCON-D with fixed dimension d.

Generating the Input Data. Given input graph \(G = (V, E)\), where \(V = \{1,\dots ,M \}\), and size k, we construct a \((k+1)\)-dimensional point set \(\mathcal {D}_{G} = \{ (\mathbf {a}_i, b_i) \}^{N}_{i=1} = \mathcal {D}_V \cup \mathcal {D}_E\) as follows:

  • The set \(\mathcal {D}_V\) is defined as

    $$\begin{aligned} \mathcal {D}_V = \{ (\mathbf {a}^v_\alpha , b^v_\alpha ) \}_{\alpha = 1,\dots ,k}^{v = 1, \dots , M}, \end{aligned}$$
    (9)

    where

    $$\begin{aligned} \mathbf {a}^v_\alpha = \left[ 0, \dots , 0, 1, 0, \dots , 0 \right] ^T \end{aligned}$$
    (10)

    is a k-dimensional vector of 0’s except at the \(\alpha \)-th element where the value is 1, and

    $$\begin{aligned} b^v_\alpha = v. \end{aligned}$$
    (11)
  • The set \(\mathcal {D}_E\) is defined as

    $$\begin{aligned} \begin{aligned} \mathcal {D}_E = \{ (\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta }) \mid ~&u, v = 1,\dots ,M, \\&\langle u, v \rangle \in E, \langle v, u \rangle \in E,\\&\alpha , \beta = 1, \dots , k,\\&\alpha < \beta ~\}, \end{aligned} \end{aligned}$$
    (12)

    where

    $$\begin{aligned} \mathbf {a}^{u,v}_{\alpha ,\beta } = \left[ 0, \dots , 0, 1, 0, \dots , 0, M, 0, \dots , 0 \right] ^T \end{aligned}$$
    (13)

    is a k-dimensional vector of 0’s, except at the \(\alpha \)-th element where the value is 1 and the \(\beta \)-th element where the value is M, and

    $$\begin{aligned} b^{u,v}_{\alpha ,\beta } = u + Mv. \end{aligned}$$
    (14)

The size N of \(\mathcal {D}_{G}\) is thus \(|\mathcal {D}_V| + |\mathcal {D}_E| = kM + 2|E|\left( {\begin{array}{c}k\\ 2\end{array}}\right) \).
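For concreteness, the data generation can be sketched as follows; this is our own encoding of (9)–(14), with the function name and edge-list format as assumptions, and the returned threshold \(\epsilon \) and target consensus \(\psi \) anticipating Lemma 5 and the completed reduction below.

```python
import numpy as np
from itertools import combinations

def reduce_clique_to_maxcon(M, edges, k):
    """Construct D_G = D_V U D_E following (9)-(14). Vertices are 1..M;
    `edges` must contain both orientations <u,v> and <v,u> of each edge."""
    A, b = [], []
    for v in range(1, M + 1):        # D_V: one point per (vertex, alpha)
        for alpha in range(k):
            a = np.zeros(k)
            a[alpha] = 1.0
            A.append(a)
            b.append(float(v))
    for (u, v) in edges:             # D_E: one point per (edge, alpha < beta)
        for alpha, beta in combinations(range(k), 2):
            a = np.zeros(k)
            a[alpha], a[beta] = 1.0, float(M)
            A.append(a)
            b.append(float(u + M * v))
    eps = 0.5 / (M + 2)              # any eps < 1/(M+2) works (Lemma 5)
    psi = k + k * (k - 1) // 2       # target consensus k + C(k,2)
    return np.array(A), np.array(b), eps, psi
```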

Setting the Inlier Threshold. Under our reduction, \(\mathbf {x}\in \mathbb {R}^d\) is responsible for “selecting” a subset of the vertices V and edges E of G. First, we say that \(\mathbf {x}\) selects vertex v if a point \((\mathbf {a}^v_\alpha , b^v_\alpha ) \in \mathcal {D}_V\), for some \(\alpha \), is an inlier to \(\mathbf {x}\), i.e., if

$$\begin{aligned} | (\mathbf {a}^v_\alpha )^T\mathbf {x}- b^v_\alpha | \le \epsilon \equiv x_\alpha \in [v - \epsilon , v + \epsilon ], \end{aligned}$$
(15)

where \(x_\alpha \) is the \(\alpha \)-th element of \(\mathbf {x}\). The key question is how to set the value of the inlier threshold \(\epsilon \), such that \(\mathbf {x}\) selects no more than k vertices, or equivalently, such that \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\) for all \(\mathbf {x}\).

Lemma 3

If \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\), with equality achieved if and only if \(\mathbf {x}\) selects k vertices of G.

Proof

For any two distinct vertices u and v, the ranges \([u-\epsilon , u+\epsilon ]\) and \([v-\epsilon , v+\epsilon ]\) cannot overlap if \(\epsilon < \frac{1}{2}\). Hence, each \(x_\alpha \) lies in at most one of the ranges, i.e., each element of \(\mathbf {x}\) selects at most one vertex; see Fig. 1. This implies that \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\). \(\square \)

Fig. 1.
figure 1

The blue dots indicate the integer values in the dimensions \(x_\alpha \) and \(x_\beta \). If \(\epsilon <\frac{1}{2}\), then the ranges defined by (15) for all \(v = 1,\dots ,M\) do not overlap. Hence, \(x_\alpha \) can select at most one vertex of the graph. (Color figure online)

Second, a point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\) from \(\mathcal {D}_E\) is an inlier to \(\mathbf {x}\) if

$$\begin{aligned} | (\mathbf {a}^{u,v}_{\alpha ,\beta })^T\mathbf {x}- b^{u,v}_{\alpha ,\beta } | \le \epsilon \equiv | (x_\alpha -u) + M(x_\beta - v)| \le \epsilon . \end{aligned}$$
(16)

As suggested by (16), the pairs of elements of \(\mathbf {x}\) are responsible for selecting the edges of G. To prevent each element pair \(x_\alpha , x_\beta \) from selecting more than one edge, or equivalently, to maintain \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), the setting of \(\epsilon \) is crucial.

Lemma 4

If \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achieved if and only if \(\mathbf {x}\) selects \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) edges of G.

Proof

For each \(\alpha , \beta \) pair, the constraint (16) is equivalent to the two linear inequalities

$$\begin{aligned} \begin{aligned} x_\alpha + Mx_\beta - u - Mv&\le \epsilon , \\ x_\alpha + Mx_\beta - u - Mv&\ge -\epsilon , \end{aligned} \end{aligned}$$
(17)

which specify two opposing half-planes (i.e., a slab) in the space \((x_\alpha ,x_\beta )\). Note that the slopes of the half-plane boundaries do not depend on u and v. For any two distinct pairs \((u_1, v_1)\) and \((u_2, v_2)\), we have the four linear inequalities

$$\begin{aligned} \begin{aligned} x_\alpha + Mx_\beta - u_1 - Mv_1&\le \epsilon , \\ x_\alpha + Mx_\beta - u_1 - Mv_1&\ge -\epsilon , \\ x_\alpha + Mx_\beta - u_2 - Mv_2&\le \epsilon , \\ x_\alpha + Mx_\beta - u_2 - Mv_2&\ge -\epsilon . \end{aligned} \end{aligned}$$
(18)

The system (18) can be simplified to

$$\begin{aligned} \begin{aligned} \frac{1}{2}\left[ u_2 - u_1 + M(v_2 - v_1) \right]&\le \epsilon , \\ \frac{1}{2}\left[ u_1 - u_2 + M(v_1 - v_2) \right]&\le \epsilon . \end{aligned} \end{aligned}$$
(19)

Setting \(\epsilon < \frac{1}{2}\) ensures that the two inequalities in (19) cannot hold simultaneously for any two distinct pairs \((u_1, v_1)\) and \((u_2, v_2)\), since \(|u_2 - u_1 + M(v_2 - v_1)| \ge 1\) whenever the pairs are distinct. Geometrically, with \(\epsilon < \frac{1}{2}\), the two slabs defined by (17) for distinct \((u_1, v_1)\) and \((u_2, v_2)\) pairs do not intersect; see Fig. 2 for an illustration.

Fig. 2.
figure 2

The blue dots indicate the integer values in the dimensions \(x_\alpha \) and \(x_\beta \). If \(\epsilon <\frac{1}{2}\), then any two slabs defined by (17) for different \((u_1, v_1)\) and \((u_2, v_2)\) pairs do not intersect. The figure shows two slabs corresponding to \(u_1 = 1\), \(v_1 = 5\), \(u_2 = 2\), \(v_2 = 5\). (Color figure online)

Hence, if \(\epsilon < \frac{1}{2}\), each element pair \(x_\alpha , x_\beta \) of \(\mathbf {x}\) can select at most one of the edges. Cumulatively, \(\mathbf {x}\) can select at most \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) edges, thus \(\mathrm {\Psi }_{\epsilon }(\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \). \(\square \)

Up to this stage, we have shown that if \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) \le k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achievable if there is a clique of size k in G. To establish the FPT reduction, we need to establish the reverse direction, i.e., if \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), then there is a k-clique in G. The following lemma shows that this can be assured by setting \(\epsilon <\frac{1}{M+2}\).

Lemma 5

If \(\epsilon < \frac{1}{M+2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) \le k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achievable if and only if there is a clique of size k in G.

Proof

The ‘only if’ direction has already been proven. To prove the ‘if’ direction, we show that if \(\epsilon <\frac{1}{M+2}\) and \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), then the vertex set \(S(\mathbf {x}) = \{\lfloor x_1 \rceil ,\ldots ,\lfloor x_k \rceil \}\) induces a k-clique, where \(\lfloor x_\alpha \rceil \) denotes \(x_\alpha \) rounded to the nearest integer, interpreted as a vertex index in G. From Lemmas 3 and 4, when \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), \(\mathbf {x}\) is consistent with k points in \(\mathcal {D}_V\) and \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) points in \(\mathcal {D}_E\). Since \(\epsilon < \frac{1}{2}\), if \((\mathbf {a}^u_\alpha , b^u_\alpha )\) is an inlier then \(\lfloor x_\alpha \rceil = u\); hence \(S(\mathbf {x})\) consists of the k vertices selected by \(\mathbf {x}\). The ‘if’ direction holds if all \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) selected edges connect vertices in \(S(\mathbf {x})\), i.e., for each inlier point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\in \mathcal {D}_E\), the points \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\) are also inliers w.r.t. \(\mathbf {x}\). We prove this by contradiction:

If \(\epsilon <\frac{1}{M+2}\), given an inlier \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\), from (16) we have:

$$\begin{aligned} \begin{aligned}&| (x_\alpha -u) + M(x_\beta - v)| =\\&|[(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)] + [(x_\alpha -\lfloor x_\alpha \rceil ) + M(x_\beta - \lfloor x_\beta \rceil )]| < \frac{1}{M+2}. \end{aligned} \end{aligned}$$
(20)

Assume that at least one of \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\) is not an inlier. From (15) and \(\epsilon <\frac{1}{M+2}\), we then have \(\lfloor x_\alpha \rceil \ne u\) or \(\lfloor x_\beta \rceil \ne v\), which means that at least one of \((\lfloor x_\alpha \rceil -u)\) and \((\lfloor x_\beta \rceil -v)\) is nonzero. Since all elements of \(\mathbf {x}\) satisfy (15) for some vertex, both \((\lfloor x_\alpha \rceil -u)\) and \((\lfloor x_\beta \rceil -v)\) are integers in \([-(M-1),M-1]\). If exactly one of them is nonzero, then \(|(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)| \ge \min (1, M) = 1\). If both are nonzero, then \(|(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)| \ge |-(M-1)+M\cdot 1| = 1\). Therefore, we have

$$\begin{aligned} |(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)|\ge 1. \end{aligned}$$
(21)

Also due to (15), we have

$$\begin{aligned} |(x_\alpha -\lfloor x_\alpha \rceil ) + M(x_\beta - \lfloor x_\beta \rceil )| \le (M+1)\cdot \epsilon < \frac{M+1}{M+2}. \end{aligned}$$
(22)

Combining (21) and (22), we have

$$\begin{aligned} \begin{aligned}&|[(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)] + [(x_\alpha -\lfloor x_\alpha \rceil ) + M(x_\beta - \lfloor x_\beta \rceil )]| \ge \\&|(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)| - |(x_\alpha -\lfloor x_\alpha \rceil ) + M(x_\beta - \lfloor x_\beta \rceil )| > \\&1-\frac{M+1}{M+2} = \frac{1}{M+2}, \end{aligned} \end{aligned}$$
(23)

which contradicts (20). Note also that \(S(\mathbf {x})\) can be computed in linear time. Hence, the ‘if’ direction holds when \(\epsilon <\frac{1}{M+2}\).\(\square \)

To illustrate Lemma 5, Fig. 3 depicts the value of \(\mathrm {\Psi }_\epsilon ( \mathbf {x}\mid \mathcal {D}_G )\) in the subspace \((x_\alpha , x_\beta )\) for \(\epsilon < \frac{1}{M+2}\). Observe that \(\mathrm {\Psi }_\epsilon ( \mathbf {x}\mid \mathcal {D}_G )\) attains the highest value of 3 in this subspace if and only if \(x_\alpha \) and \(x_\beta \) select a pair of vertices that are connected by an edge in G.

Fig. 3.
figure 3

If \(\epsilon <\frac{1}{M+2}\), then the slab (17) that contains a point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\in \mathcal {D}_E\), where \(\langle u,v \rangle \) is an edge in G, does not intersect any grid region besides the one formed by \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\). In this figure, \(u = 1\) and \(v = 5\).

Completing the Reduction. We have demonstrated a reduction from k-CLIQUE to MAXCON-D, where the main work is to generate the data \(\mathcal {D}_G\), whose number of measurements \(N = k|V| + 2|E|\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) is linear in |G| and polynomial in k, and whose dimension is \(d = k\). In other words, the reduction is FPT in k. Setting \(\epsilon < \frac{1}{M+2}\) and \(\psi = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \) completes the reduction.
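If a solver for the reduced MAXCON instance returns an \(\mathbf {x}\) attaining consensus \(k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), the clique itself can be read off by rounding, as in the proof of Lemma 5 (a sketch; the function name is ours):

```python
def decode_clique(x):
    """Read off S(x) per Lemma 5: round each element of x to the nearest
    integer to obtain the k vertex indices of the clique."""
    return sorted({int(round(v)) for v in x})
```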

Theorem 3

MAXCON is W[1]-hard w.r.t. the dimension d.

Proof

Since k-CLIQUE is W[1]-hard w.r.t. k, by the above FPT reduction, MAXCON is W[1]-hard w.r.t. d. \(\square \)

The implications of Theorem 3 have been discussed in Sect. 1.1.

3.3 FPT in the Number of Outliers and Dimension

Let \(f(\mathcal {C})\) and \(\hat{\mathbf {x}}_\mathcal {C}\) respectively indicate the minimised objective value and minimiser of \(\mathrm {LP}[\mathcal {C}]\). Consider two subsets \(\mathcal {P}\) and \(\mathcal {Q}\) of \(\mathcal {D}\), where \(\mathcal {P}\subseteq \mathcal {Q}\). The statement

$$\begin{aligned} f(\mathcal {P}) \le f(\mathcal {Q}) \end{aligned}$$
(24)

follows from the fact that \(\mathrm {LP}[\mathcal {P}]\) contains only a subset of the constraints of \(\mathrm {LP}[\mathcal {Q}]\); we call this property monotonicity.

Let \(\mathbf {x}^*\) be a global solution of an instance of MAXCON, and let \(\mathcal {I}^* := \mathcal {C}_\epsilon (\mathbf {x}^* \mid \mathcal {D}) \subset \mathcal {D}\) be the maximum consensus set. Let \(\mathcal {C}\) index a subset of \(\mathcal {D}\), and let \(\mathcal {B}\) be the basis of \(\mathcal {C}\). If \(f(\mathcal {C}) > \epsilon \), then by Lemma 1

$$\begin{aligned} f(\mathcal {D}) \ge f(\mathcal {C}) = f(\mathcal {B}) > \epsilon . \end{aligned}$$
(25)

The monotonicity property affords us further insight.

Lemma 6

At least one point in \(\mathcal {B}\) does not belong to \(\mathcal {I}^*\).

Proof

By monotonicity,

$$\begin{aligned} \epsilon < f(\mathcal {B}) \le f(\mathcal {I}^* \cup \mathcal {B}). \end{aligned}$$
(26)

Hence, \(\mathcal {I}^* \cup \mathcal {B}\) cannot be equal to \(\mathcal {I}^*\), for if they were equal, then \(f(\mathcal {I}^* \cup \mathcal {B}) = f(\mathcal {I}^*) \le \epsilon \), which violates (26). Thus \(\mathcal {B}\) must contain at least one point outside \(\mathcal {I}^*\). \(\square \)

The above observations suggest an algorithm for MAXCON that recursively removes basis points to find a consensus set, as summarised in Algorithm 1. This algorithm is a special case of the technique of Chin et al. [13]. Note that in the worst case, Algorithm 1 finds a solution with consensus d (i.e., the minimal case to fit \(\mathbf {x}\)), if there are no solutions with higher consensus to be found.

[Algorithm 1 (summary): given the current index set \(\mathcal {C}\) (initially \(\mathcal {C}= \{1,\dots ,N\}\)), solve \(\mathrm {LP}[\mathcal {C}]\); if \(f(\mathcal {C}) \le \epsilon \), return \(\mathcal {C}\) as a consensus set; otherwise, compute the basis \(\mathcal {B}\) of \(\mathcal {C}\) and recurse on \(\mathcal {C}\setminus \{i\}\) for each \(i \in \mathcal {B}\), returning the largest consensus set found.]
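A minimal recursive sketch of Algorithm 1, reusing the helpers from Sect. 3.1 (illustrative only: basis detection here simply collects the active constraints and assumes a non-degenerate LP solution; the repeated-basis avoidance of [13] is omitted):

```python
import numpy as np

def maxcon_fpt(A, b, eps, idx=None):
    """Depth-first basis removal (cf. Algorithm 1). Returns an index
    array of the largest consensus set found."""
    if idx is None:
        idx = np.arange(len(b))
    x_hat, gamma_hat = chebyshev_lp(A[idx], b[idx])
    if gamma_hat <= eps:                   # idx is already a consensus set
        return idx
    r = np.abs(A[idx] @ x_hat - b[idx])
    basis = idx[np.isclose(r, gamma_hat)]  # active constraints (Lemma 1)
    best = idx[:0]
    for i in basis:                        # by Lemma 6, some basis point
        cand = maxcon_fpt(A, b, eps, idx[idx != i])  # must be an outlier
        if len(cand) > len(best):
            best = cand
    return best
```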

Theorem 4

MAXCON is FPT in the number of outliers and dimension.

Proof

Algorithm 1 conducts a depth-first tree search to find a recursive sequence of basis points whose removal from \(\mathcal {D}\) yields a consensus set. By Lemma 6, the longest sequence of basis points that needs to be removed has length \(o = N-|\mathcal {I}^*|\), which is also the maximum tree depth searched by the algorithm (each descent of the tree removes one point). The number of nodes visited is of order \((d+1)^o\), since the branching factor of the tree is \(|\mathcal {B}|\), and by Lemma 1, \(|\mathcal {B}| \le d+1\).

At each node, \(\mathrm {LP}[\mathcal {C}]\) is solved, with the largest of these LPs having \(d+1\) variables and N constraints. Algorithm 1 thus runs in \(\mathcal {O}(d^o \mathrm {poly}(N,d))\) time, which is exponential only in the number of outliers o and dimension d.\(\square \)

Using [32, Theorem 2.3] and the repeated basis detection and avoidance procedure in [13, Sect. 3.1], the complexity of Algorithm 1 can be improved to \(\mathcal {O}((o+1)^d\mathrm {poly}(N,d))\). See [33, Sect. 3.5] for details.

4 Approximability

Given the inherent intractability of MAXCON, it is natural to seek recourse in approximate solutions. However, this section shows that it is not possible to construct a PTAS [18] for MAXCON.

Our development here is inspired by [34, Sect. 3.2]. First, we define our source problem: given a set of k Boolean variables \(\{ v_j \}^{k}_{j=1}\), a literal is either one of the variables, e.g., \(v_j\), or its negation, e.g., \(\lnot v_j\). A clause is a disjunction over a set of literals, e.g., \(v_1 \vee \lnot v_2 \vee v_3\). A truth assignment is a setting of the values of the k variables. A clause is satisfied if it evaluates to true.

Problem 5

(MAX-2SAT). Given M clauses \(\mathcal {K}= \{ \mathcal {K}_i \}^{M}_{i=1}\) over k Boolean variables \(\{ v_j \}^{k}_{j=1}\), where each clause has exactly two literals, what is the maximum number of clauses that can be satisfied by a truth assignment?

MAX-2SAT is APX-hard [35], meaning that no polynomial time algorithm can approximate MAX-2SAT to within an arbitrarily small error ratio (i.e., MAX-2SAT admits no PTAS unless P = NP). Here, we show an L-reduction [36] from MAX-2SAT to MAXCON, which establishes that MAXCON is also APX-hard.

Generating the Input Data. Given an instance of MAX-2SAT with clauses \(\mathcal {K}= \{ \mathcal {K}_i \}^{M}_{i=1}\) over variables \(\{ v_j \}^{k}_{j=1}\), let each clause \(\mathcal {K}_i\) be represented as \((\pm v_{\alpha _i})\vee (\pm v_{\beta _i})\), where \(\alpha _i, \beta _i \in \{1, \dots , k\}\) index the variables that exist in \(\mathcal {K}_i\), and ± here indicates either a “blank” (no negation) or \(\lnot \) (negation). Define

$$\begin{aligned} \mathrm {sgn}(\alpha _i) = {\left\{ \begin{array}{ll} +1 &{} \mathrm {if}~v_{\alpha _i}~\text {occurs without negation in}~\mathcal {K}_i, \\ -1 &{} \mathrm {if}~v_{\alpha _i}~\text {occurs with negation in}~\mathcal {K}_i; \end{array}\right. } \end{aligned}$$
(27)

similarly for \(\mathrm {sgn}(\beta _i)\). Construct the input data for MAXCON as

$$\begin{aligned} \mathcal {D}_{\mathcal {K}} = \{ (\mathbf {a}^{p}_i,b^{p}_i) \}^{p = 1,\dots ,6}_{i=1,\dots ,M}, \end{aligned}$$
(28)

where there are six measurements for each clause. Namely, for each clause \(\mathcal {K}_i\),

  • \(\mathbf {a}^{1}_i\) is a k-dimensional vector of zeros, except at the \(\alpha _i\)-th and \(\beta _i\)-th elements where the values are respectively \(\mathrm {sgn}(\alpha _i)\) and \(\mathrm {sgn}(\beta _i)\), and \(b^{1}_i = 2\).

  • \(\mathbf {a}^{2}_i = \mathbf {a}^{1}_i\) and \(b^{2}_i = 0\).

  • \(\mathbf {a}^{3}_i\) is a k-dimensional vector of zeros, except at the \(\alpha _i\)-th element where the value is \(\mathrm {sgn}(\alpha _i)\), and \(b^{3}_i = -1\).

  • \(\mathbf {a}^{4}_i = \mathbf {a}^{3}_i\) and \(b^{4}_i = 1\).

  • \(\mathbf {a}^{5}_i\) is a k-dimensional vector of zeros, except at the \(\beta _i\)-th element where the value is \(\mathrm {sgn}(\beta _i)\), and \(b^{5}_i = -1\).

  • \(\mathbf {a}^{6}_i = \mathbf {a}^{5}_i\) and \(b^{6}_i = 1\).

The number of measurements N in \(\mathcal {D}_{\mathcal {K}}\) is 6M.
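A sketch of this construction (our own encoding; the clause format and function name are assumptions of the sketch):

```python
import numpy as np

def reduce_max2sat_to_maxcon(clauses, k):
    """Construct D_K from M two-literal clauses over k variables. Each
    clause is ((alpha, sgn_alpha), (beta, sgn_beta)), with 0-based
    variable indices and signs +1/-1 as in (27)."""
    A, b = [], []
    for (alpha, s_a), (beta, s_b) in clauses:
        a12 = np.zeros(k); a12[alpha] = s_a; a12[beta] = s_b
        a34 = np.zeros(k); a34[alpha] = s_a
        a56 = np.zeros(k); a56[beta] = s_b
        for a_vec, b_val in [(a12, 2.0), (a12, 0.0), (a34, -1.0),
                             (a34, 1.0), (a56, -1.0), (a56, 1.0)]:
            A.append(a_vec)
            b.append(b_val)
    return np.array(A), np.array(b), 0.5   # any eps < 1 works
```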

Setting the Inlier Threshold. Given a solution \(\mathbf {x}\in \mathbb {R}^k\) for MAXCON, the six input measurements associated with \(\mathcal {K}_i\) are inliers under these conditions:

$$\begin{aligned} \begin{aligned} (\mathbf {a}^1_i, b^1_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\alpha _i)x_{\alpha _i} + \mathrm {sgn}(\beta _i)x_{\beta _i} - 2| \le \epsilon ,\\ (\mathbf {a}^2_i, b^2_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\alpha _i)x_{\alpha _i} + \mathrm {sgn}(\beta _i)x_{\beta _i}| \le \epsilon , \end{aligned} \end{aligned}$$
(29)
$$\begin{aligned} \begin{aligned} (\mathbf {a}^3_i, b^3_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\alpha _i)x_{\alpha _i} + 1| \le \epsilon ,\\ (\mathbf {a}^4_i, b^4_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\alpha _i)x_{\alpha _i} - 1| \le \epsilon , \end{aligned} \end{aligned}$$
(30)
$$\begin{aligned} \begin{aligned} (\mathbf {a}^5_i, b^5_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\beta _i)x_{\beta _i} + 1| \le \epsilon ,\\ (\mathbf {a}^6_i, b^6_i)~\text {is an inlier}&\iff |\mathrm {sgn}(\beta _i)x_{\beta _i} - 1| \le \epsilon , \end{aligned} \end{aligned}$$
(31)

where \(x_{\alpha }\) is the \(\alpha \)-th element of \(\mathbf {x}\). Observe that if \(\epsilon < 1\), then at most one of (29), one of (30), and one of (31) can be satisfied. The following result establishes an important condition for L-reduction.

Lemma 7

If \(\epsilon < 1\), then

$$\begin{aligned} \mathrm {OPT(\text {MAXCON})} \le 6\cdot \mathrm {OPT(\text {MAX-2SAT})}, \end{aligned}$$
(32)

where \(\mathrm {OPT(\text {MAX-2SAT})}\) is the maximum number of clauses that can be satisfied for the given MAX-2SAT instance, and \(\mathrm {OPT(\text {MAXCON})}\) is the maximum achievable consensus for the MAXCON instance generated under our reduction.

Proof

If \(\epsilon < 1\), then for all \(\mathbf {x}\), at most one of (29), one of (30), and one of (31) can be satisfied, hence \(\mathrm {OPT(\text {MAXCON})}\) cannot be greater than 3M. For any MAX-2SAT instance with M clauses, there is an algorithm [37] that satisfies at least \(\lceil \frac{M}{2} \rceil \) of the clauses, thus \(\mathrm {OPT(\text {MAX-2SAT})} \ge \lceil \frac{M}{2} \rceil \). Combining the two bounds yields (32).\(\square \)

Note that, if \(\epsilon < 1\), rounding \(\mathbf {x}\) to its nearest bipolar vector (i.e., a vector whose entries are all \(-1\) or 1) cannot decrease the consensus w.r.t. \(\mathcal {D}_\mathcal {K}\). It is thus sufficient to consider bipolar \(\mathbf {x}\) in the rest of this section.
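In code, under the naming of the earlier sketches, the rounding step is:

```python
import numpy as np

def to_bipolar(x):
    """Round x to its nearest bipolar vector; for eps < 1 this never
    decreases the consensus w.r.t. D_K (see the remark above)."""
    return np.where(x >= 0.0, 1.0, -1.0)
```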

Intuitively, \(\mathbf {x}\) is used as a proxy for truth assignment: setting \(x_j = 1\) implies setting \(v_j = true\), and vice versa. Further, if one of the conditions in (29) holds for a given \(\mathbf {x}\), then the clause \(\mathcal {K}_i\) is satisfied by the truth assignment. Hence, for \(\mathbf {x}\) that is bipolar and \(\epsilon < 1\),

$$\begin{aligned} \mathrm {\Psi }_\epsilon ( \mathbf {x}\mid \mathcal {D}_\mathcal {K}) = 2M + \sigma , \end{aligned}$$
(33)

where \(\sigma \) is the number of clauses satisfied by \(\mathbf {x}\). This leads to the final necessary condition for L-reduction.

Lemma 8

If \(\epsilon < 1\), then

$$\begin{aligned} \left| \mathrm {OPT(\text {MAX-2SAT})} - \mathrm {SAT}(\mathbf {t}(\mathbf {x})) \right| = \left| \mathrm {OPT(\text {MAXCON})} - \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_\mathcal {K}) \right| , \end{aligned}$$
(34)

where \(\mathbf {t}(\mathbf {x})\) returns the truth assignment corresponding to \(\mathbf {x}\), and \(\mathrm {SAT}(\mathbf {t}(\mathbf {x}))\) returns the number of clauses satisfied by \(\mathbf {t}(\mathbf {x})\).

Proof

For any bipolar \(\mathbf {x}\) with consensus \(2M + \sigma \), the truth assignment \(\mathbf {t}(\mathbf {x})\) satisfies exactly \(\sigma \) clauses. Since \(\mathrm {OPT(\text {MAXCON})}\) must take the form \(2M + \sigma ^*\), where \(\sigma ^* = \mathrm {OPT(\text {MAX-2SAT})}\), the condition (34) follows by substituting these values into the equation.\(\square \)

We have demonstrated an L-reduction from MAX-2SAT to MAXCON, where the main work is to generate \(\mathcal {D}_\mathcal {K}\) in linear time. The function \(\mathbf {t}\) also takes linear time to compute. Setting \(\epsilon < 1\) completes the reduction.

Theorem 5

MAXCON is APX-hard.

Proof

Since MAX-2SAT is APX-hard, by the above L-reduction, MAXCON is also APX-hard.\(\square \)

See Sect. 1.1 for the implications of Theorem 5.

5 Conclusions and Future Work

Given the fundamental difficulty of consensus maximisation as implied by our results (see Sect. 1.1), it would be prudent to consider alternative paradigms for optimisation, e.g., deterministically convergent heuristic algorithms [19,20,21] or preprocessing techniques [22,23,24].