1 Introduction

Many fundamental problems in image processing and computer vision, such as image filtering, segmentation, registration, and stereo vision, can naturally be formulated as optimization problems. Often, these optimization problems can be described as labeling problems in which we wish to assign to each image element (pixel or vertex of an associated graph) \(v\in V\) an element \(\ell (v)\) from some finite K-element set of labels, usually \(\{0,\ldots ,K-1\}\). The interpretation of these labels depends on the optimization problem at hand. In image segmentation, the labels might indicate object categories. In registration and stereo disparity problems, the labels represent correspondences between images, and in image reconstruction and filtering the labels represent intensities in the filtered image.

In what follows an undirected graph \(\mathcal {G}\) is identified with a pair \(\langle V, \mathcal {E}\rangle \), where V is its set of vertices and \(\mathcal {E}\) is the set of its edges. Each edge connecting vertices s and t is identified with a pair \(\{s,t\}\). We make the assumption that the vertices in V are linearly ordered, and let \( {\hat{{{\mathcal {E}}}}}:=\{\langle s,t\rangle \in V^2:\{s,t\}\in \mathcal {E}\ \& \ s<t\}\).

Our new algorithms have no restriction on the format of the graph to which they can be applied. However, in what follows we will often treat \(\mathcal {G}\) as associated with a digital image. In this case, V is the set of all pixels of the image, while \(\mathcal {E}\) is the set of pairs \(\{s,t\}\) of vertices/pixels that are adjacent according to some given adjacency relation.

In this paper, we seek the vertex label assignments \(\ell :V\rightarrow \{0,1,\ldots , K-1\}\) of the undirectedFootnote 1 graphs \(\mathcal {G}=(V, \mathcal {E})\) that minimize a given objective (energy) function \(E_\infty \) of the form

$$\begin{aligned} E_\infty (\ell ) : = \max \bigl \{\max _{s\in V} \phi _s(\ell (s)), \max _{\langle s,t\rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi _{st}(\ell (s),\ell (t))\bigr \}. \end{aligned}$$
(1)

The functions \(\phi _s(\cdot )\) are referred to as unary terms. The value of \(\phi _s(j)\) depends explicitly only on the label \(j\in \{0,1,\ldots , K-1\}\), but typically is also based on some prior information. These terms are used to indicate a preference for a vertex/pixel s to be assigned a particular label j.

The functions \(\phi _{st}(\cdot ,\cdot )\) are referred to as pairwise or binary terms. The value of \(\phi _{st}(\cdot ,\cdot )\) depends simultaneously on the labels assigned to the vertices/pixels s and t, and thus introduces a dependency between the labels of different pixels. Typically, this dependency between pixels is used to express an expectation that the desired solution should have some degree of smoothness or regularity.
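As a concrete illustration of these terms, the energy (1) can be evaluated directly once the unary and pairwise costs are stored, say, in dictionaries. The 3-vertex path graph and all cost values in this sketch are made up:

```python
# Evaluating the energy (1) on a 3-vertex path graph; all costs are made up.
phi = {0: {0: 0.2, 1: 0.9},    # unary terms phi_s(label)
       1: {0: 0.5, 1: 0.1},
       2: {0: 0.8, 1: 0.3}}
phi_pair = {(0, 1): {(0, 0): 0.0, (0, 1): 0.7, (1, 0): 0.7, (1, 1): 0.0},
            (1, 2): {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.0}}

def E_inf(labeling):
    """Maximum over all unary and pairwise local costs, as in Eq. (1)."""
    unary = max(phi[s][labeling[s]] for s in phi)
    pairwise = max(phi_pair[s, t][labeling[s], labeling[t]]
                   for (s, t) in phi_pair)
    return max(unary, pairwise)

print(E_inf({0: 0, 1: 1, 2: 1}))   # 0.7, dominated by the (0, 1) edge term
```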

The unary and pairwise terms taken together form the local cost (error) measures mentioned in the abstract (and constitute the functional \(\varPhi \) defined in Sect. 3). The same local costs are used in the \(L_1\)-norm energy \(E_1\), which we discuss briefly in the next section.

Finding a labeling that globally minimizes an objective function of the form \(E_\infty \) is generally a challenging computational task—in Sect. 7, we show that this problem is in fact NP-hard in the general case, for \(K > 2\). As we will see, however, there exist restricted classes of local cost functionals for which efficient algorithms can be formulated.

In the conference version of this paper [16], we introduced an algorithm for finding a binary labeling (i.e., with \(K=2\)) and showed that the labeling it returns is always \(E_\infty \)-optimal as long as all pairwise local cost terms \(\phi _{st}\) are \(\infty \)-submodular, that is, that they satisfy the condition

$$\begin{aligned} \max \{\phi _{st}(0,0),\phi _{st}(1,1)\} \le \max \{\phi _{st}(1,0),\phi _{st}(0,1)\}. \end{aligned}$$
(2)

This algorithm, presented in Sect. 6, is very efficient, with quasi-linear time complexity.Footnote 2 An important question left open in our previous work [16] was whether it is possible to optimize objective function \(E_\infty \) in polynomial time without any additional assumptions on the local cost functional, like that of \(\infty \)-submodularity needed for the algorithm from [16]. Here, we answer this question affirmatively by presenting in Sect. 5 an algorithm that produces, in \(\mathcal {O}((|V|+|\mathcal {E}|)^2)\) time, a binary labeling that is globally \(E_\infty \)-optimal for any local cost functional.

2 Background and Related Work

2.1 \(L_p\) Norm Objective Functions and Minimal Graph Cuts

While the main focus of this paper is to find efficient algorithms for the direct optimization of objective functions of the form \(E_\infty \), we will start by discussing the more general problems of optimizing \(L_p\) norm objective functions for \(p\in [1,\infty ]\).

In their seminal work, Kolmogorov and Zabih [13] considered binary labeling problems for the \(L_1\)-norm-based objective function of the form

$$\begin{aligned} E_1(\ell ) : = \sum _{s\in V} \phi _s(\ell (s))+\sum _{\langle s,t\rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi _{st}(\ell (s),\ell (t)) \end{aligned}$$
(3)

and showed that a globally optimal binary labeling can be found by solving a max-flow/min-cut problem on a suitably constructed graph under the condition that all pairwise terms \(\phi _{st}\) are submodular, that is, that they satisfy the inequality

$$\begin{aligned} \phi _{st}(0,0)+\phi _{st}(1,1)\le \phi _{st}(0,1)+\phi _{st}(1,0). \end{aligned}$$
(4)

Looking at the objective functions \(E_1\) and \(E_\infty \), we can view them both as consisting of two parts:

  • The local error measures, in our case expressed by the unary and pairwise terms.

  • A global error measure, aggregating the local errors into a final score.

In the case of \(E_1\), the global error measure is obtained by summing all local error measures; in the case of \(E_\infty \), the global error measure is taken to be the maximum of all local error measures. If we assume for a moment that all local error measurements are nonnegative, then \(E_1\) can be seen as measuring the \(L_1\)-norm of a vectorFootnote 3 containing all local costs/errors. Similarly, \(E_\infty \) can be interpreted as the \(L_\infty \)- (or max-) norm of the same vector. The \(L_1\) and \(L_\infty \) norms are both special cases of the \(L_p\) norms, with \(p\in [1,\infty ]\), which for finite p are defined as

$$\begin{aligned} E_p(\ell ) : = \left( \sum _{s\in {V}} \phi _s^p(\ell (s))+\sum _{\langle s,t \rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi ^p_{st}(\ell (s),\ell (t)) \right) ^{1/p}, \end{aligned}$$
(5)

where \(\phi _s^p(\cdot )=(\phi _s(\cdot ))^p\) and \(\phi ^p_{st}(\cdot ,\cdot )=(\phi _{st}(\cdot ,\cdot ))^p\). The value \(p\in [1,\infty ]\) can be seen as a parameter controlling the balance between minimizing the overall cost and minimizing the magnitude of the individual terms. For \(p=1\), the optimal labeling may contain arbitrarily large individual terms as long as the sum of the terms is small. As p increases, a larger penalty is assigned to solutions containing large individual terms. In the limit as p approaches infinity, \(E_p\) approaches \(E_\infty \) and the penalty assigned to a solution is determined by the largest individual term only. The limit behavior of \(L_p\) norm optimizers as p approaches \(\infty \) has also been studied in, e.g., [8, 18, 20]. Abbas and Swoboda [1] considered mixed optimization problems, in which the objective function contains both \(L_1\) and \(L_\infty \) terms.
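The convergence of \(E_p\) to \(E_\infty \) is easy to observe numerically; the sketch below evaluates (5) on a fixed, illustrative vector of local costs for growing p:

```python
# E_p of a fixed vector of local costs approaches its maximum, E_inf, as p
# grows; the cost values are arbitrary.
costs = [0.2, 0.1, 0.3, 0.7]       # all unary and pairwise terms, flattened

def E_p(costs, p):
    return sum(c ** p for c in costs) ** (1.0 / p)

for p in (1, 2, 8, 32, 128):
    print(p, E_p(costs, p))
print("E_inf =", max(costs))       # already within 1e-3 of E_p at p = 128
```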

Labeling problems with objective functions of the form \(E_p\), for \(p\in [1,\infty )\), can be solved using minimal graph cuts, provided that all pairwise terms \(\phi ^p_{st}\) are p-submodular [17]. A binary term \(\phi \) is said to be p-submodular if the corresponding term \(\phi ^p\) is submodular, which is equivalent to the condition

$$\begin{aligned} (\phi _{st}^p(0,0)+\phi _{st}^p(1,1))^{1/p} \le (\phi _{st}^p(0,1)+\phi _{st}^p(1,0))^{1/p}. \end{aligned}$$
(6)

In the limit, as p goes to infinity, this inequality becomes

$$\begin{aligned} \max \{\phi _{st}(0,0),\phi _{st}(1,1)\} \le \max \{\phi _{st}(1,0),\phi _{st}(0,1)\}, \end{aligned}$$

that is, the \(\infty \)-submodularity condition (2). As observed by Malmberg and Strand [17], 1-submodularity does not necessarily imply p-submodularity.Footnote 4 The following theorem was shown by Malmberg and Strand [17]:
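This observation is easy to verify numerically. The hypothetical pairwise term below satisfies the 1-submodularity inequality (4) but violates both the \(\infty \)-submodularity condition (2) and the \(p=2\) case of (6):

```python
# A hypothetical pairwise term phi[(i, j)], chosen to be 1-submodular but
# neither infinity- nor 2-submodular.
phi = {(0, 0): 3.0, (1, 1): 0.0, (0, 1): 2.0, (1, 0): 2.0}

def is_p_submodular(phi, p):
    """Inequality (6): the condition that phi^p be submodular."""
    lhs = (phi[0, 0] ** p + phi[1, 1] ** p) ** (1.0 / p)
    rhs = (phi[0, 1] ** p + phi[1, 0] ** p) ** (1.0 / p)
    return lhs <= rhs

def is_inf_submodular(phi):
    """Inequality (2)."""
    return max(phi[0, 0], phi[1, 1]) <= max(phi[0, 1], phi[1, 0])

print(is_p_submodular(phi, 1))   # True:  3 + 0 <= 2 + 2
print(is_inf_submodular(phi))    # False: max(3, 0) > max(2, 2)
print(is_p_submodular(phi, 2))   # False: 3 > sqrt(8)
```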

Theorem 1

If a binary term \(\phi \) is 1-submodular and \(\infty \)-submodular, then it is also p-submodular for any real \(p \ge 1\).

We note here that Theorem 1 implies also the following seemingly stronger result.

Corollary 1

Let \(\phi \) be a binary term. Then for every \(\rho \in [1,\infty )\) the following conditions are equivalent.

  1. (i)

    \(\phi \) is \(\rho \)-submodular and \(\infty \)-submodular.

  2. (ii)

    \(\phi \) is p-submodular for every \(p\in [\rho ,\infty )\).

Proof

To see that (ii) implies (i) notice that the p-submodularity inequality (6) can be written as

$$\begin{aligned} \Vert \langle \phi _{st}(0,0),\phi _{st}(1,1)\rangle \Vert _p \le \Vert \langle \phi _{st}(0,1),\phi _{st}(1,0)\rangle \Vert _p. \end{aligned}$$

Since the \(L_p\) norm converges to the \(L_\infty \) norm as p goes to infinity, taking the limit of both sides of the above inequality yields

$$\begin{aligned} \Vert \langle \phi _{st}(0,0),\phi _{st}(1,1)\rangle \Vert _\infty \le \Vert \langle \phi _{st}(0,1),\phi _{st}(1,0)\rangle \Vert _\infty , \end{aligned}$$

that is, the \(\infty \)-submodularity condition (2).

To see that (i) implies (ii) assume that \(\phi _{st}\) satisfies (i). Then \(\phi _{st}^\rho \) is both 1-submodular (raise both sides of the inequality (6), taken with \(p=\rho \), to the power \(\rho \)) and \(\infty \)-submodular (as the map \(x\mapsto x^\rho \) is increasing on \([0,\infty )\)). In particular, \(\phi _{st}^\rho \) satisfies the assumptions of Theorem 1. Therefore, for every \(p\in [\rho ,\infty )\) it is \(\frac{p}{\rho }\)-submodular, that is, satisfies

$$\begin{aligned} (\phi _{st}^{\rho \frac{p}{\rho }}(0,0)+\phi _{st}^{\rho \frac{p}{\rho }}(1,1))^{\rho /p} \le (\phi _{st}^{\rho \frac{p}{\rho }}(0,1)+\phi _{st}^{\rho \frac{p}{\rho }}(1,0))^{\rho /p}. \end{aligned}$$

But this clearly implies p-submodularity of \(\phi _{st}\). \(\square \)

2.2 Optimization of \(E_\infty \) by Classical Algorithms

In Sect. 4, we will show that if the binary terms \(\phi \) satisfy (i) of Corollary 1, then an optimal labeling for the associated energy \(E_\infty \) can be found by solving an appropriate max-flow/min-cut problem.

Moreover, it turns out that in some problem instances a labeling that is globally optimal with respect to \(E_\infty \) can be found using very efficient, greedy algorithms. Specifically, if

  1. (D)

    all pairwise terms are such that \(\phi _{st}(1,0)=\phi _{st}(0,1)\) and \(\phi _{st}(0,0)=\phi _{st}(1,1)=0\), while all unary terms have values in \(\{0,\infty \}\),

then an optimal labeling for the associated energy \(E_\infty \) can be found by computing the partitioning induced by an optimum spanning forest on a suitably constructed graph using, e.g., Prim’s algorithm [7, 19]Footnote 5. See more on this in Sect. 4. This property of optimum spanning forests has been observed by several authors [2, 6, 8]. This result has a high practical value since the computation time for constructing an optimal spanning forest is substantially lower than the computation time for solving a max-flow/min-cut problem, asymptotically as well as in practice [8].

Wolf et al. [21,22,23] recently proposed various extensions of this greedy approach and also reported state-of-the-art results on various image segmentation benchmarks. We note also that the notion of partitioning an image-induced graph by computing an optimum spanning forest is tightly connected to the classic watershed image segmentation method [9, 10].

Based on the above, an interesting question is therefore whether it is possible to use similar greedy techniques to optimize the objective function \(E_\infty \) beyond the special case when the local costs satisfy property (D). The results presented in this paper answer this question affirmatively and show that the class of \( E_\infty \) optimization problems that are solvable by the efficient greedy algorithms is larger than what was previously known.

3 Algorithms for Direct Optimization of \(E_\infty \): Preliminaries

In Sects. 5 and 6, we will introduce two novel algorithms, each finding a binary labeling minimizing \(E_{\infty }\).

The exposition of these algorithms relies on the notion of unary and binary solution atoms, which we introduce in this section. Informally, a unary atom represents one possible label configuration for a single vertex, and a binary atom represents a possible label configuration for a pair of adjacent vertices. Thus, for a binary labeling problem, there are two atoms associated with every vertex and four atoms for every edge. The total number of atoms for a binary labeling problem is thus \(\mathcal {O}(|V|+|{{\mathcal {E}}}|)\).

Formally, we let \(\mathcal {V}=\{\{v\}:v\in V\}\), put \(\mathcal {D}=\mathcal {V}\cup \mathcal {E}\), and let \(\mathcal {A}\) be the family of all binary maps from \(D\in \mathcal {D}\) into \(\{0,1\}\). An atom, in this notation, is an element of \(\mathcal {A}\). If we identify, as it is common, maps with their graphs then each unary atom associated with a vertex \(s\in V\) has form \(\{\langle s,i\rangle \}\), with \(i\in \{0,1\}\). Similarly, each binary atom associated with an edge \(\{s,t\}\in \mathcal {E}\) has the form \(\{\langle s,i\rangle ,\langle t,j\rangle \}\), with \(i,j\in \{0,1\}\).

Notice that the maps \(\phi _s\) and \(\phi _{st}\) used for the unary and binary terms in (1) can be combined to form a single function \(\varPhi :\mathcal {A}\rightarrow [0,\infty )\) defined, for every \(A\in \mathcal {A}\), as

$$\begin{aligned} \varPhi (A):= {\left\{ \begin{array}{ll} \phi _s(i) &{} \text{ for } A=\{\langle s,i\rangle \},\\ \phi _{s,t}(i,j) &{} \text{ for } A=\{\langle s,i\rangle ,\langle t,j\rangle \}. \end{array}\right. } \end{aligned}$$

For a given labeling \(\ell \), we define \(\phi _\ell :\mathcal {D}\rightarrow [0,\infty )\), for every \(D\in \mathcal {D}\), as \(\phi _\ell (D):= \varPhi (\ell \restriction D)\), that is,

$$\begin{aligned} \phi _\ell (D):= {\left\{ \begin{array}{ll} \phi _s(\ell (s)) &{} \text{ for } D=\{s\}\in \mathcal {V},\\ \phi _{s,t}(\ell (s),\ell (t)) &{} \text{ for } D=\{s,t\}\in \mathcal {E}, \end{array}\right. } \end{aligned}$$

where \(\ell \restriction D\) is the restriction of \(\ell \) to D. With this notation, we may write the objective function \(E_\infty \) as

$$\begin{aligned} E_\infty (\ell )=\Vert \phi _\ell \Vert _\infty = \max _{D\in \mathcal {D}} {\phi _\ell (D)}. \end{aligned}$$
(7)

Similarly, \(E_p(\ell )=\Vert \phi _\ell \Vert _p\) for any \(p\in [1,\infty )\).
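Assuming atoms are encoded as frozensets of (vertex, label) pairs, the functional \(\varPhi \) and the evaluation (7) translate directly into code; the graph and all cost values below are illustrative stand-ins:

```python
# Sketch of the atom notation; atoms are frozensets of (vertex, label) pairs,
# and all costs below are arbitrary stand-ins.
V = [0, 1, 2]
E = [(0, 1), (1, 2)]                 # ordered edges, s < t

Phi = {}                             # the functional Phi on atoms
for s in V:
    for i in (0, 1):
        Phi[frozenset({(s, i)})] = 0.1 * (s + i + 1)          # unary stand-ins
for (s, t) in E:
    for i in (0, 1):
        for j in (0, 1):
            Phi[frozenset({(s, i), (t, j)})] = 0.2 * (i + j)  # pairwise stand-ins

def atoms_of(lab):
    """A(l): the restrictions of l to the singletons and the edges."""
    return [frozenset({(s, lab[s])}) for s in V] + \
           [frozenset({(s, lab[s]), (t, lab[t])}) for (s, t) in E]

ell = {0: 1, 1: 0, 2: 1}
energy = max(Phi[A] for A in atoms_of(ell))   # E_inf(ell), as in Eq. (7)
print(energy)
```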

3.1 Consistency

Conceptually, both the proposed algorithms work as follows: Starting from the set of all possible unary and binary atoms, the algorithm iteratively removes one atom at a time until the remaining atoms define a unique labeling. A key issue in this process is to ensure that, at all steps of the algorithm, at least one labeling can be constructed from the set of remaining atoms.

Let \(\ell \) be a binary labeling. We define \(\mathcal {A}(\ell )\), the atoms for \(\ell \), as the family

$$\begin{aligned} \mathcal {A}(\ell ): =\{\ell \restriction D:D\in \mathcal {D}\}. \end{aligned}$$

Notice that \(\ell \) can be easily recovered from \(\mathcal {A}(\ell )\) as its union: \(\ell =\bigcup \mathcal {A}(\ell )\).

Definition 1

Let \(\mathcal {A}' \subset \mathcal {A}\) be a set of atoms. We say that \(\mathcal {A}'\) is consistent if there exists at least one labeling \(\ell \) such that \(\mathcal {A}(\ell )\subseteq \mathcal {A}'\).

We will now derive one of our main results, namely that the problem of determining whether a given set of atoms is consistent can be formulated as a 2-satisfiability problem. The 2-satisfiability problem is a well-studied problem in computer science, and several efficient algorithms exist for its solution. This result quite directly leads to Algorithm 1, presented in Sect. 5, for finding a labeling minimizing \(E_\infty \).

For a set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms denote by \(\bar{\mathcal {A}' }\) the complement of \(\mathcal {A}' \) relative to \(\mathcal {A}\), that is, \(\bar{\mathcal {A}' }:={\mathcal {A}}\setminus {\mathcal {A}'}\). Then \(\mathcal {A}'\) is consistent if, and only if, there exists a labeling \(\ell \) such that \(\mathcal {A}(\ell ) \cap \bar{\mathcal {A}'} = \emptyset \). We will show that the existence of such labeling \(\ell \) can be determined by solving a 2-satisfiability problem.

For this, let us treat any vertex \(v\in V\) of our graph as a variable of propositional calculus, that is, a variable that can take two possible values: TRUE, which will be identified with the number 1, and FALSE, which will be identified with the number 0. Upon such identification, any labeling \(\ell :V\rightarrow \{0,1\}\) can be treated as a truth functional.

Now, with any unary atom \(A=\{\langle s,i\rangle \}\), with \(i\in \{0,1\}\), we associate a propositional calculus formula in a very simple format known as literal (i.e., a variable or its negation):

$$\begin{aligned} \psi _A(s):={\left\{ \begin{array}{ll} \lnot s&{} \text{ if } i=1,\\ s &{} \text{ if } i=0. \end{array}\right. } \end{aligned}$$

Less formally, but more concisely, \(\psi _A(s):=``s\ne i\).” Notice that \(\ell :V\rightarrow \{0,1\}\) disagrees with A if, and only if, \(\psi _A\) is satisfied by \(\ell \) treated as a truth functional.

Similarly, for every binary atom \(A=\{\langle s,i\rangle ,\langle t,j\rangle \}\) we define

$$\begin{aligned} \psi _A(s,t):=\psi _{\{\langle s,i\rangle \}}(s)\vee \psi _{\{\langle t,j\rangle \}}(t) \end{aligned}$$

or, equivalently, as \(``(s\ne i) \vee (t\ne j)\).” Once again, \(\ell :V\rightarrow \{0,1\}\) disagrees with A if, and only if, \(\psi _A\) is satisfied by \(\ell \) treated as a truth functional.

Finally, for a set \(\mathcal {A}' =\{A_1, A_2, \ldots , A_m\}\) of atoms define

$$\begin{aligned} \psi _{\mathcal {A}'}:= \bigwedge \limits _{i=1}^m \psi _{A_i}=\psi _{A_1}\wedge \cdots \wedge \psi _{A_m}. \end{aligned}$$

Also, \(\ell :V\rightarrow \{0,1\}\) disagrees with every \(A\in \mathcal {A}' \) if, and only if, \(\psi _{\mathcal {A}'}\) is satisfied by \(\ell \). Notice also that the formula \(\psi _{\mathcal {A}'}\) is in the so-called 2-conjunctive normal form, that is, it is a conjunction of formulas \(\psi _{A_i}\), each of which is a disjunction of at most two literals.

The above discussion leads to the following result.

Theorem 2

A set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms is consistent if, and only if, the 2-satisfiability problem for a formula \(\psi _{\bar{\mathcal {A}'}}\) has a positive solution.

Proof

This follows from the equivalence of the following conditions, each consecutive pair of which was argued above.

  • \(\mathcal {A}' \subseteq \mathcal {A}\) is consistent.

  • \(\mathcal {A}(\ell ) \cap \bar{\mathcal {A}'} = \emptyset \) for some \(\ell :V\rightarrow \{0,1\}\).

  • There is an \(\ell :V\rightarrow \{0,1\}\) which disagrees with every \(A\in \bar{\mathcal {A}'}\).

  • There is an \(\ell :V\rightarrow \{0,1\}\) such that \(\psi _{\bar{\mathcal {A}'}}\) is satisfied by \(\ell \).

  • The 2-satisfiability problem for a formula \(\psi _{\bar{\mathcal {A}'}}\) has a positive solution.\(\square \)

Recall that the solution to the 2-satisfiability problem for a formula in the 2-conjunctive normal form that is a conjunction of n 2-disjunctions can be found in \(\mathcal {O}(n)\) time, using, e.g., the algorithm by Aspvall et al. [3]. Thus, for any set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms, the question

Is \(\mathcal {A}'\) consistent?

can be answered in linear time with respect to the number \(n:=|\bar{\mathcal {A}'}|\) of elements in \(\bar{\mathcal {A}'}={\mathcal {A}}\setminus {\mathcal {A}'}\) by deciding the satisfiability of \(\psi _{\bar{\mathcal {A}'}}\).
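Under the atom encoding used earlier, this reduction can be sketched directly. For clarity only, the consistency test below decides satisfiability of \(\psi _{\bar{\mathcal {A}'}}\) by brute force over labelings of a tiny two-vertex graph; the algorithm of Aspvall et al. [3] decides the same 2-SAT instance in linear time:

```python
from itertools import product

# Brute-force sketch of the consistency test of Theorem 2: a set of kept atoms
# is consistent iff some labeling disagrees with every removed atom, i.e.
# satisfies psi over the complement.
V = [0, 1]
E = [(0, 1)]

def all_atoms(V, E):
    atoms = {((s, i),) for s in V for i in (0, 1)}
    atoms |= {((s, i), (t, j)) for (s, t) in E for i in (0, 1) for j in (0, 1)}
    return atoms

def disagrees(lab, A):
    """True iff lab violates atom A, i.e. the clause psi_A is satisfied."""
    return any(lab[v] != i for (v, i) in A)

def consistent(kept):
    removed = all_atoms(V, E) - kept
    return any(all(disagrees(dict(zip(V, bits)), A) for A in removed)
               for bits in product((0, 1), repeat=len(V)))

print(consistent(all_atoms(V, E) - {((0, 0),)}))             # True
print(consistent(all_atoms(V, E) - {((0, 0),), ((0, 1),)}))  # False
```

Removing only the atom "vertex 0 has label 0" leaves, e.g., the labeling \(\ell (0)=1\), \(\ell (1)=0\) available; removing both unary atoms of vertex 0 leaves no labeling at all.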

4 Strict Optimality

In this section, we will introduce a refinement of the \(L_\infty \) norm measure. This will help us in the discussion of the two proposed algorithms, which will be introduced in the next two sections.

A potential drawback of the \(L_\infty \)-norm is that it does not distinguish between solutions with high or low errors below the maximum error. To resolve this problem, Levi and Zorin introduced, in a 2014 paper [15], the concept of strict minimizers.Footnote 6 In this framework, two solutions are compared by ordering all elements (in our case, binary and unary terms) non-increasingly by their local error value and then performing their lexicographical comparison.

Formally, using the notation from Sect. 3, let \(\ell _1\) and \(\ell _2\) be two labelings. Furthermore, let \(\langle A_1, A_2, \ldots , A_k\rangle \) and \(\langle B_1, B_2, \ldots , B_k\rangle \) be the sequences of all atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\), respectively, each ordered by the decreasing costs of atoms, that is, with \(\varPhi (A_1)\ge \cdots \ge \varPhi (A_k)\) and \(\varPhi (B_1)\ge \cdots \ge \varPhi (B_k)\). We say that \(\ell _1\) precedes \(\ell _2\) lexicographically and denote this as \(\ell _1 \prec \ell _2\), provided there exists an \(i\in \{1,2,\ldots ,k\}\) such that \(\varPhi (A_i) \ne \varPhi (B_i)\) and for the smallest such i we have \(\varPhi (A_i) < \varPhi (B_i)\). Also, we write \(\ell _1 \preceq \ell _2\) provided either \(\ell _1 \prec \ell _2\) or \(\varPhi (A_i) = \varPhi (B_i)\) for all \(i\in \{1,2,\ldots ,k\}\).
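A minimal sketch of this comparison, operating directly on cost vectors (the values are illustrative stand-ins for \(\varPhi (A_1),\ldots ,\varPhi (A_k)\)):

```python
# Comparing two labelings by their non-increasingly sorted cost sequences;
# Python compares lists lexicographically, which matches the definition once
# both vectors are sorted.
def precedes(costs1, costs2):
    """True iff the first labeling strictly precedes the second."""
    return sorted(costs1, reverse=True) < sorted(costs2, reverse=True)

# Both have maximum error 0.9; the first has smaller errors below the maximum:
print(precedes([0.1, 0.9, 0.2], [0.8, 0.9, 0.1]))   # True
print(precedes([0.8, 0.9, 0.1], [0.1, 0.9, 0.2]))   # False
```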

Definition 2

A labeling \(\ell \) is said to be strictly minimal provided \(\ell \preceq \ell '\) for any other labeling \(\ell '\).

From this definition, it is clear that any strict minimizer is also an \(L_\infty \)-optimal solution. Thus, the set of all strict minimizers is a subset of the set of all \(L_\infty \)-norm optimal solutions. In fact, the limit, as \(p\rightarrow \infty \), of the \(L_p\)-norm minimizers discussed above is not only an \(L_\infty \)-minimizer but also a strict minimizer [15]. (For local cost functions satisfying property (D), this was proved earlier, in a 2012 paper [6] of Ciesielski et al.)Footnote 7

The above discussion indicates that it would be desirable to have an efficient algorithm that not only finds \(L_\infty \)-minimizers, but also strict minimizers. Unfortunately, in the general setting that we examine here, the problem of finding strict minimizers is NP-hard. We will show this at the end of this section. Nevertheless, there are two special situations in which efficient algorithms for finding strict minimizers do exist. The first case is described in the next subsection. The second one, discussed in Sect. 5.1 and solved by the algorithm presented there, is when all local terms have distinct weights.

4.1 When all \(\phi _{st}\) are p-Submodular for Large Enough p

For a finite set \(Z\subset [0,\infty )\) and \(k\ge 1\) let \(\delta _Z^k:=\log _b k\), where

$$ \begin{aligned} b:=\min \left\{ \frac{s}{r}:0<r<s \ \& \ r,s\in Z\right\} . \end{aligned}$$

We will use the following result, which identifies strict optimality with optimality with respect to \(E_p\) for p large enough. For local cost maps satisfying (D), this was first proved in [6, theorem 5.3].

Proposition 1

Let \(|V|=k\) and assume that all local cost maps \(\phi _{s}\) and \(\phi _{s,t}\) have values in a finite set \(Z\subset [0,\infty )\). If \(p\ge \delta _Z^k\), then a binary labeling \(\ell \) is strictly minimal if, and only if, it is minimal with respect to \(E_p\).

Proof

To see this, notice first that for every \(p\ge \delta _Z^k\)

$$\begin{aligned} \text{ if } \ell _1 \prec \ell _2,\hbox { then }E_p(\ell _1)<E_p(\ell _2). \end{aligned}$$
(8)

Indeed, using the notation as in the definition of \(\prec \), let i be the smallest index such that \(\varPhi (A_i) < \varPhi (B_i)\). If \(\varPhi (A_i) =0\), then \(E_p^p(\ell _1)=\sum _{j=1}^{i-1}\varPhi ^p(A_j)<\sum _{j=1}^{k}\varPhi ^p(B_j)=E_p^p(\ell _2)\), justifying (8). So, assume that \(\varPhi (A_i)>0\). Then, for b defined as above, we have \(b\le \frac{\varPhi (B_i)}{\varPhi (A_i)}\) and

$$\begin{aligned} \log _b k=\delta _Z^k\le p\le p \log _b \frac{\varPhi (B_i)}{\varPhi (A_i)}=\log _b \frac{\varPhi ^p(B_i)}{\varPhi ^p(A_i)} \end{aligned}$$

so that \(k\varPhi ^p(A_i) < \varPhi ^p(B_i)\). Therefore,

$$\begin{aligned} E_p^p(\ell _1)\le \sum _{j=1}^{i-1}\varPhi ^p(A_j)+k\varPhi ^p(A_i) <\sum _{j=1}^{k}\varPhi ^p(B_j)=E_p^p(\ell _2), \end{aligned}$$

completing the argument for (8).

To prove the proposition, choose \(p\ge \delta _Z^k\) and labelings \(\ell _1\) and \(\ell _2\). If \(\ell _1\) is strictly minimal, then either \(\ell _1 \prec \ell _2\), in which case (8) implies that \(E_p(\ell _1)<E_p(\ell _2)\), or \(\langle \varPhi (A_1), \ldots , \varPhi (A_k)\rangle =\langle \varPhi (B_1), \ldots , \varPhi (B_k)\rangle \), in which case clearly \(E_p(\ell _1)=E_p(\ell _2)\). Thus, strict minimality of \(\ell _1\) indeed implies its minimality with respect to \(E_p\).

Conversely, if \(\ell _1\) is minimal with respect to \(E_p\), then we must have \(\ell _1 \preceq \ell _2\), since otherwise we would have \(\ell _2 \prec \ell _1\) and, by (8), \(E_p(\ell _2)<E_p(\ell _1)\), a contradiction. \(\square \)

A number p for which the proposition holds is referred to by Wolf et al. [21] as a dominant power. Its existence is proved in that paper; however, no estimate similar to that of \(\delta _Z^k\) is provided there. The estimate \(\delta _Z^k\) can be found, in a similar setting, in [6, theorem 5.3]; however, that result does not explicitly relate this number to the lexicographical order.
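As a small numeric illustration of the bound: since b is the least ratio between distinct positive values of Z, it is attained by consecutive values of Z in sorted order. The value set Z below is hypothetical; note how a ratio near 1 already forces a fairly large dominant power.

```python
from math import log

# Computing the bound delta_Z^k = log_b(k), with b the least ratio s/r over
# distinct positive r < s in Z (attained by consecutive sorted values).
def dominant_power_bound(Z, k):
    positive = sorted({z for z in Z if z > 0})
    b = min(s / r for r, s in zip(positive, positive[1:]))
    return log(k) / log(b)

Z = {0.0, 1.0, 1.1, 2.0}                # hypothetical attainable cost values
print(dominant_power_bound(Z, k=100))   # log(100)/log(1.1), roughly 48.3
```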

The proposition immediately implies the next theorem.

Theorem 3

Let \(\delta _Z^k\) be as in Proposition 1 and assume that \(p\in [\delta _Z^k,\infty )\) is such that all terms \(\phi _{st}\) are p-submodular. Then any labeling \(\ell \) minimizing \(E_p\) is a strict minimizer. In particular, if there is a \(\rho \in [1,\infty )\) such that each \(\phi _{st}\) is \(\rho \)-submodular and \(\infty \)-submodular, then there is a \(p\in [\rho ,\infty )\) such that any \(E_p\)-optimizing labeling \(\ell \) returned by a max-flow/min-cut algorithm is a strict optimizer.

We observe that in practice, the dominant power p may be large. This may give rise to numerical issues when solving the max-flow/min-cut problem, as each local cost is raised to the power p. The novel algorithms proposed in Sects. 5 and 6 do not suffer from this potential issue.

4.2 NP-Hardness of Finding Strict Optimizers

We will now show that, in the general case, the problem of finding strict optimizers is indeed NP-hard. Our argument adapts an example from Kolmogorov and Zabih [13, Appendix A] showing that \(L_1\)-optimization of non-submodular energies is NP-hard.

Recall that a set U of vertices of a graph \(\mathcal {G}=\langle V, \mathcal {E}\rangle \) is independent when it contains no two vertices connected by an edge. It is known that the problem of finding a maximum independent set of vertices of an arbitrary graph is NP-hard [7, chapter 34].

With a given graph \(\mathcal {G}\), associate the following local costs:

  • for every vertex v of label i, give the cost \(1-i\);

  • for every edge with both vertices of label 1, let the cost be \(N:=|V|+1\);

  • with any other edge, associate the cost 0.

Notice that the max-cost of any labeling \(\ell \) is \(<N\) if, and only if, the set \(U:=\ell ^{-1}(1)\) is independent. Among all labelings \(\ell \) associated with an independent U, the max cost is 1. Moreover, such a labeling \(\ell \) is a strict minimizer precisely when the number of its cost 1 atoms, which is \(|V|-|U|\), is minimal, that is, when the size of U is maximal.

In other words, if for a graph \(\mathcal {G}\) we use the local cost assignments as above, then \(\ell \) is a strict minimizer if, and only if, \(U:=\ell ^{-1}(1)\) is a maximum independent set of vertices. So, our problem is indeed NP-hard, as is the problem of finding a maximum independent set of vertices.
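The reduction can be verified by brute force on a small instance; the sketch below applies the cost assignment above to a 4-cycle and checks that the lexicographically best labeling indeed selects a maximum independent set:

```python
from itertools import product

# Brute-force check of the reduction on a 4-cycle: a vertex of label i costs
# 1 - i; an edge with both endpoints labeled 1 costs N = |V| + 1; any other
# edge costs 0. The strict minimizer picks out a maximum independent set.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (0, 3)]
N = len(V) + 1

def cost_vector(lab):
    costs = [1 - lab[v] for v in V]
    costs += [N if lab[s] == lab[t] == 1 else 0 for (s, t) in E]
    return sorted(costs, reverse=True)    # for the lexicographic comparison

best = min((dict(zip(V, bits)) for bits in product((0, 1), repeat=len(V))),
           key=cost_vector)
U = {v for v in V if best[v] == 1}
print(U)   # a maximum independent set of the 4-cycle
```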

5 A Quadratic Time Algorithm for Direct Optimization of \(E_{\infty }\)

With these preliminaries in place, we are now ready to introduce a general method for finding a binary labeling that globally optimizes \(E_{\infty }\). Pseudocode for this method is given in Algorithm 1.

(Algorithm 1: pseudocode)

If n is the number of elements (atoms) in \(\mathcal {A}\), then Algorithm 1 terminates after \(O(n^2)\) operations. Indeed, the execution of line 1 has complexity \(O(n \ln n)\) (as it requires ordering \(\mathsf {H}\)), while the loop in lines 2–4 is executed n times and each of its executions requires \(O(n)\) operations, as we indicated after Theorem 2.
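Although the pseudocode figure is not reproduced here, the method can be sketched as follows, assuming this reading of the loop: line 2 removes the most expensive remaining atom A from \(\mathsf {H}\), and line 4 moves A into \(\mathsf {L}\) when its removal destroys consistency. The graph and costs are illustrative, and the brute-force consistency test stands in for the linear-time 2-SAT routine of Sect. 3.1:

```python
from itertools import product

# Sketch of Algorithm 1 on a 2-vertex graph with illustrative costs; atoms are
# tuples of (vertex, label) pairs, keyed into the cost functional Phi.
V = [0, 1]
E = [(0, 1)]
Phi = {((0, 0),): 0.4, ((0, 1),): 0.1, ((1, 0),): 0.2, ((1, 1),): 0.3,
       ((0, 0), (1, 0)): 0.9, ((0, 0), (1, 1)): 0.5,
       ((0, 1), (1, 0)): 0.6, ((0, 1), (1, 1)): 0.7}

def atoms_of(lab):
    """A(l): the restrictions of the labeling to singletons and edges."""
    return [((s, lab[s]),) for s in V] + \
           [((s, lab[s]), (t, lab[t])) for (s, t) in E]

def consistent(kept):
    """True iff some labeling uses only atoms from 'kept' (brute force)."""
    return any(all(A in kept for A in atoms_of(dict(zip(V, bits))))
               for bits in product((0, 1), repeat=len(V)))

H = sorted(Phi, key=Phi.get, reverse=True)   # line 1: non-increasing cost order
L = set()
while H:                                     # lines 2-4
    A = H.pop(0)                             # remove the most expensive atom
    if not consistent(set(H) | L):
        L.add(A)                             # removal broke consistency: keep A
ell = {v: i for atom in L for (v, i) in atom}
print(ell)   # the E_inf-optimal labeling for these costs
```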

Theorem 4

An \(\ell \) returned by Algorithm 1 is a labeling minimizing energy \(E_\infty \).

Proof

The main loop in lines 2–4 is executed precisely n times, where \(n:=|\mathcal {A}|\).

For every \(k\in \{0,1,\ldots ,n\}\) let \(\mathsf {H}_k\) and \(\mathsf {L}_k\) be the states of \(\mathsf {H}\) and \(\mathsf {L}\), respectively, directly after the kth execution of the loop 2–4. First notice that, for every \(k\in \{0,1,\ldots ,n\}\),

(\(C_k\)):

\(\mathsf {H}_k\cup \mathsf {L}_k\) is consistent.

Clearly \(\mathsf {H}_0\cup \mathsf {L}_0= \mathcal {A}\) is consistent. Also, for every \(k<n\), if \(\mathsf {H}_k\cup \mathsf {L}_k\) is consistent, then so is \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\). Indeed, if during the \((k+1)\)st execution of line 2 an atom A is removed from \(\mathsf {H}_k\), then \(\mathsf {H}_{k+1}=\mathsf {H}_{k}\setminus \{A\}\). If \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent, then \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and (\(C_{k+1}\)) holds. Otherwise, line 4 ensures that \(\mathsf {L}_{k+1}=\mathsf {L}_k\cup \{A\}\) and \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}=\mathsf {H}_k\cup \mathsf {L}_k\) is consistent by (\(C_k\)).

The above shows that \(\mathsf {H}_n\cup \mathsf {L}_n=\mathsf {L}_n\) is consistent, that is, there exists a labeling \(\ell ':V\rightarrow \{0,1\}\) so that \(\mathcal {A}(\ell ')\subseteq \mathsf {L}_n\). To finish the proof that \(\ell =\bigcup \mathsf {L}_n\) is a labeling, we need to show that \(\mathcal {A}(\ell ')= \mathsf {L}_n\).

To see this, first notice that \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\subseteq \mathsf {H}_{k}\cup \mathsf {L}_{k}\) for every \(k<n\). So, \(\mathcal {A}(\ell ')\subseteq \mathsf {L}_n\subseteq \mathsf {H}_{k}\cup \mathsf {L}_{k}\). To see that \(\mathsf {L}_n\subseteq \mathcal {A}(\ell ')\), assume by way of contradiction that there is an \(A\in \mathsf {L}_n\setminus \mathcal {A}(\ell ')\). Then, A is removed from \(\mathsf {H}\) during some, say the \((k+1)\)st, execution of line 2. So, \(A\notin \mathsf {H}_{k+1}\). Also, since \(A\notin \mathcal {A}(\ell ')\), the set \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent, as it contains \(\mathcal {A}(\ell ')\). Therefore, \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and \(A\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n\), a contradiction. This means that \(\mathcal {A}(\ell ')= \mathsf {L}_n\).

Finally, by way of contradiction, assume that \(\ell =\bigcup \mathsf {L}_n\) does not minimize \(E_\infty \), that is, that there is a labeling \(\ell ''\) with \(c:=E_\infty (\ell '')<E_\infty (\ell )\). Then, there is an \(A\in \mathcal {A}(\ell )\) of cost \(>c\). Let \(k< n\) be such that A is removed from \(\mathsf {H}\) during the \((k+1)\)st execution of line 2. Then \(A\notin \mathsf {H}_{k+1}\). Also, by the ordering of \(\mathsf {H}\), we have \(\mathcal {A}(\ell '')\subset \mathsf {H}_{k+1}\). So, \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent and \(\mathsf {L}_{k+1}=\mathsf {L}_k\). In particular, \(A\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n=\mathcal {A}(\ell )\), contradicting the fact that \(A\in \mathcal {A}(\ell )\). \(\square \)

5.1 Atoms with Unique Weights

We say that the atoms (in \(\mathcal {A}\)) have unique weights provided the map \(\varPhi :\mathcal {A}\rightarrow [0,\infty )\) is injective, that is, when \(\varPhi (A_1)\ne \varPhi (A_2)\) for every distinct \(A_1,A_2\in \mathcal {A}\). Our main result here is the following

Theorem 5

If the atoms in \(\mathcal {A}\) have unique weights, then the labeling \(\ell \) returned by Algorithm 1 is the unique strict optimizer.

First we prove the uniqueness part of the theorem, in the form of the following lemma.

Lemma 1

If the atoms in \(\mathcal {A}\) have unique weights, then the strictly optimal labeling is unique.

Proof

Let \(\ell _1\) and \(\ell _2\) be strictly optimal labelings. We will show that \(\ell _1=\ell _2\).

To see this, consider the sequences of the atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\), respectively, each ordered by decreasing cost. Then, since both labelings are strictly optimal, the decreasing sequences of the costs of the atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\) must be identical. However, since every atom has a unique weight, this means that the sets of atoms in \(\mathcal {A}(\ell _1)\) and in \(\mathcal {A}(\ell _2)\) must themselves be identical. In particular \(\mathcal {A}(\ell _1)=\mathcal {A}(\ell _2)\) and therefore \(\ell _1=\bigcup \mathcal {A}(\ell _1)=\bigcup \mathcal {A}(\ell _2)=\ell _2\), as needed. \(\square \)
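The comparison of labelings used in this argument can be made concrete. Assuming (from context) that \(\ell _1\prec \ell _2\) means that the decreasingly sorted sequence of atom costs of \(\ell _1\) is lexicographically smaller than that of \(\ell _2\), a minimal sketch is:

```python
def precedes(costs1, costs2):
    # ell1 precedes ell2 in the strict-optimality order: compare the
    # atom-cost sequences, each sorted in decreasing order, lexicographically.
    return sorted(costs1, reverse=True) < sorted(costs2, reverse=True)
```

With unique weights, equality of the two sorted cost sequences forces equality of the underlying atom sets, which is exactly the step taken in the proof above.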

Proof of Theorem 5

We will use the same notation as in the proof of Theorem 4. Let \(\ell \) and \(\ell '\) be distinct labelings such that \(\ell \) is strictly optimal and, by way of contradiction, assume that Algorithm 1 returns the labeling \(\ell '\) rather than \(\ell \). Fix the sequences \(\langle A_1, A_2, \ldots , A_m\rangle \) and \(\langle B_1, B_2, \ldots , B_m\rangle \) of all atoms in \(\mathcal {A}(\ell )\) and \(\mathcal {A}(\ell ')\), respectively, each ordered by the decreasing costs of atoms. By Lemma 1, \(\ell '\) is not strictly optimal, so \(\ell \prec \ell '\). Therefore, there exists an \(i\in \{1,2,\ldots ,m\}\) such that \(\varPhi (A_i) < \varPhi (B_i)\) and \(\varPhi (A_j) =\varPhi (B_j)\) for all \(j<i\).

Let \(k\le n\) be such that \(B_i\) is removed from \(\mathsf {H}\) during the kth execution of line 2. Then, \(\{B_1, B_2, \ldots , B_m\}=\mathcal {A}(\ell ')\subset \mathsf {L}_n\subset \mathsf {H}_{k}\cup \mathsf {L}_{k}\). In fact, since the atoms are removed from \(\mathsf {H}\) in order of decreasing cost, we have \(\{A_1, \ldots , A_{i-1}\}=\{B_1, \ldots , B_{i-1}\}\subset \mathsf {L}_{k}\) and \(\{A_i, \ldots , A_{m}\}\subset \mathsf {H}_{k+1}\). In particular, \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent, since it contains \(\{A_1, A_2, \ldots , A_m\}=\mathcal {A}(\ell )\). Thus, \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and \(B_i\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n=\mathcal {A}(\ell ')\), a contradiction that finishes the proof of Theorem 5. \(\square \)

The requirement in Theorem 5 (and the forthcoming Theorem 7) that all atoms in \({{\mathcal {A}}}\) have unique weights may appear restrictive, and for real-world problems, this condition may or may not hold. We will therefore now discuss how these theorems may be interpreted when not all atom weights are unique. First, we observe that in this case it is straightforward to define a new local cost function \({\hat{\varPhi }}\) with unique weights and such that, for any atoms \(A,A'\in {\mathcal A}\), \(\varPhi (A)<\varPhi (A')\) implies \({\hat{\varPhi }}(A)<{\hat{\varPhi }}(A')\). Such weights may, e.g., be defined by the following simple procedure:

  • Fix, by some method (e.g., a sorting algorithm), an increasing order of the atoms in \({{\mathcal {A}}}\) by weight, i.e., find a map \(O:{{\mathcal {A}}}\rightarrow Z\) such that \(O(A_1) \not = O(A_2)\) for every distinct \(A_1, A_2 \in \mathcal {A}\) and \(O(A_1) < O(A_2) \Rightarrow \varPhi (A_1) \le \varPhi (A_2)\) for all \(A_1, A_2 \in \mathcal {A}\).

  • For all \(A \in \mathcal {A}\), define \({\hat{\varPhi }}(A) := O(A)\).
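The two-step procedure above amounts to replacing each weight by its rank under a stable sort; the ranks are injective and preserve every strict inequality of \(\varPhi \). A minimal sketch (list indices stand in for atoms; names are ours):

```python
def unique_weights(weights):
    # Stable sort supplies an (implementation dependent) increasing order O;
    # hat_Phi(A) := O(A), i.e., the rank of A in that order.
    order = sorted(range(len(weights)), key=lambda k: weights[k])
    hat = [0] * len(weights)
    for rank, k in enumerate(order):
        hat[k] = rank
    return hat
```

Ties in the original weights are broken by the sort's tie-breaking rule, which is exactly the implementation dependence discussed below.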

By design, all atoms associated with the local costs \({\hat{\varPhi }}\) have unique weights, and thus running Algorithm 1 (or Algorithm 2 in the case of Theorem 7) with these weights will return a strict optimizer with respect to the local costs \({\hat{\varPhi }}\).

We observe that if the original atom weights are all unique, then the ordering O is also unique, and running either of our new algorithms with the new local costs \({\hat{\varPhi }}\) induced by O would yield a result identical to that obtained with the original weights. Furthermore, we observe that the procedure above is essentially what happens during the execution of the algorithms: by ordering the max-priority queue \(\mathsf {H}\), we are establishing a specific (implementation dependent) ordering of the atoms that is increasing by weight, just like the ordering O defined in the procedure above. Thus, even when not all atoms have unique weights, the algorithms will return labelings that are strictly optimal with respect to some increasing order of the atoms by weight. When not all atom weights are unique, however, this ordering will not be unique, but will depend on the specific implementation of the max-priority queue \(\mathsf {H}\).

6 A Quasi-Linear Time Algorithm for Direct Optimization of \(E_\infty \) When All Binary Terms are \(\infty \)-Submodular

We now present a more efficient algorithm, previously reported in the conference version of this manuscript [16], for the case when all binary terms are \(\infty \)-submodular. Superficially, this algorithm is slightly more complicated than Algorithm 1. We emphasize, however, that both algorithms have a very similar structure—starting from the set of all possible atoms, both algorithms iteratively remove one atom at a time until the remaining atoms define a unique labeling. The main difference between the algorithms is the steps taken to ensure the consistency of the set of remaining atoms.

6.1 Local Consistency, Incompatible Atoms

We introduce a property of local consistency, which will be used to establish the correctness of our second proposed algorithm. A set of atoms \(\mathcal {A}'\) is said to be locally consistent if, for every vertex \(s\in V\) and every edge \(\{s,t\}\in {{\mathcal {E}}}\), there are \(i,j\in \{0,1\}\) such that the atoms \(\{\langle s,i\rangle \}\) and \(\{\langle s,i\rangle ,\langle t,j\rangle \}\) both belong to \({{\mathcal {A}}}'\) (i.e., \({{\mathcal {A}}}'\) still allows s to receive some label). Clearly, any consistent set of atoms is also locally consistent. However, in general, local consistency does not imply consistency.Footnote 8
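The definition can be checked mechanically. The following sketch uses an assumed tuple encoding of atoms (unary atoms as `(s, i)`, binary atoms as `((s, t), i, j)` with `i` labeling `s` and `j` labeling `t`):

```python
def is_locally_consistent(atoms, E):
    for (s, t) in E:
        # Some i, j must give surviving atoms {<s,i>} and {<s,i>,<t,j>} ...
        fwd = any((s, i) in atoms and ((s, t), i, j) in atoms
                  for i in (0, 1) for j in (0, 1))
        # ... and symmetrically for the endpoint t.
        bwd = any((t, j) in atoms and ((s, t), i, j) in atoms
                  for i in (0, 1) for j in (0, 1))
        if not (fwd and bwd):
            return False
    return True
```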

Furthermore, we introduce the notion of an incompatible atom, which will be needed for the exposition of the proposed algorithm. For a given set \({{\mathcal {A}}}'\) of atoms, we say that an atom \(A\in {{\mathcal {A}}}'\) is (locally) incompatible (w.r.t. \({{\mathcal {A}}}'\)) if either

  1. A is a unary atom, so that \(A=\{\langle v,i\rangle \}\) for some vertex v, and there exists some edge \(\{v,w\}\) incident to v such that \({{\mathcal {A}}}'\) contains neither \(\{\langle v,i\rangle ,\langle w,0\rangle \}\) nor \(\{\langle v,i\rangle ,\langle w,1\rangle \}\); or

  2. A is a binary atom, so that \(A=\{\langle v,i\rangle ,\langle w,j\rangle \}\) for some edge \(\{v,w\}\), and at least one of \(\{\langle v,i\rangle \}\) and \(\{\langle w,j\rangle \}\) is not in \({{\mathcal {A}}}'\).

Note that a locally consistent set of atoms may still contain incompatible atoms.

6.2 The Second Algorithm

We now introduce the proposed algorithm, with quasi-linear time complexity, for finding a binary label assignment \(\ell :V\rightarrow \{0,1\}\) that globally minimizes the objective function \(E_\infty \) given by (1), under the condition that all pairwise terms in the objective function are \(\infty \)-submodular. If, additionally, all atoms have unique weights then the labeling returned by the algorithm is also the strict minimizer. Informally, the general outline of the proposed algorithm is as follows:

  • Start with a set S consisting of all possible atoms and an initially empty set I of atoms identified as incompatible. (Recall that the total number of atoms is \(\mathcal {O}(|V|+|{{\mathcal {E}}}|)\).)

  • For each atom A, in order of decreasing cost \(\varPhi (A)\):

    • If A is still in S, and is not the only remaining atom for that vertex/edge, remove A from S.

    • After the removal of A, S may contain incompatible atoms. Iteratively remove all such incompatible atoms until S contains no more incompatible atoms.

Before we formalize this algorithm, we introduce a specific preordering relation \(\gg \) on the atoms \({{\mathcal {A}}}\). For \(A_0,A_1\in {{\mathcal {A}}}\), we will write \(A_0 \gg A_1\) if either \(\varPhi (A_0)>\varPhi (A_1)\), or else \(\varPhi (A_0)=\varPhi (A_1)\) and \(A_1\) is a binary atom of the form \(\{\langle s,i\rangle ,\langle t,i\rangle \}\) (equal labeling) while \(A_0\) is not in this form.

With these preliminaries in place, we are now ready to introduce the proposed algorithm, for which pseudocode is given in Algorithm 2.

[Algorithm 2: pseudocode figure]
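The outline of Sect. 6.2, including the \(\gg \) tie-breaking, can be illustrated with a compact Python sketch. This is our own illustration, not the paper's pseudocode: the atom encoding and names are assumptions, and the naive incompatibility cascade below rescans every bucket, making it quadratic, whereas the actual algorithm uses the queue \(\mathsf {K}\) to achieve quasi-linear time.

```python
def minimize_E_inf(V, E, phi_v, phi_e):
    # Buckets: one set of surviving atoms per vertex and per edge e = (s, t).
    # Unary atoms are (v, i); binary atoms are (e, i, j).
    A = {v: {(v, i) for i in (0, 1)} for v in V}
    A.update({e: {(e, i, j) for i in (0, 1) for j in (0, 1)} for e in E})

    def cost(a):
        return phi_v[a[0]][a[1]] if len(a) == 2 else phi_e[a[0]][a[1]][a[2]]

    def incompatible(a):
        if len(a) == 3:                 # binary: both endpoint atoms must survive
            (s, t), i, j = a
            return (s, i) not in A[s] or (t, j) not in A[t]
        v, i = a                        # unary: a partner atom on every edge
        for e in E:
            if v in e:
                pos = 0 if e[0] == v else 1
                if not any(b[1 + pos] == i for b in A[e]):
                    return True
        return False

    # The >> preorder: decreasing cost; among ties, equal-label binary atoms
    # come last (False sorts before True in the key's second component).
    H = sorted((a for D in A for a in A[D]),
               key=lambda a: (-cost(a), len(a) == 3 and a[1] == a[2]))
    for a in H:
        D = a[0]
        if a not in A[D] or len(A[D]) == 1:
            continue                    # already gone, or last atom in bucket
        A[D].remove(a)
        removed = True                  # cascade: purge incompatible atoms
        while removed:
            removed = False
            for D2 in A:
                for b in [b for b in A[D2] if incompatible(b)]:
                    A[D2].remove(b)
                    removed = True
    return {v: next(iter(A[v]))[1] for v in V}
```

The \(\infty \)-submodularity assumption is essential here: it is what guarantees (property (P1) below) that the cascade never empties a bucket.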

6.3 Computational Complexity

We now analyze the asymptotic computational complexity of Algorithm 2. First, let \(\eta :=|\mathcal {A}|=2|V|+4|\mathcal {E}|\). In image processing applications the graph \(\mathcal {G}\) is commonly sparse, in the sense that \(\mathcal {O}(|V|)=\mathcal {O}(|\mathcal {E}|)\). In this case, we have \(\mathcal {O}(\eta )=\mathcal {O}(|V|)\).

Creating the list \(\mathsf {H}\) requires us to sort all atoms in \(\mathcal {A}\). The sorting can be performed in \(\mathcal {O}(\eta \log \eta )\) time. In some cases, e.g., if all unary and binary terms are integer valued, the sorting may instead be performed in \(\mathcal {O}(\eta )\) time using, e.g., radix or bucket sort.
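For integer-valued costs, the \(\mathcal {O}(\eta )\) alternative can be sketched with a counting/bucket sort (illustrative names; `cost` maps each atom to a non-negative integer bounded by `max_cost`):

```python
def build_H(atoms, cost, max_cost):
    # One bucket per integer cost value: O(eta + max_cost) overall.
    buckets = [[] for _ in range(max_cost + 1)]
    for a in atoms:
        buckets[cost[a]].append(a)
    # Concatenate from the largest cost down, as required for H.
    return [a for c in range(max_cost, -1, -1) for a in buckets[c]]
```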

We make the reasonable assumption that the following operations can all be performed in \(\mathcal {O}(1)\) time:

  • Remove an atom from \(\mathsf {H}\).

  • Remove an atom from \(\mathsf {A}[D]\).

  • Remove or insert elements in \(\mathsf {K}\).

  • Given an atom, find its corresponding edge or vertex.

  • Given a vertex, find all edges incident to that vertex.

  • Given an edge, find the vertices spanned by the edge.

The combined number of executions of the main loop, lines 3–12, and of the internal loop, lines 7–12, equals \(|{{\mathcal {A}}}|\), that is, \(\mathcal {O}(\eta )\). This is so since any insertion of an atom into \(\mathsf {K}\) requires its prior removal from the list \(\mathsf {H}\). If the assumptions above are satisfied, it is easily seen that only \(\mathcal {O}(1)\) operations are needed between consecutive removals of an atom from \(\mathsf {H}\). Therefore, the amortized cost of the execution of the main loop is \(\mathcal {O}(\eta )\).

Thus, the total computational cost of the algorithm is bounded by the time required to sort \(\mathcal {O}(\eta )\) elements, i.e., at most \(\mathcal {O}(\eta \log \eta )\).

6.4 Proof of Correctness

Theorem 6

If all binary terms of the cost function \(\varPhi :{{\mathcal {D}}}\rightarrow [0,\infty )\) associated with graph \(\mathcal {G}=\langle V,\mathcal {E}\rangle \) are \(\infty \)-submodular, then \(\ell \) returned by Algorithm 2 is a labeling of V minimizing the objective function \(E_\infty \).

Let \(\mathsf {n}:=|V|+3|\mathcal {E}|\) be the number of removals of an atom from \(\mathsf {A}\). For every \(D\in {{\mathcal {D}}}\) and \(k\in \{0,\ldots ,\mathsf {n}\}\), let \(\mathsf {A}_k[D]\) be equal to the value of \(\mathsf {A}[D]\) directly after the kth removal of some atom(s) from \(\mathsf {A}\), which can happen only as a result of the execution of either line 6 or line 10. (For \(k=0\) we mean directly after the execution of line 2.) Let \({{\mathcal {A}}}_k=\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}_k[D]\).

Let \(1=k_1<\cdots <k_m\) be the list of all values of \(k\in \{1,\ldots ,\mathsf {n}\}\) such that \({{\mathcal {A}}}_k\) is a proper refinement of \({{\mathcal {A}}}_{k-1}\) resulting from the execution of line 6. Note that it is conceivable that the numbers \(k_j\) and \(k_{j+1}\) are consecutive—this happens when the execution of loop 8–12, performed directly after the execution of line 5 that created \({{\mathcal {A}}}_{k_j}\), results in the removal of no atoms from \({{\mathcal {A}}}_{k_j}\).

The proof of Theorem 6 is based on the following lemma, for which a proof is given in the Appendix.

Lemma 2

During the execution of Algorithm 2, the following properties hold for every \(k\le \mathsf {n}\).

(P0):

For every edge \(D=\{v,w\}\), if \(\mathsf {A}_k[D]\) is missing either \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,1\rangle ,\langle w,1\rangle \}\), then it must also be missing \(\{\langle v,1\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\).

(P1):

\(\mathsf {A}_k[D]\) contains at least one atom for every \(D\in {{\mathcal {D}}}\).

(P2):

\({{\mathcal {A}}}_k\) is locally consistent.

(P3):

\({{\mathcal {A}}}_k\) has no incompatible atoms directly before any execution of line 4.

Proof of Theorem 6

Besides Lemma 2, we still need to argue two facts. First, notice that the algorithm does not stop until all buckets \(\mathsf {A}_\mathsf {n}[D]\), \(D\in {{\mathcal {D}}}\), have precisely one element. Thus, since \({{\mathcal {A}}}_\mathsf {n}\) is locally consistent, \(\ell =\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) is indeed a function from V into \(\{0,1\}\).

To finish the proof, we need to show that \(\ell \) indeed minimizes the energy \(E_\infty \). For this, first notice that at any time during the execution of the algorithm, any atom in \(\mathsf {H}\) is also in \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\). Indeed, these sets are equal immediately after the initialization, and we remove from \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) only those atoms that have already been removed from \(\mathsf {H}\). Now, let \(L:V\rightarrow \{0,1\}\) be a labeling minimizing \(E_\infty \). We claim that the following property holds at any time during the execution of the algorithm:

(P):

if \(\varPhi (A')>E_\infty (L)\) for some \(A'\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), then \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\).

Indeed, (P) certainly holds immediately after the initialization. Consider an execution of line 6, removing an atom A from its bucket, at a moment when the assumption of (P) is satisfied, witnessed by some \(A'\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) with \(\varPhi (A')>E_\infty (L)\). Since \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) only shrinks, the assumption also held at every earlier step, so we may inductively assume that \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) directly before the current step. We claim that \(A\notin {{\mathcal {A}}}[L]\), so that this inclusion is preserved. Assume, on the contrary, that \(A\in {{\mathcal {A}}}[L]\). Then

$$\begin{aligned} \varPhi (A)\le \max _{H\in {{\mathcal {A}}}[L]}\varPhi (H)=E_\infty (L)<\varPhi (A'). \end{aligned}$$

Since A has just been removed from \(\mathsf {H}\), and atoms are removed from \(\mathsf {H}\) in order of decreasing cost, no atom of cost greater than \(\varPhi (A)\) remains in \(\mathsf {H}\). Hence \(A'\notin \mathsf {H}\), so \(A'\) must be the sole remaining atom of its bucket \(D'\). But the atom of \({{\mathcal {A}}}[L]\) associated with \(D'\) has cost at most \(E_\infty (L)<\varPhi (A')\), so it differs from \(A'\) and is therefore missing from \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), contradicting the inductive hypothesis. Also, (P) is not affected by an execution of line 10, since the inclusion \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) is not affected by it: no atom in \({{\mathcal {A}}}[L]\) is incompatible w.r.t. \({{\mathcal {A}}}[L]\), and hence, as long as \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), none is incompatible w.r.t. \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) either. This concludes the proof of (P).

Now, by the property (P), after the termination of the main loop, we have either \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), in which case \(\ell =L\) has minimal \(E_\infty \) energy, or else

$$\begin{aligned} E_\infty (L)\ge \max _{H\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]}\varPhi (H)= \max _{H\in {{\mathcal {A}}}[\ell ]}\varPhi (H)=E_\infty (\ell ) \end{aligned}$$

once again ensuring optimality of \(\ell \). \(\square \)

Theorem 7

If the atoms in \(\mathcal {A}\) have unique weights, then the labeling \(\ell \) returned by Algorithm 2 is the unique strict optimizer.

Proof

The uniqueness part of the theorem is already shown in Lemma 1. The rest of the argument is essentially identical to that used in the proof of Theorem 5. \(\square \)

7 NP-Hardness of Multi-label \(E_\infty \)-optimization

We will now show that, for a number of labels \(K>2\), the problem of finding a labeling that minimizes \(E_\infty \) is NP-hard in the general case.

Recall that a K-coloring of a graph is a mapping \(c:V \rightarrow \{1,2,\ldots , K\}\) such that \(c(s) \not = c(t)\) for every edge \(\{s,t\} \in \mathcal {E}\). The K-coloring problem consists of determining whether a given undirected graph admits a K-coloring. Recall also that already the 3-coloring problem is NP-complete [7, chapter 34].

To see that optimization of \(E_\infty \) is NP-hard for \(K>2\) labels, consider the case of \(K=3\) labels, with the following costs:

  • for every vertex v the cost of any label assignment is 0;

  • for any edge with distinct labeling of its vertices the cost is 0;

  • for any edge with the same labeling of its vertices the cost is 1.

For such assignments, the \(E_\infty \)-energy of a labeling is 0 if, and only if, the labeling is a 3-coloring. The same argument can be repeated for any \(K>3\). Thus, the problem of \(E_\infty \)-optimization with \(K>2\) labels is indeed NP-hard.
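The reduction can be written out explicitly. The sketch below builds the cost assignment above and evaluates \(E_\infty \) directly from its definition (the atom encoding and names are ours):

```python
def coloring_costs(V, E, K=3):
    # Unary costs are all zero; a pairwise term costs 1 exactly when
    # both endpoints of the edge receive the same label.
    phi_v = {v: [0] * K for v in V}
    phi_e = {e: [[int(i == j) for j in range(K)] for i in range(K)]
             for e in E}
    return phi_v, phi_e

def E_inf(ell, V, E, phi_v, phi_e):
    # Maximum over all unary and pairwise terms, as in definition (1).
    terms = [phi_v[v][ell[v]] for v in V]
    terms += [phi_e[(s, t)][ell[s]][ell[t]] for (s, t) in E]
    return max(terms)
```

On this instance, a labeling has \(E_\infty \)-energy 0 exactly when it is a proper K-coloring, which is the equivalence used in the argument above.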

Table 1 Summary of results: subclasses of the general max-norm optimization problem considered here, and algorithms for solving them

8 Conclusions

We have presented two algorithms for finding a binary vertex labeling of a graph that globally minimizes objective functions of the form \(E_\infty \). It is well known that for a limited subclass of such problems, globally optimal solutions can be found by computing an optimal spanning forest on a suitably constructed graph. Such optimal spanning forests can, in turn, be computed using very efficient, greedy algorithms. Despite the fact that this optimum spanning forest approach is commonly used in many image processing applications, the potential and limitations of this method in terms of more general optimization problems are, to the best of our knowledge, largely unexplored. The exact class of max-norm optimization problems that can be solved using efficient greedy algorithms, or even in polynomial time, has remained unknown. By the introduction of the two proposed algorithms, we show that the class of such problems that can be solved in (low-order) polynomial time is indeed larger than what was previously known. In Table 1, we provide a summary of the various subclasses of the general optimization problem considered in this paper, and algorithms for solving them.

An important observation here is the following: Binary labeling optimization problems with objective functions of the form \(E_1\) frequently occur in image processing and computer vision applications. The max-flow/min-cut approach proposed by Kolmogorov and Zabih [13] still remains one of the primary methods for solving such problems when all pairwise terms are submodular. When the local cost functionals include non-submodular terms, however, the same problem becomes NP-hard. As concluded in our discussion in Sect. 2.1, similar submodularity requirements hold also for the generalized objective functions \(E_p\) for any finite p. Practitioners looking to solve such optimization problems must therefore first verify that their local cost functional satisfies the appropriate submodularity conditions. If this is not the case, they must resort to approximate optimization methods that may or may not produce satisfactory results for a given problem instance. Here we show, by the introduction of Algorithm 1, that in the limit as p goes to infinity, the requirement for submodularity of the pairwise terms disappears. Indeed, Algorithm 1 returns, in low-order polynomial time, an \(E_\infty \)-minimal binary labeling for any local cost functional. Thus, even when the local costs are such that the problem of minimizing \(E_p\) is NP-hard for some or all finite p, a labeling minimizing \(E_\infty \) can be found in low-order polynomial time.

The motivation for our work comes from image processing applications, and the local cost functionals we consider naturally occur in many image processing problems. The two proposed algorithms, however, are formulated for general graphs and may thus also have applications to other applied problems in computer science. Structurally, both of the proposed algorithms resemble Kruskal’s algorithm [7, 14], and in this sense the proposed algorithms can be seen as generalizations of the optimum spanning forest approach to optimization.

Algorithm 1 has quadratic time complexity and is thus less efficient than Algorithm 2. It appears likely, however, that the time complexity of Algorithm 1 could be reduced further. Specifically, Algorithm 1 operates by solving a series of n 2-satisfiability problems. In the proposed algorithm each such problem is solved in isolation, but we observe that there is a high degree of similarity between consecutive problems—each 2-satisfiability problem differs from the previous one only by the introduction of one additional disjunction of two literals. Exploring whether this redundancy can be utilized to formulate a more efficient version of Algorithm 1 is an interesting direction for future work.

Another natural extension of the work presented here is to consider optimization with more than two labels. In Sect. 7, we showed that for more than two labels finding a labeling that is optimal according to \(E_\infty \) is NP-hard in the general case. Nevertheless, as can be seen in Table 1, there are special cases of multilabel max-norm problems that can be solved using Prim’s algorithm. Determining the class of multilabel problems that can be solved in low-order polynomial time is an interesting direction for future work.

At first glance, the restriction to binary labeling may appear very limiting. We note, however, that many successful methods for approximate multi-label optimization rely on iteratively minimizing binary labeling problems via move-making strategies [4]. Thus, the ability to find optimal solutions for problems with two labels potentially has a high relevance also for the multi-label case.