Two Polynomial Time Graph Labeling Algorithms Optimizing Max-Norm-Based Objective Functions


Many problems in applied computer science can be expressed in a graph setting and solved by finding an appropriate vertex labeling of the associated graph. It is also common to identify the term “appropriate labeling” with a labeling that optimizes some application-motivated objective function. The goal of this work is to present two algorithms that, for the objective functions in a general format motivated by image processing tasks, find such optimal labelings. Specifically, we consider a problem of finding an optimal binary labeling for the objective function defined as the max-norm over a set of local costs of a form that naturally appears in image processing. It is well known that for a limited subclass of such problems, globally optimal solutions can be found via watershed cuts, that is, by the cuts associated with the optimal spanning forests of a graph. Here, we propose two new algorithms for optimizing a broader class of such problems. The first algorithm, which works for all considered objective functions, returns a globally optimal labeling in quadratic time with respect to the size of the graph (i.e., the number of its vertices and edges) or, for an image-associated graph, the size of the image. The second algorithm is more efficient, with quasi-linear time complexity, and returns a globally optimal labeling provided that the objective function satisfies certain given conditions. These conditions are analogous to the submodularity conditions encountered in max-flow/min-cut optimization, where the objective function is defined as the sum of all local costs. We will also consider a refinement of the max-norm measure, defined in terms of the lexicographical order, and examine the algorithms that could find minimal labelings with respect to this refined measure.


Many fundamental problems in image processing and computer vision, such as image filtering, segmentation, registration, and stereo vision, can naturally be formulated as optimization problems. Often, these optimization problems can be described as labeling problems in which we wish to assign to each image element (pixel or vertex of an associated graph) \(v\in V\) an element \(\ell (v)\) from some finite K-element set of labels, usually \(\{0,\ldots ,K-1\}\). The interpretation of these labels depends on the optimization problem at hand. In image segmentation, the labels might indicate object categories. In registration and stereo disparity problems, the labels represent correspondences between images, and in image reconstruction and filtering the labels represent intensities in the filtered image.

In what follows an undirected graph \(\mathcal {G}\) is identified with a pair \(\langle V, \mathcal {E}\rangle \), where V is its set of vertices and \(\mathcal {E}\) is the set of its edges. Each edge connecting vertices s and t is identified with a pair \(\{s,t\}\). We make the assumption that the vertices in V are linearly ordered, and let \( {\hat{{{\mathcal {E}}}}}:=\{\langle s,t\rangle \in V^2:\{s,t\}\in \mathcal {E}\ \& \ s<t\}\).

Our new algorithms have no restriction on the format of the graph to which they can be applied. However, in what follows we will often treat \(\mathcal {G}\) as associated with a digital image. In this case, V is the set of all pixels of the image, while \(\mathcal {E}\) is the set of pairs \(\{s,t\}\) of vertices/pixels that are adjacent according to some given adjacency relation.

In this paper, we seek the vertex label assignments \(\ell :V\rightarrow \{0,1,\ldots , K-1\}\) of the undirected graphs \(\mathcal {G}=(V, \mathcal {E})\) that minimize a given objective (energy) function \(E_\infty \) of the form

$$\begin{aligned} E_\infty (\ell ) : = \max \bigl \{\max _{s\in V} \phi _s(\ell (s)), \max _{\langle s,t\rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi _{st}(\ell (s),\ell (t))\bigr \}.\!\!\!\!\! \end{aligned}$$

The functions \(\phi _s(\cdot )\) are referred to as unary terms. The value of \(\phi _s(j)\) depends explicitly only on the label \(j\in \{0,1,\ldots , K-1\}\), but typically is also based on some prior information. These terms are used to indicate a preference for a vertex/pixel s to be assigned a particular label j.

The functions \(\phi _{st}(\cdot ,\cdot )\) are referred to as pairwise or binary terms. The value of \(\phi _{st}(\cdot ,\cdot )\) depends simultaneously on the labels assigned to the vertices/pixels s and t, and thus introduces a dependency between the labels of different pixels. Typically, this dependency between pixels is used to express an expectation that the desired solution should have some degree of smoothness or regularity.

The unary and pairwise terms taken together form the local cost (error) measures mentioned in the abstract; they make up the functional \(\varPhi \) defined in Sect. 3. The same local costs are used in the \(L_1\)-norm energy \(E_1\), which we discuss briefly in the next section.

Finding a labeling that globally minimizes an objective function of the form \(E_\infty \) is generally a challenging computational task—in Sect. 7, we show that this problem is in fact NP-hard in the general case, for \(K > 2\). As we will see, however, there exist restricted classes of local cost functionals for which efficient algorithms can be formulated.

In the conference version of this paper [16], we introduced an algorithm for finding a binary labeling (i.e., with \(K=2\)) and showed that the labeling it returns is always \(E_\infty \)-optimal as long as all pairwise local cost terms \(\phi _{st}\) are \(\infty \)-submodular, that is, that they satisfy the condition

$$\begin{aligned} \max \{\phi _{st}(0,0),\phi _{st}(1,1)\} \le \max \{\phi _{st}(1,0),\phi _{st}(0,1)\}.\!\!\! \end{aligned}$$

This algorithm, presented in Sect. 6, is very efficient, with quasi-linear time complexity. An important question left open in our previous work [16] was whether it is possible to optimize the objective function \(E_\infty \) in polynomial time without any additional assumptions on the local cost functional, like that of \(\infty \)-submodularity needed for the algorithm from [16]. Here, we answer this question affirmatively by presenting in Sect. 5 an algorithm that produces, in \(\mathcal {O}((|V|+|\mathcal {E}|)^2)\) time, a binary labeling that is globally \(E_\infty \)-optimal for any local cost functional.

Background and Related Work

\(L_p\) Norm Objective Functions and Minimal Graph Cuts

While the main focus of this paper is to find efficient algorithms for the direct optimization of objective functions of the form \(E_\infty \), we will start by discussing the more general problems of optimizing \(L_p\) norm objective functions for \(p\in [1,\infty ]\).

In their seminal work, Kolmogorov and Zabih [13] considered binary labeling problems for the \(L_1\)-norm-based objective function of the form

$$\begin{aligned} E_1(\ell ) : = \sum _{s\in V} \phi _s(\ell (s))+\sum _{\langle s,t\rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi _{st}(\ell (s),\ell (t)) \end{aligned}$$

and showed that a globally optimal binary labeling can be found by solving a max-flow/min-cut problem on a suitably constructed graph under the condition that all pairwise terms \(\phi _{st}\) are submodular, that is, that they satisfy the inequality

$$\begin{aligned} \phi _{st}(0,0)+\phi _{st}(1,1)\le \phi _{st}(0,1)+\phi _{st}(1,0). \end{aligned}$$

Looking at the objective functions \(E_1\) and \(E_\infty \), we can view them both as consisting of two parts:

  • The local error measures, in our case expressed by the unary and pairwise terms.

  • A global error measure, aggregating the local errors into a final score.

In the case of \(E_1\), the global error measure is obtained by summing all local error measures; in the case of \(E_\infty \), the global error measure is taken to be the maximum of all local error measures. If we assume for a moment that all local error measurements are nonnegative, then \(E_1\) can be seen as measuring the \(L_1\)-norm of a vector containing all local costs/errors. Similarly, \(E_\infty \) can be interpreted as the \(L_\infty \)- (or max-) norm of the same vector. The \(L_1\) and \(L_\infty \) norms are both special cases of \(L_p\) norms, with \(p\in [1,\infty ]\), which for finite p are defined as

$$\begin{aligned} E_p(\ell ) : = \left( \sum _{s\in {V}} \phi _s^p(\ell (s))+\sum _{\langle s,t \rangle \in { {\hat{{{\mathcal {E}}}}}}} \phi ^p_{st}(\ell (s),\ell (t)) \right) ^{1/p},\!\!\! \end{aligned}$$

where \(\phi _s^p(\cdot )=(\phi _s(\cdot ))^p\) and \(\phi ^p_{st}(\cdot ,\cdot )=(\phi _{st}(\cdot ,\cdot ))^p\). The value \(p\in [1,\infty ]\) can be seen as a parameter controlling the balance between minimizing the overall cost versus minimizing the magnitude of the individual terms. For \(p=1\), the optimal labeling may contain arbitrarily large individual terms as long as the sum of the terms is small. As p increases, a larger penalty is assigned to solutions containing large individual terms. In the limit as p approaches infinity, \(E_p\) approaches \(E_\infty \) and the penalty assigned to a solution is determined by the largest individual term only. The limit behavior of \(L_p\) norm optimizers as p approaches \(\infty \) has also been studied in, e.g., [8, 18, 20]. Abbas and Swoboda [1] considered mixed optimization problems, where the objective function contains both \(L_1\) and \(L_\infty \) terms.
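The relationship between \(E_1\), \(E_p\) and \(E_\infty \) can be illustrated with a minimal Python sketch. The dictionary-based representation of the unary and pairwise terms, the function names, and the toy cost values are assumptions made here for illustration only; they are not notation from the paper.

```python
def local_costs(labeling, unary, pairwise):
    """Vector of all local costs phi_s(l(s)) and phi_st(l(s), l(t))."""
    costs = [unary[s][labeling[s]] for s in unary]
    costs += [phi[labeling[s]][labeling[t]] for (s, t), phi in pairwise.items()]
    return costs


def energy_p(labeling, unary, pairwise, p):
    """E_p for finite p >= 1: the L_p norm of the local cost vector."""
    return sum(c ** p for c in local_costs(labeling, unary, pairwise)) ** (1.0 / p)


def energy_inf(labeling, unary, pairwise):
    """E_infinity: the largest local cost."""
    return max(local_costs(labeling, unary, pairwise))


# Hypothetical toy instance: two vertices joined by one edge.
unary = {"s": [0.0, 2.0], "t": [1.0, 0.5]}          # unary[v][label]
pairwise = {("s", "t"): [[0.0, 3.0], [3.0, 0.0]]}   # pairwise[(s, t)][l(s)][l(t)]
labeling = {"s": 0, "t": 1}

for p in (1, 2, 10, 50):
    print(p, energy_p(labeling, unary, pairwise, p))
print("inf", energy_inf(labeling, unary, pairwise))
```

As p grows, the printed values of `energy_p` move toward the value of `energy_inf`, which depends on the largest local cost only.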

Labeling problems with objective functions of the form \(E_p\), for \(p\in [1,\infty )\), can be solved using minimal graph cuts, provided that all pairwise terms \(\phi _{st}\) are p-submodular [17]. A binary term \(\phi \) is said to be p-submodular if the corresponding term \(\phi ^p\) is submodular, which is equivalent to the condition

$$\begin{aligned} (\phi _{st}^p(0,0)+\phi _{st}^p(1,1))^{1/p} \le (\phi _{st}^p(0,1)+\phi _{st}^p(1,0))^{1/p}. \end{aligned}$$

In the limit, as p goes to infinity, this inequality becomes

$$\begin{aligned} \max \{\phi _{st}(0,0),\phi _{st}(1,1)\} \le \max \{\phi _{st}(1,0),\phi _{st}(0,1)\}, \end{aligned}$$

that is, the \(\infty \)-submodularity condition (2). As observed by Malmberg and Strand [17], 1-submodularity does not necessarily imply p-submodularity. The following theorem was shown by Malmberg and Strand [17]:

Theorem 1

If a binary term \(\phi \) is 1-submodular and \(\infty \)-submodular, then it is also p-submodular for any real \(p \ge 1\).

We note here that Theorem 1 implies also the following seemingly stronger result.

Corollary 1

Let \(\phi \) be a binary term. Then for every \(\rho \in [1,\infty )\) the following conditions are equivalent.

  (i) \(\phi \) is \(\rho \)-submodular and \(\infty \)-submodular.

  (ii) \(\phi \) is p-submodular for every \(p\in [\rho ,\infty )\).


To see that (ii) implies (i) notice that the p-submodularity inequality (6) can be written as

$$\begin{aligned} \Vert \langle \phi _{st}(0,0),\phi _{st}(1,1)\rangle \Vert _p \le \Vert \langle \phi _{st}(0,1),\phi _{st}(1,0)\rangle \Vert _p. \end{aligned}$$

Since the \(L_p\) norm converges to the \(L_\infty \) norm, as p goes to infinity, the limit of both sides of the above inequality becomes

$$\begin{aligned} \Vert \langle \phi _{st}(0,0),\phi _{st}(1,1)\rangle \Vert _\infty \le \Vert \langle \phi _{st}(0,1),\phi _{st}(1,0)\rangle \Vert _\infty , \end{aligned}$$

that is, the \(\infty \)-submodularity condition (2).

To see that (i) implies (ii) assume that \(\phi _{st}\) satisfies (i). Then \(\phi _{st}^\rho \) is both 1-submodular (raise both sides of the inequality (6) with \(p=\rho \) to the power \(\rho \)) and \(\infty \)-submodular (as the map \(x\mapsto x^\rho \) is increasing on \([0,\infty )\)). In particular, \(\phi _{st}^\rho \) satisfies the assumptions of Theorem 1. Therefore, for every \(p\in [\rho ,\infty )\), the term \(\phi _{st}^\rho \) is \(\frac{p}{\rho }\)-submodular, that is, it satisfies

$$\begin{aligned} (\phi _{st}^{\rho \frac{p}{\rho }}(0,0)+\phi _{st}^{\rho \frac{p}{\rho }}(1,1))^{\rho /p} \le (\phi _{st}^{\rho \frac{p}{\rho }}(0,1)+\phi _{st}^{\rho \frac{p}{\rho }}(1,0))^{\rho /p}. \end{aligned}$$

But this clearly implies p-submodularity of \(\phi _{st}\). \(\square \)
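The submodularity conditions discussed above are easy to test numerically for a single binary term. The following Python sketch is only an illustration: the 2x2 list representation of \(\phi \), the function names, and the two example terms are assumptions made here, not taken from the paper.

```python
def is_p_submodular(phi, p):
    """phi is a 2x2 table of nonnegative costs; p-submodular means phi**p is submodular."""
    return phi[0][0] ** p + phi[1][1] ** p <= phi[0][1] ** p + phi[1][0] ** p


def is_inf_submodular(phi):
    """The limit condition: max of the diagonal <= max of the off-diagonal."""
    return max(phi[0][0], phi[1][1]) <= max(phi[0][1], phi[1][0])


# Hypothetical term that is 2-submodular and inf-submodular, but not 1-submodular.
phi_a = [[2, 3], [0, 2]]
# Hypothetical term that is 1-submodular, but neither 2- nor inf-submodular.
phi_b = [[0, 2], [2, 3]]

for name, phi in (("phi_a", phi_a), ("phi_b", phi_b)):
    print(name,
          [is_p_submodular(phi, p) for p in (1, 2, 3, 10)],
          is_inf_submodular(phi))
```

The term `phi_a` matches the situation of Corollary 1 with \(\rho =2\): it passes the test for every tested \(p\ge 2\). The term `phi_b` illustrates that 1-submodularity alone does not carry over to larger p.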

Optimization of \(E_\infty \) by Classical Algorithms

In Sect. 4, we will show that if the binary terms \(\phi \) satisfy (i) of Corollary 1, then an optimal labeling for the associated energy \(E_\infty \) can be found by solving an appropriate max-flow/min-cut problem.

Moreover, it turns out that in some problem instances a labeling that is globally optimal with respect to \(E_\infty \) can be found using very efficient, greedy algorithms. Specifically, if

  (D) all pairwise terms are such that \(\phi _{st}(1,0)=\phi _{st}(0,1)\) and \(\phi _{st}(0,0)=\phi _{st}(1,1)=0\), while all unary terms have values in \(\{0,\infty \}\),

then an optimal labeling for the associated energy \(E_\infty \) can be found by computing the partitioning induced by an optimum spanning forest on a suitably constructed graph using, e.g., Prim’s algorithm [7, 19]. See more on this in Sect. 4. This property of optimum spanning forests has been observed by several authors [2, 6, 8]. This result has a high practical value since the computation time for constructing an optimal spanning forest is substantially lower than the computation time for solving a max-flow/min-cut problem, asymptotically as well as in practice [8].

Wolf et al. [21,22,23] recently proposed various extensions of this greedy approach and also reported state-of-the-art results on various image segmentation benchmarks. We note also that the notion of partitioning an image-induced graph by computing an optimum spanning forest is tightly connected to the classic watershed image segmentation method [9, 10].

Based on the above, an interesting question is therefore whether it is possible to use similar greedy techniques to optimize the objective function \(E_\infty \) beyond the special case when the local costs satisfy property (D). The results presented in this paper answer this question affirmatively and show that the class of \( E_\infty \) optimization problems that are solvable by the efficient greedy algorithms is larger than what was previously known.

Algorithms for Direct Optimization of \(E_\infty \): Preliminaries

In Sects. 5 and 6, we will introduce two novel algorithms, each finding a binary labeling minimizing \(E_{\infty }\).

The exposition of these algorithms relies on the notion of unary and binary solution atoms, which we introduce in this section. Informally, a unary atom represents one possible label configuration for a single vertex, and a binary atom represents a possible label configuration for a pair of adjacent vertices. Thus, for a binary labeling problem, there are two atoms associated with every vertex and four atoms for every edge. The total number of atoms for a binary labeling problem is thus \(\mathcal {O}(|V|+|{{\mathcal {E}}}|)\).

Formally, we let \(\mathcal {V}=\{\{v\}:v\in V\}\), put \(\mathcal {D}=\mathcal {V}\cup \mathcal {E}\), and let \(\mathcal {A}\) be the family of all maps from some \(D\in \mathcal {D}\) into \(\{0,1\}\). An atom, in this notation, is an element of \(\mathcal {A}\). If we identify, as is common, maps with their graphs, then each unary atom associated with a vertex \(s\in V\) has the form \(\{\langle s,i\rangle \}\), with \(i\in \{0,1\}\). Similarly, each binary atom associated with an edge \(\{s,t\}\in \mathcal {E}\) has the form \(\{\langle s,i\rangle ,\langle t,j\rangle \}\), with \(i,j\in \{0,1\}\).

Notice that the maps \(\phi _s\) and \(\phi _{st}\) used for the unary and binary terms in (1) can be combined to form a single function \(\varPhi :\mathcal {A}\rightarrow [0,\infty )\) defined, for every \(A\in \mathcal {A}\), as

$$\begin{aligned} \varPhi (A):= {\left\{ \begin{array}{ll} \phi _s(i) &{} \text{ for } A=\{\langle s,i\rangle \},\\ \phi _{s,t}(i,j) &{} \text{ for } A=\{\langle s,i\rangle ,\langle t,j\rangle \}. \end{array}\right. } \end{aligned}$$

For a given labeling \(\ell \), we define \(\phi _\ell :\mathcal {D}\rightarrow [0,\infty )\), for every \(D\in \mathcal {D}\), as \(\phi _\ell (D):= \varPhi (\ell \restriction D)\), that is,

$$\begin{aligned} \phi _\ell (D):= {\left\{ \begin{array}{ll} \phi _s(\ell (s)) &{} \text{ for } D=\{s\}\in \mathcal {V},\\ \phi _{s,t}(\ell (s),\ell (t)) &{} \text{ for } D=\{s,t\}\in \mathcal {E}, \end{array}\right. } \end{aligned}$$

where \(\ell \restriction D\) is the restriction of \(\ell \) to D. With this notation, we may write the objective function \(E_\infty \) as

$$\begin{aligned} E_\infty (\ell )=\Vert \phi _\ell \Vert _\infty = \max _{D\in \mathcal {D}} {\phi _\ell (D)}. \end{aligned}$$

Similarly, \(E_p(\ell )=\Vert \phi _\ell \Vert _p\) for any \(p\in [1,\infty )\).


Conceptually, both the proposed algorithms work as follows: Starting from the set of all possible unary and binary atoms, the algorithm iteratively removes one atom at a time until the remaining atoms define a unique labeling. A key issue in this process is to ensure that, at all steps of the algorithm, at least one labeling can be constructed from the set of remaining atoms.

Let \(\ell \) be a binary labeling. We define \(\mathcal {A}(\ell )\), the atoms for \(\ell \), as the family

$$\begin{aligned} \mathcal {A}(\ell ): =\{\ell \restriction D:D\in \mathcal {D}\}. \end{aligned}$$

Notice that \(\ell \) can be easily recovered from \(\mathcal {A}(\ell )\) as its union: \(\ell =\bigcup \mathcal {A}(\ell )\).
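The atom notation can be made concrete with a short Python sketch. Representing an atom as a tuple of (vertex, label) pairs, and the helper names `all_atoms`, `atoms_of` and `recover_labeling`, are choices made here for illustration and are not part of the paper.

```python
def all_atoms(vertices, edges):
    """The family A: two unary atoms per vertex and four binary atoms per edge."""
    unary = [((v, i),) for v in vertices for i in (0, 1)]
    binary = [((s, i), (t, j)) for s, t in edges for i in (0, 1) for j in (0, 1)]
    return unary + binary


def atoms_of(labeling, vertices, edges):
    """A(l): the restriction of the labeling to every vertex and to every edge."""
    unary = [((v, labeling[v]),) for v in vertices]
    binary = [((s, labeling[s]), (t, labeling[t])) for s, t in edges]
    return unary + binary


def recover_labeling(atoms):
    """The union of the atoms, read back as a map from vertices to labels."""
    return {v: i for atom in atoms for v, i in atom}


# Hypothetical toy graph: a path a - b - c.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
labeling = {"a": 0, "b": 1, "c": 1}

print(len(all_atoms(vertices, edges)))   # 2|V| + 4|E| atoms in total
print(recover_labeling(atoms_of(labeling, vertices, edges)) == labeling)
```

With this representation, a labeling contributes exactly one atom per vertex and one per edge, so \(\mathcal {A}(\ell )\) always has \(|V|+|\mathcal {E}|\) elements.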

Definition 1

Let \(\mathcal {A}' \subset \mathcal {A}\) be a set of atoms. We say that \(\mathcal {A}'\) is consistent if there exists at least one labeling \(\ell \) such that \(\mathcal {A}(\ell )\subseteq \mathcal {A}'\).

We will now derive one of our main results, namely that the problem of determining whether a given set of atoms is consistent can be formulated as a 2-satisfiability problem. The 2-satisfiability problem is a well-studied problem in computer science, and several efficient algorithms exist for its solution. This result quite directly leads to Algorithm 1, presented in Sect. 5, for finding a labeling minimizing \(E_\infty \).

For a set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms denote by \(\bar{\mathcal {A}' }\) the complement of \(\mathcal {A}' \) relative to \(\mathcal {A}\), that is, \(\bar{\mathcal {A}' }:={\mathcal {A}}\setminus {\mathcal {A}'}\). Then \(\mathcal {A}'\) is consistent if, and only if, there exists a labeling \(\ell \) such that \(\mathcal {A}(\ell ) \cap \bar{\mathcal {A}'} = \emptyset \). We will show that the existence of such labeling \(\ell \) can be determined by solving a 2-satisfiability problem.

For this, let’s treat any vertex \(v\in V\) of our graph as a variable of propositional calculus, that is, a variable that can take two possible values: TRUE, which will be identified with number 1, and FALSE, which will be identified with 0. Upon such identification, any labeling \(\ell :V\rightarrow \{0,1\}\) can be treated as a truth functional.

Now, with any unary atom \(A=\{\langle s,i\rangle \}\), with \(i\in \{0,1\}\), we associate a propositional calculus formula in a very simple format known as literal (i.e., a variable or its negation):

$$\begin{aligned} \psi _A(s):={\left\{ \begin{array}{ll} \lnot s&{} \text{ if } i=1,\\ s &{} \text{ if } i=0. \end{array}\right. } \end{aligned}$$

Less formally, but more concisely, \(\psi _A(s):=``s\ne i\).” Notice that \(\ell :V\rightarrow \{0,1\}\) disagrees with A if, and only if, \(\psi _A\) is satisfied by \(\ell \) treated as a truth functional.

Similarly, for every binary atom \(A=\{\langle s,i\rangle ,\langle t,j\rangle \}\) we define

$$\begin{aligned} \psi _A(s,t):=\psi _{\{\langle s,i\rangle \}}(s)\vee \psi _{\{\langle t,j\rangle \}}(t) \end{aligned}$$

or, equivalently, as \(``(s\ne i) \vee (t\ne j)\).” Once again, \(\ell :V\rightarrow \{0,1\}\) disagrees with A if, and only if, \(\psi _A\) is satisfied by \(\ell \) treated as a truth functional.

Finally, for a set \(\mathcal {A}' =\{A_1, A_2, \ldots , A_m\}\) of atoms define

$$\begin{aligned} \psi _{\mathcal {A}'}:= \bigwedge \limits _{i=1}^m \psi _{A_i}=\psi _{A_1}\wedge \cdots \wedge \psi _{A_m}. \end{aligned}$$

Also, \(\ell :V\rightarrow \{0,1\}\) disagrees with every \(A\in \mathcal {A}' \) if, and only if, \(\psi _{\mathcal {A}'}\) is satisfied by \(\ell \). Notice also that the formula \(\psi _{\mathcal {A}'}\) is in the so-called 2-conjunctive normal form, that is, it is a conjunction of formulas \(\psi _{A_i}\), each of which is a disjunction of at most two literals.

The above discussion leads to the following result.

Theorem 2

A set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms is consistent if, and only if, the 2-satisfiability problem for a formula \(\psi _{\bar{\mathcal {A}'}}\) has a positive solution.


This follows from the equivalence of the following conditions, each consecutive pair of which was argued above.

  • \(\mathcal {A}' \subseteq \mathcal {A}\) is consistent.

  • \(\mathcal {A}(\ell ) \cap \bar{\mathcal {A}'} = \emptyset \) for some \(\ell :V\rightarrow \{0,1\}\).

  • There is an \(\ell :V\rightarrow \{0,1\}\) which disagrees with every \(A\in \bar{\mathcal {A}'}\).

  • There is an \(\ell :V\rightarrow \{0,1\}\) such that \(\psi _{\bar{\mathcal {A}'}}\) is satisfied by \(\ell \).

  • The 2-satisfiability problem for a formula \(\psi _{\bar{\mathcal {A}'}}\) has a positive solution.\(\square \)

Recall that the solution to the 2-satisfiability problem for a formula in the 2-conjunctive normal form that is a conjunction of n 2-disjunctions can be found in \(\mathcal {O}(n)\) time, using, e.g., the algorithm by Aspvall et al. [3]. Thus, for any set \(\mathcal {A}' \subseteq \mathcal {A}\) of atoms, the question

Is \(\mathcal {A}'\) consistent?

can be answered in linear time with respect to the number \(n:=|\bar{\mathcal {A}'}|\) of elements in \(\bar{\mathcal {A}'}={\mathcal {A}}\setminus {\mathcal {A}'}\) by deciding the satisfiability of \(\psi _{\bar{\mathcal {A}'}}\).
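The consistency test of Theorem 2 can be sketched in Python as follows. The 2-satisfiability solver below uses the standard implication-graph and strongly-connected-components approach (here a Kosaraju-style two-pass search); it is a generic textbook construction written for this illustration, not code from [3]. The atom representation (tuples of (vertex, label) pairs) and all function names are assumptions.

```python
def two_sat(num_vars, clauses):
    """Decide a 2-CNF formula. Literal +v / -v means variable v is TRUE / FALSE
    (v = 1..num_vars). Returns {v: bool} if satisfiable, otherwise None."""
    def node(lit):
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for clause in clauses:
        a, b = clause[0], clause[-1]           # a unit clause (a) is read as (a or a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)                 # (a or b) = (not a -> b) and (not b -> a)
            rgraph[v].append(u)

    order, seen = [], [False] * n              # pass 1: record DFS finishing order
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(graph[root]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(graph[v])))
                    break
            else:
                order.append(u)
                stack.pop()

    comp, count = [-1] * n, 0                  # pass 2: components on the reversed graph
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = count
                    stack.append(v)
        count += 1

    assignment = {}
    for v in range(1, num_vars + 1):
        if comp[node(v)] == comp[node(-v)]:
            return None                        # v is equivalent to its negation
        assignment[v] = comp[node(v)] > comp[node(-v)]
    return assignment


def all_atoms(vertices, edges):
    atoms = [((v, i),) for v in vertices for i in (0, 1)]
    atoms += [((s, i), (t, j)) for s, t in edges for i in (0, 1) for j in (0, 1)]
    return atoms


def is_consistent(kept_atoms, vertices, edges):
    """A' is consistent iff the formula built from the atoms NOT in A' is satisfiable."""
    index = {v: k + 1 for k, v in enumerate(vertices)}

    def literal(s, i):                         # the literal "s != i"
        return -index[s] if i == 1 else index[s]

    removed = [a for a in all_atoms(vertices, edges) if a not in kept_atoms]
    clauses = [tuple(literal(s, i) for s, i in atom) for atom in removed]
    return two_sat(len(vertices), clauses) is not None


vertices, edges = ["s", "t"], [("s", "t")]
everything = set(all_atoms(vertices, edges))
print(is_consistent(everything, vertices, edges))
print(is_consistent(everything - {(("s", 0),), (("s", 1),)}, vertices, edges))
```

In the second call both unary atoms of vertex s have been removed, so the formula contains the clauses "s" and "not s" and no labeling can be built from the remaining atoms.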

Strict Optimality

In this section, we will introduce a refinement of the \(L_\infty \) norm measure. This will help us in the discussion of the two proposed algorithms, which will be introduced in the next two sections.

A potential drawback of the \(L_\infty \)-norm is that it does not distinguish between solutions with high or low errors below the maximum error. To resolve this problem, Levi and Zorin introduced, in a 2014 paper [15], the concept of strict minimizers. In this framework, two solutions are compared by ordering all elements (in our case, binary and unary terms) non-increasingly by their local error value and then performing their lexicographical comparison.

Formally, using the notation from Sect. 3, let \(\ell _1\) and \(\ell _2\) be two labelings. Furthermore, let \(\langle A_1, A_2, \ldots , A_k\rangle \) and \(\langle B_1, B_2, \ldots , B_k\rangle \) be the sequences of all atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\), respectively, each ordered by the decreasing costs of atoms, that is, with \(\varPhi (A_1)\ge \cdots \ge \varPhi (A_k)\) and \(\varPhi (B_1)\ge \cdots \ge \varPhi (B_k)\). We say that \(\ell _1\) precedes \(\ell _2\) lexicographically and denote this as \(\ell _1 \prec \ell _2\), provided there exists an \(i\in \{1,2,\ldots ,k\}\) such that \(\varPhi (A_i) \ne \varPhi (B_i)\) and for the smallest such i we have \(\varPhi (A_i) < \varPhi (B_i)\). Also, we write \(\ell _1 \preceq \ell _2\) provided either \(\ell _1 \prec \ell _2\) or \(\varPhi (A_i) = \varPhi (B_i)\) for all \(i\in \{1,2,\ldots ,k\}\).

Definition 2

A labeling \(\ell \) is said to be strictly minimal provided \(\ell \preceq \ell '\) for any other labeling \(\ell '\).

From this definition, it is clear that any strict minimizer is also an \(L_\infty \)-optimal solution. Thus, the set of all strict minimizers is a subset of all \(L_\infty \)-norm optimal solutions. In fact, the limit, as \(p\rightarrow \infty \), of \(L_p\)-norm minimizers discussed above, is not only an \(L_\infty \)-minimizer but also a strict minimizer [15]. (For the local cost functions satisfying the property (D), it was proved earlier, in a 2012 paper [6] of Ciesielski et al.)
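The lexicographic comparison used to define strict minimality can be sketched in a few lines of Python, relying on the fact that Python compares lists lexicographically. The dictionary representation of the local costs, the function names, and the toy values are assumptions made for this illustration.

```python
def sorted_costs(labeling, unary, pairwise):
    """Local costs of the atoms of the labeling, in non-increasing order."""
    costs = [unary[s][labeling[s]] for s in unary]
    costs += [phi[labeling[s]][labeling[t]] for (s, t), phi in pairwise.items()]
    return sorted(costs, reverse=True)


def precedes(l1, l2, unary, pairwise):
    """True when l1 strictly precedes l2 in the lexicographic order."""
    return sorted_costs(l1, unary, pairwise) < sorted_costs(l2, unary, pairwise)


# Hypothetical instance in which two labelings share the same maximum cost.
unary = {"a": [3, 3], "b": [1, 2]}
pairwise = {("a", "b"): [[0, 0], [0, 0]]}
l1 = {"a": 0, "b": 0}
l2 = {"a": 0, "b": 1}

print(sorted_costs(l1, unary, pairwise), sorted_costs(l2, unary, pairwise))
print(precedes(l1, l2, unary, pairwise))
```

Both labelings have the same max-norm cost, so \(E_\infty \) cannot separate them, while the lexicographic order prefers `l1` because its second-largest local cost is smaller.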

The above discussion indicates that it would be desirable to have an efficient algorithm that not only finds \(L_\infty \)-minimizers, but also strict minimizers. Unfortunately, in the general setting that we examine here, the problem of finding strict minimizers is NP-hard. We will show this at the end of this section. Nevertheless, there are two special situations in which efficient algorithms for finding strict minimizers do exist. The first case is described in the next subsection. The second one, discussed in Sect. 5.1 and solved by the algorithm presented there, is when all local terms have distinct weights.

When all \(\phi _{st}\) are p-Submodular for Large Enough p

For a finite set \(Z\subset [0,\infty )\) and \(k\ge 1\) let \(\delta _Z^k:=\log _b k\), where

$$ \begin{aligned} b:=\min \left\{ \frac{s}{r}:0<r<s \ \& \ r,s\in Z\right\} . \end{aligned}$$

We will use the following result, which identifies strict optimality with optimality with respect to \(E_p\) for p large enough. For the local cost maps satisfying (D), this was first proved in [6, theorem 5.3].

Proposition 1

Let \(|\mathcal {D}|=k\) and assume that all local cost maps \(\phi _{s}\) and \(\phi _{s,t}\) have values in a finite set \(Z\subset [0,\infty )\). If \(p\ge \delta _Z^k\), then a binary labeling \(\ell \) is strictly minimal if, and only if, it is minimal with respect to \(E_p\).


To see this, notice first that for every \(p\ge \delta _Z^k\)

$$\begin{aligned} \text{ if } \ell _1 \prec \ell _2,\hbox { then }E_p(\ell _1)<E_p(\ell _2). \end{aligned}$$

Indeed, using the notation as in the definition of \(\prec \), let i be the smallest index with \(\varPhi (A_i) \ne \varPhi (B_i)\), so that \(\varPhi (A_i) < \varPhi (B_i)\). If \(\varPhi (A_i) =0\), then \(E_p^p(\ell _1)=\sum _{j=1}^{i-1}\varPhi ^p(A_j)<\sum _{j=1}^{k}\varPhi ^p(B_j)=E_p^p(\ell _2)\) justifying (8). So, assume that \(\varPhi (A_i)>0\). Then, for b defined as above, we have \(b\le \frac{\varPhi (B_i)}{\varPhi (A_i)}\) and

$$\begin{aligned} \log _b k=\delta _Z^k\le p\le p \log _b \frac{\varPhi (B_i)}{\varPhi (A_i)}=\log _b \frac{\varPhi ^p(B_i)}{\varPhi ^p(A_i)} \end{aligned}$$

so that \(k\varPhi ^p(A_i) < \varPhi ^p(B_i)\). Therefore,

$$\begin{aligned} E_p^p(\ell _1)\le \sum _{j=1}^{i-1}\varPhi ^p(A_j)+k\varPhi ^p(A_i) <\sum _{j=1}^{k}\varPhi ^p(B_j)=E_p^p(\ell _2), \end{aligned}$$

completing the argument for (8).

To prove the proposition, choose \(p\ge \delta _Z^k\) and labelings \(\ell _1\) and \(\ell _2\). If \(\ell _1\) is strictly minimal, then either \(\ell _1 \prec \ell _2\), in which case (8) implies that \(E_p(\ell _1)<E_p(\ell _2)\), or \(\langle \varPhi (A_1), \ldots , \varPhi (A_k)\rangle =\langle \varPhi (B_1), \ldots , \varPhi (B_k)\rangle \), in which case clearly \(E_p(\ell _1)=E_p(\ell _2)\). Thus, strict minimality of \(\ell _1\) indeed implies its minimality with respect to \(E_p\).

Conversely, if \(\ell _1\) is minimal with respect to \(E_p\), then we must have \(\ell _1 \preceq \ell _2\), since otherwise we would have \(\ell _2 \prec \ell _1\) and, by (8), \(E_p(\ell _2)<E_p(\ell _1)\), a contradiction. \(\square \)

A number p for which the proposition holds is referred to by Wolf et al. [21] as a dominant power. Its existence is proved in that paper; however, no estimate similar to that of \(\delta _Z^k\) is provided there. The estimate \(\delta _Z^k\) can be found, in a similar setting, in [6, theorem 5.3]; however, this result does not explicitly relate this number with the lexicographical order.
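The estimate \(\delta _Z^k=\log _b k\) is straightforward to compute. In the Python sketch below, `values` plays the role of the finite set Z and `k` is the parameter k of Proposition 1; the function name and the choice to raise an error when b is undefined are assumptions made here.

```python
import math


def dominant_power(values, k):
    """Return delta_Z^k = log_b(k), with b the smallest ratio s/r over 0 < r < s in Z."""
    positive = sorted({v for v in values if v > 0})
    if len(positive) < 2:
        raise ValueError("b is undefined: Z needs two distinct positive values")
    # The minimum of s/r over all pairs is attained by neighbours in sorted order,
    # since any other ratio is a product of neighbour ratios, each greater than 1.
    b = min(s / r for r, s in zip(positive, positive[1:]))
    return math.log(k) / math.log(b)


# Hypothetical example: costs taken from Z = {0, 1, 2, 3} and k = 5.
delta = dominant_power([0, 1, 2, 3], 5)
print(delta)
```

For this example b equals 3/2, so the estimate is \(\log _{3/2} 5\). When two cost values in Z are close to each other, b is close to 1 and the resulting exponent becomes large.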

The proposition immediately implies the next theorem.

Theorem 3

Let \(\delta _Z^k\) be as in Proposition 1 and assume that \(p\in [\delta _Z^k,\infty )\) is such that all terms \(\phi _{st}\) are p-submodular. Then any labeling \(\ell \) minimizing \(E_p\) is a strict minimizer. In particular, if there is a \(\rho \in [1,\infty )\) such that \(\phi \) is \(\rho \)-submodular and \(\infty \)-submodular, then there is a \(p\in [\rho ,\infty )\) such that any \(E_p\)-optimizing labeling \(\ell \) returned by a max-flow/min-cut algorithm is a strict optimizer.

We observe that in practice, the dominant power p may be large. This may give rise to numerical issues when solving the max-flow/min-cut problem, as each local cost is raised to the power p. The novel algorithms proposed in Sects. 5 and 6 do not suffer from this potential issue.

NP-Hardness of Finding Strict Optimizers

We will now show that, in the general case, the problem of finding strict optimizers is indeed NP-hard. This is justified by an example from Kolmogorov and Zabih [13, Appendix A] that shows that \(L_1\)-optimality for non-submodular energies is NP-hard.

Recall that a set U of vertices of a graph \(\mathcal {G}=\langle V, \mathcal {E}\rangle \) is independent when it contains no two vertices connected by an edge. It is known that the problem of finding a maximum independent set of vertices (i.e., one of largest cardinality) in an arbitrary graph is NP-hard [7, chapter 34].

In the example, associate the following local costs:

  • for every vertex v of label i, give the cost \(1-i\);

  • for every edge with both vertices of label 1, let the cost be \(N:=|V|+1\);

  • with any other edge, associate the cost 0.

Notice that the max-cost of any labeling \(\ell \) is \(<N\) if, and only if, the set \(U:=\ell ^{-1}(1)\) is independent. For every labeling \(\ell \) associated with an independent U, the max cost is at most 1. Moreover, such a labeling \(\ell \) is a strict minimizer when the number of cost 1 atoms for U, which is \(|V|-|U|\), is minimal, that is, when the size of U is maximum.

In other words, if for a graph \(\mathcal {G}\) we use the local cost assignments as above, then \(\ell \) is a strict minimizer if, and only if, \(U:=\ell ^{-1}(1)\) is a maximum independent set of vertices. So, our problem is indeed NP-hard, just like the problem of finding a maximum independent set of vertices.
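The reduction can be checked by brute force on a small graph. The Python sketch below enumerates all binary labelings, which takes exponential time and is meant only as a sanity check of the construction; the function name and the example graph are assumptions made here.

```python
from itertools import product


def independent_set_from_strict_minimizer(vertices, edges):
    """Build the local costs of the reduction, find a strict minimizer by
    exhaustive search, and return the set of vertices labeled 1."""
    big = len(vertices) + 1                       # the cost N := |V| + 1
    best_labeling, best_key = None, None
    for bits in product((0, 1), repeat=len(vertices)):
        labeling = dict(zip(vertices, bits))
        costs = [1 - labeling[v] for v in vertices]
        costs += [big if labeling[s] == 1 and labeling[t] == 1 else 0
                  for s, t in edges]
        key = sorted(costs, reverse=True)         # lexicographic comparison key
        if best_key is None or key < best_key:
            best_labeling, best_key = labeling, key
    return {v for v in vertices if best_labeling[v] == 1}


# Hypothetical example: the path a - b - c - d - e.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(independent_set_from_strict_minimizer(vertices, edges))
```

On this path, the only independent set with three vertices consists of a, c and e, and this is the set selected by the strict minimizer.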

A Quadratic Time Algorithm for Direct Optimization of \(E_{\infty }\)

With these preliminaries in place, we are now ready to introduce a general method for finding a binary labeling that globally optimizes \(E_{\infty }\). Pseudocode for this method is given in Algorithm 1.


If n is the number of atoms in \(\mathcal {A}\), then Algorithm 1 terminates after \(O(n^2)\) operations. This is the case since the execution of line 1 has complexity \(O(n \ln n)\) (as it requires ordering of \(\mathsf {H}\)), while the loop 2–4 is executed n times and each execution requires O(n) operations, as we indicated after Theorem 2.
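The following Python sketch gives one possible reading of the loop analyzed above and in the proof of Theorem 4: the atoms are ordered by cost, a highest-cost atom is removed from \(\mathsf {H}\) in each iteration, and it is moved to \(\mathsf {L}\) only when its removal would destroy consistency. This reading is inferred from the surrounding text and is an assumption. The consistency oracle used here is a brute-force stand-in that tries every labeling, so the sketch does not attain the quadratic bound; replacing it with a linear-time 2-satisfiability test in the sense of Theorem 2 would. All names and the toy costs are hypothetical.

```python
from itertools import product


def all_atoms(vertices, edges):
    atoms = [((v, i),) for v in vertices for i in (0, 1)]
    atoms += [((s, i), (t, j)) for s, t in edges for i in (0, 1) for j in (0, 1)]
    return atoms


def atoms_of(labeling, vertices, edges):
    return [((v, labeling[v]),) for v in vertices] + \
           [((s, labeling[s]), (t, labeling[t])) for s, t in edges]


def brute_force_consistent(atom_set, vertices, edges):
    """Stand-in oracle (exponential): can some labeling be built from atom_set?"""
    for bits in product((0, 1), repeat=len(vertices)):
        labeling = dict(zip(vertices, bits))
        if all(a in atom_set for a in atoms_of(labeling, vertices, edges)):
            return True
    return False


def minimize_e_inf(vertices, edges, cost, is_consistent=brute_force_consistent):
    H = sorted(all_atoms(vertices, edges), key=cost)  # ascending: pop() gives a max-cost atom
    L = set()
    while H:
        atom = H.pop()                                 # remove a highest-cost atom from H
        if not is_consistent(set(H) | L, vertices, edges):
            L.add(atom)                                # removal breaks consistency: keep it
    return {v: i for a in L for v, i in a}             # the labeling is the union of L


# Hypothetical toy instance: a path a - b - c.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
unary = {"a": [0, 4], "b": [3, 1], "c": [5, 0]}
pairwise = {("a", "b"): [[0, 2], [2, 0]], ("b", "c"): [[0, 6], [6, 0]]}


def cost(atom):
    """The combined local cost of a unary or binary atom."""
    if len(atom) == 1:
        (s, i), = atom
        return unary[s][i]
    (s, i), (t, j) = atom
    return pairwise[(s, t)][i][j]


labeling = minimize_e_inf(vertices, edges, cost)
print(labeling, max(cost(a) for a in atoms_of(labeling, vertices, edges)))
```

Passing the oracle as a parameter keeps the loop independent of how consistency is decided, so the brute-force check can be swapped for a 2-satisfiability solver without touching `minimize_e_inf`.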

Theorem 4

An \(\ell \) returned by Algorithm 1 is a labeling minimizing energy \(E_\infty \).


The main loop 2–4 is executed precisely n times, where \(n:=|\mathcal {A}|\).

For every \(k\in \{0,1,\ldots ,n\}\) let \(\mathsf {H}_k\) and \(\mathsf {L}_k\) be the states of \(\mathsf {H}\) and \(\mathsf {L}\), respectively, directly after the kth execution of the loop 2–4. First notice that, for every \(k\in \{0,1,\ldots ,n\}\),


(\(C_k\)) \(\mathsf {H}_k\cup \mathsf {L}_k\) is consistent.

Clearly \(\mathsf {H}_0\cup \mathsf {L}_0= \mathcal {A}\) is consistent. Also, for every \(k<n\), if \(\mathsf {H}_k\cup \mathsf {L}_k\) is consistent, then so is \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\). Indeed, if during the \((k+1)\)st execution of line 3 an atom A is removed from \(\mathsf {H}_k\), then \(\mathsf {H}_{k+1}=\mathsf {H}_{k}\setminus \{A\}\). If \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent, then \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and (\(C_{k+1}\)) holds. Otherwise, line 4 ensures that \(\mathsf {L}_{k+1}=\mathsf {L}_k\cup \{A\}\) and \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}=\mathsf {H}_k\cup \mathsf {L}_k\) is consistent by (\(C_k\)).

The above shows that \(\mathsf {H}_n\cup \mathsf {L}_n=\mathsf {L}_n\) is consistent, that is, there exists a labeling \(\ell ':V\rightarrow \{0,1\}\) so that \(\mathcal {A}(\ell ')\subseteq \mathsf {L}_n\). To finish the proof that \(\ell =\bigcup \mathsf {L}_n\) is a labeling, we need to show that \(\mathcal {A}(\ell ')= \mathsf {L}_n\).

To see this, first notice that \(\mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\subseteq \mathsf {H}_{k}\cup \mathsf {L}_{k}\) for every \(k<n\). So, \(\mathcal {A}(\ell ')\subseteq \mathsf {L}_n\subseteq \mathsf {H}_{k}\cup \mathsf {L}_{k}\). To see that \(\mathsf {L}_n\subseteq \mathcal {A}(\ell ')\), assume by way of contradiction that there is an \(A\in \mathsf {L}_n\setminus \mathcal {A}(\ell ')\). Then, A is removed from \(\mathsf {H}\) during some, say kth, execution of line 2. So, \(A\notin \mathsf {H}_{k+1}\). Also, if \(A\notin \mathcal {A}(\ell ')\), then \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent, as it contains \(\mathcal {A}(\ell ')\). Therefore, \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and \(A\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n\), a contradiction. This means that \(\mathcal {A}(\ell ')= \mathsf {L}_n\).

Finally, by way of contradiction, assume that \(\ell =\bigcup \mathsf {L}_n\) does not minimize \(E_\infty \), that is, that there is a labeling \(\ell '\) with \(c:=E_\infty (\ell ')<E_\infty (\ell )\). Then, there is an \(A\in \mathcal {A}(\ell )\) of cost \(>c\). Let \(k\le n\) be such that A is removed from \(\mathsf {H}\) during the kth execution of line 2. Then \(A\notin \mathsf {H}_{k+1}\). Also, by the ordering of \(\mathsf {H}\), we have \(\mathcal {A}(\ell ')\subset \mathsf {H}_{k+1}\). So, \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent and \(\mathsf {L}_{k+1}=\mathsf {L}_k\). In particular, \(A\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n=\mathcal {A}(\ell )\), contradicting the fact that \(A\in \mathcal {A}(\ell )\). \(\square \)

Atoms with Unique Weights

We say that the atoms (in \(\mathcal {A}\)) have unique weights provided the map \(\varPhi :\mathcal {A}\rightarrow [0,\infty )\) is injective, that is, when \(\varPhi (A_1)\ne \varPhi (A_2)\) for every distinct \(A_1,A_2\in \mathcal {A}\). Our main result here is the following.

Theorem 5

If the atoms in \(\mathcal {A}\) have unique weights, then the labeling \(\ell \) returned by Algorithm 1 is the unique strict optimizer.

First we prove the uniqueness part of the theorem, in the form of the following lemma.

Lemma 1

If the atoms in \(\mathcal {A}\) have unique weights, then the strictly optimal labeling is unique.


Let \(\ell _1\) and \(\ell _2\) be strictly optimal labelings. We will show that \(\ell _1=\ell _2\).

To see this, consider the sequences of the atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\), respectively, each ordered by decreasing cost. Then, since both labelings are strictly optimal, the decreasing sequences of the costs of the atoms in \(\mathcal {A}(\ell _1)\) and \(\mathcal {A}(\ell _2)\) must be identical. However, since every atom has a unique weight, this means that the sets of atoms in \(\mathcal {A}(\ell _1)\) and in \(\mathcal {A}(\ell _2)\) must themselves be identical. In particular \(\mathcal {A}(\ell _1)=\mathcal {A}(\ell _2)\) and therefore \(\ell _1=\bigcup \mathcal {A}(\ell _1)=\bigcup \mathcal {A}(\ell _2)=\ell _2\), as needed. \(\square \)
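The comparison used in this proof can be made concrete with a short Python sketch. It assumes, as the proofs in this section do, that strict optimality compares the decreasing sequences of atom costs lexicographically; the exhaustive search and the names `cost_profile` and `strict_minimizers` are illustrative assumptions, not part of the proposed algorithms.

```python
from itertools import product

def cost_profile(labeling, vertices, edges, cost):
    """Costs of all atoms induced by the labeling, sorted in decreasing order."""
    unary = [cost[frozenset({(v, labeling[v])})] for v in vertices]
    binary = [cost[frozenset({(s, labeling[s]), (t, labeling[t])})]
              for s, t in edges]
    return sorted(unary + binary, reverse=True)

def strict_minimizers(vertices, edges, cost):
    """All labelings whose decreasing cost profile is lexicographically smallest."""
    labelings = [dict(zip(vertices, labels))
                 for labels in product((0, 1), repeat=len(vertices))]
    # Python compares lists lexicographically, which is the order needed here.
    best = min(cost_profile(l, vertices, edges, cost) for l in labelings)
    return [l for l in labelings
            if cost_profile(l, vertices, edges, cost) == best]

# Hypothetical two-vertex example; all eight atom weights are distinct.
vertices = ["a", "b"]
edges = [("a", "b")]
unique_cost = {
    frozenset({("a", 0)}): 1, frozenset({("a", 1)}): 6,
    frozenset({("b", 0)}): 5, frozenset({("b", 1)}): 2,
    frozenset({("a", 0), ("b", 0)}): 0, frozenset({("a", 1), ("b", 1)}): 3,
    frozenset({("a", 0), ("b", 1)}): 4, frozenset({("a", 1), ("b", 0)}): 7,
}
# The same atoms with fully tied weights.
tied_cost = {atom: 0 for atom in unique_cost}

winners_unique = strict_minimizers(vertices, edges, unique_cost)
winners_tied = strict_minimizers(vertices, edges, tied_cost)
```

With unique weights exactly one labeling survives, in line with Lemma 1, whereas with fully tied weights all four labelings share the same profile.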

Proof of Theorem 5

We will use the same notation as in the proof of Theorem 4. Let \(\ell \) and \(\ell '\) be distinct labelings such that \(\ell \) is strictly optimal and, by way of contradiction, assume that Algorithm 1 returns labeling \(\ell '\) rather than \(\ell \). Fix the sequences \(\langle A_1, A_2, \ldots , A_m\rangle \) and \(\langle B_1, B_2, \ldots , B_m\rangle \) of all atoms in \(\mathcal {A}(\ell )\) and \(\mathcal {A}(\ell ')\), respectively, each ordered by the decreasing costs of atoms. By Lemma 1, we have \(\ell \prec \ell '\). Therefore, there exists an \(i\in \{1,2,\ldots ,m\}\) such that \(\varPhi (A_i) < \varPhi (B_i)\) and \(\varPhi (A_j) =\varPhi (B_j)\) for all \(j<i\).

Let \(k\le n\) be such that \(B_i\) is removed from \(\mathsf {H}\) during the kth execution of line 2. Then, \(\{B_1, B_2, \ldots , B_m\}=\mathcal {A}(\ell ')\subset \mathsf {L}_n\subset \mathsf {H}_{k}\cup \mathsf {L}_{k}\). In fact, by the ordering principle of \(\mathsf {H}\) we have \(\{A_1, \ldots , A_{i-1}\}=\{B_1, \ldots , B_{i-1}\}\subset \mathsf {L}_{k}\), since these atoms have higher costs than \(B_i\) and were therefore processed before it, and \(\{A_i, \ldots , A_{m}\}\subset \mathsf {H}_{k+1}\), since these atoms have lower costs than \(B_i\). In particular, \(\mathsf {H}_{k+1}\cup \mathsf {L}_k\) is consistent since it contains \(\{A_1, A_2, \ldots , A_m\}=\mathcal {A}(\ell )\). Thus, \(\mathsf {L}_{k+1}=\mathsf {L}_k\) and \(B_i\notin \mathsf {H}_{k+1}\cup \mathsf {L}_{k+1}\supset \mathsf {L}_n=\mathcal {A}(\ell ')\), a contradiction that finishes the proof of Theorem 5. \(\square \)

The requirement in Theorem 5 (and the forthcoming Theorem 7) that all atoms in \({{\mathcal {A}}}\) have unique weights may appear restrictive, and for real-world problems, this condition may or may not hold. We will therefore now discuss how these theorems may be interpreted when not all atom weights are unique. First we observe that, in this case, it is straightforward to define a new local cost function \({\hat{\varPhi }}\) with unique weights and such that, for any atoms \(A,A'\in {\mathcal A}\), \(\varPhi (A)<\varPhi (A')\) implies \({\hat{\varPhi }}(A)<{\hat{\varPhi }}(A')\). Such weights may, e.g., be defined by the following simple procedure:

  • Fix, by some method (e.g., a sorting algorithm), an increasing order of the atoms in \({{\mathcal {A}}}\) by weight, i.e., find a map \(O:{{\mathcal {A}}}\rightarrow \mathbb {Z}\) such that \(O(A_1) \not = O(A_2)\) for every distinct \(A_1, A_2 \in \mathcal {A}\) and \(O(A_1) < O(A_2) \Rightarrow \varPhi (A_1) \le \varPhi (A_2)\) for all \(A_1, A_2 \in \mathcal {A}\).

  • For all \(A \in \mathcal {A}\), define \({\hat{\varPhi }}(A) := O(A)\).

By design, all atoms associated with the local costs \({\hat{\varPhi }}\) have unique weights and thus running Algorithm 1 (or Algorithm 2 in case of Theorem 7) with these weights will return a strict optimizer with respect to the local costs \({\hat{\varPhi }}\).

We observe that if the original atom weights are all unique, then the ordering O is also unique, and running either of our new algorithms with the new local costs \({\hat{\varPhi }}\) induced by O yields the same result as running it with the original weights. Furthermore, we observe that the procedure above is essentially what happens during the execution of the algorithms: By ordering the max-priority queue \(\mathsf {H}\), we are establishing a specific (implementation dependent) ordering of the atoms that is increasing by weight just like the ordering O defined in the procedure above. Thus, even when not all atoms have unique weights, the algorithms will return labelings that are strictly optimal with respect to some increasing order of the atoms by weight. In that case, however, this ordering will not be unique but will depend on the specific implementation of the max-priority queue \(\mathsf {H}\).
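The two-step procedure for constructing \({\hat{\varPhi }}\) can be sketched in a few lines of Python. This is one possible realization under stated assumptions: ties are broken by the insertion order of the (hypothetical) dictionary `cost`, which plays the role of the implementation-dependent choice discussed above, and the atoms are represented by placeholder strings.

```python
def unique_ranks(cost):
    """Map each atom to a distinct integer rank O(A) that respects the weights.

    sorted() is stable, so atoms with equal weight keep their insertion order;
    this is one arbitrary, implementation-dependent way of breaking ties.
    """
    ordered = sorted(cost, key=cost.get)
    return {atom: rank for rank, atom in enumerate(ordered)}

# Placeholder atoms "A".."D"; atoms "A" and "C" share the same weight.
cost = {"A": 2.0, "B": 0.5, "C": 2.0, "D": 1.0}
ranks = unique_ranks(cost)  # plays the role of the new cost function
```

Here `ranks` assigns 0 to "B", 1 to "D", 2 to "A", and 3 to "C": the ranks are pairwise distinct, and a strictly smaller original weight always yields a strictly smaller rank.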

A Quasi-Linear Time Algorithm for Direct Optimization of \(E_\infty \) When All Binary Terms are \(\infty \)-Submodular

We now present a more efficient algorithm, previously reported in the conference version of this manuscript [16], for the case when all binary terms are \(\infty \)-submodular. Superficially, this algorithm is slightly more complicated than Algorithm 1. We emphasize, however, that both algorithms have a very similar structure: starting from the set of all possible atoms, both algorithms iteratively remove one atom at a time until the remaining atoms define a unique labeling. The main difference between the algorithms lies in the steps taken to ensure the consistency of the set of remaining atoms.

Local Consistency, Incompatible Atoms

We introduce a property of local consistency, which will be used to establish the correctness of our second proposed algorithm. A set of atoms \(\mathcal {A}'\) is said to be locally consistent if, for every vertex \(s\in V\) and edge \(\{s,t\}\in {{\mathcal {E}}}\) there are \(i,j\in \{0,1\}\) such that the atoms \(\{\langle s,i\rangle \}\) and \(\{\langle s,i\rangle ,\langle t,j\rangle \}\) both belong to \({{\mathcal {A}}}'\) (i.e., that \({{\mathcal {A}}}'\) still allows that s will have some label). Clearly, any consistent set of atoms is also locally consistent. However, in general, local consistency does not imply consistency (see footnote 8).

Furthermore, we introduce the notion of an incompatible atom, which will be needed for the exposition of the proposed algorithm. For a given set \({{\mathcal {A}}}'\) of atoms, we say that an atom \(A\in {{\mathcal {A}}}'\) is (locally) incompatible (w.r.t. \({{\mathcal {A}}}'\)) if either

  1. A is a unary atom so that \(A=\{\langle v,i\rangle \}\) for some vertex v, and there exists some edge \(\{v,w\}\) adjacent to v such that \({{\mathcal {A}}}'\) contains neither \(\{\langle v,i\rangle ,\langle w,0\rangle \}\) nor \(\{\langle v,i\rangle ,\langle w,1\rangle \}\); or

  2. A is a binary atom so that \(A=\{\langle v,i\rangle ,\langle w,j\rangle \}\) for some edge \(\{v,w\}\), and at least one of \(\{\langle v,i\rangle \}\) and \(\{\langle w,j\rangle \}\) is not in \({{\mathcal {A}}}'\).

Note that a locally consistent set of atoms may still contain incompatible atoms.
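To make the two definitions tangible, the following Python sketch checks local consistency, global consistency (by exhaustive enumeration, an assumption that is only practical for tiny graphs), and incompatibility on the three-vertex example from footnote 8. The helper names and the treatment of vertices without incident edges are assumptions made for illustration.

```python
from itertools import product

def unary(v, i):
    return frozenset({(v, i)})

def binary(v, i, w, j):
    return frozenset({(v, i), (w, j)})

def is_consistent(atoms, vertices, edges):
    """Exhaustive check: some labeling induces only atoms from `atoms`."""
    for labels in product((0, 1), repeat=len(vertices)):
        lab = dict(zip(vertices, labels))
        if (all(unary(v, lab[v]) in atoms for v in vertices) and
                all(binary(s, lab[s], t, lab[t]) in atoms for s, t in edges)):
            return True
    return False

def is_locally_consistent(atoms, vertices, edges):
    """Every vertex s and incident edge {s, t} admit labels i, j as required."""
    for s, t in edges:
        for a, b in ((s, t), (t, s)):  # test the condition from both endpoints
            if not any(unary(a, i) in atoms and binary(a, i, b, j) in atoms
                       for i, j in product((0, 1), repeat=2)):
                return False
    # Assumption: a vertex without edges still needs at least one unary atom.
    return all(unary(v, 0) in atoms or unary(v, 1) in atoms for v in vertices)

def incompatible_atoms(atoms, edges):
    """Atoms that are (locally) incompatible with respect to `atoms`."""
    bad = set()
    for atom in atoms:
        if len(atom) == 1:
            (v, i), = atom
            for s, t in edges:
                if v in (s, t):
                    w = t if v == s else s
                    if (binary(v, i, w, 0) not in atoms and
                            binary(v, i, w, 1) not in atoms):
                        bad.add(atom)
        elif any(unary(v, i) not in atoms for v, i in atom):
            bad.add(atom)
    return bad

# The atom set from footnote 8 on the complete graph with vertices a, b, c.
V = ["a", "b", "c"]
E = [("a", "b"), ("a", "c"), ("b", "c")]
A = {unary(v, i) for v in V for i in (0, 1)}
for i in (0, 1):
    A |= {binary("a", i, "b", i), binary("a", i, "c", i),
          binary("b", i, "c", 1 - i)}

locally_ok = is_locally_consistent(A, V, E)
globally_ok = is_consistent(A, V, E)
after_removal = incompatible_atoms(A - {unary("a", 0)}, E)
```

The set `A` is locally consistent but not consistent, and it contains no incompatible atoms. Removing the unary atom for label 0 at vertex a makes exactly the two binary atoms that assign label 0 to a incompatible.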

The Second Algorithm

We now introduce the proposed algorithm, with quasi-linear time complexity, for finding a binary label assignment \(\ell :V\rightarrow \{0,1\}\) that globally minimizes the objective function \(E_\infty \) given by (1), under the condition that all pairwise terms in the objective function are \(\infty \)-submodular. If, additionally, all atoms have unique weights then the labeling returned by the algorithm is also the strict minimizer. Informally, the general outline of the proposed algorithm is as follows:

  • Start with a set S consisting of all possible atoms and an initially empty set I of atoms identified as incompatible. (Recall that the total number of atoms is \(\mathcal {O}(|V|+|{{\mathcal {E}}}|)\).)

  • For each atom A, in order of decreasing cost \(\varPhi (A)\):

    • If A is still in S, and is not the only remaining atom for that vertex/edge, remove A from S.

    • After the removal of A, S may contain incompatible atoms. Iteratively remove all such incompatible atoms until S contains no more incompatible atoms.

Before we formalize this algorithm, we introduce a specific preordering relation \(\gg \) on the atoms \({{\mathcal {A}}}\). For \(A_0,A_1\in {{\mathcal {A}}}\), we will write \(A_0 \gg A_1\) if either \(\varPhi (A_0)>\varPhi (A_1)\), or else \(\varPhi (A_0)=\varPhi (A_1)\) and \(A_1\) is a binary atom of the form \(\{\langle s,i\rangle ,\langle t,i\rangle \}\) (equal labeling) while \(A_0\) is not in this form.
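One way to realize this preordering in code is as a sort key, sketched below in Python. The representation of atoms as frozensets of (vertex, label) pairs and the names `is_equal_label_binary` and `sorted_by_preorder` are assumptions made for illustration; atoms that are tied under \(\gg \) keep their input order because Python's sort is stable.

```python
def is_equal_label_binary(atom):
    """True for binary atoms that give both endpoints the same label."""
    labels = [label for _, label in atom]
    return len(labels) == 2 and labels[0] == labels[1]

def sorted_by_preorder(cost):
    """List the atoms so that A0 precedes A1 whenever A0 >> A1."""
    # Higher cost first; among equal costs, equal-label binary atoms come last
    # (False sorts before True).
    return sorted(cost, key=lambda a: (-cost[a], is_equal_label_binary(a)))

# Hypothetical atoms on a single edge {s, t}; three of them share the cost 2.
st00 = frozenset({("s", 0), ("t", 0)})
st01 = frozenset({("s", 0), ("t", 1)})
st11 = frozenset({("s", 1), ("t", 1)})
s0 = frozenset({("s", 0)})
cost = {st00: 2, st01: 2, s0: 2, st11: 1}
order = sorted_by_preorder(cost)
```

In this example the mixed-label atom and the unary atom precede the equal-label atom of the same cost, and the cheapest atom comes last.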

With these preliminaries in place, we are now ready to introduce the proposed algorithm, for which pseudocode is given in Algorithm 2.
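As a complement, the informal outline given earlier in this section can be transcribed into a short Python sketch. It is not the formal Algorithm 2: in particular, instead of maintaining a queue of incompatible atoms it simply rescans the remaining set after every removal, so it does not attain the quasi-linear running time. The function names are hypothetical, and the example assumes that \(\infty \)-submodularity of an edge term means that the larger of its two equal-label costs does not exceed the larger of its two mixed-label costs.

```python
def domain(atom):
    """The vertex or edge an atom lives on, as a frozenset of vertices."""
    return frozenset(v for v, _ in atom)

def is_incompatible(atom, S, neighbors):
    if len(atom) == 1:
        (v, i), = atom
        return any(frozenset({(v, i), (w, 0)}) not in S and
                   frozenset({(v, i), (w, 1)}) not in S
                   for w in neighbors[v])
    return any(frozenset({pair}) not in S for pair in atom)

def greedy_submodular(vertices, edges, cost):
    neighbors = {v: set() for v in vertices}
    for s, t in edges:
        neighbors[s].add(t)
        neighbors[t].add(s)

    def key(atom):  # decreasing cost; equal-label binary atoms last among ties
        labels = [lab for _, lab in atom]
        return (-cost[atom], len(labels) == 2 and labels[0] == labels[1])

    S = set(cost)
    bucket = {}  # number of remaining atoms per vertex/edge
    for atom in S:
        bucket[domain(atom)] = bucket.get(domain(atom), 0) + 1

    for atom in sorted(cost, key=key):
        if atom in S and bucket[domain(atom)] > 1:
            S.remove(atom)
            bucket[domain(atom)] -= 1
            while True:  # naive rescan instead of a queue of incompatible atoms
                bad = [a for a in S if is_incompatible(a, S, neighbors)]
                if not bad:
                    break
                for a in bad:
                    S.remove(a)
                    bucket[domain(a)] -= 1
    return {v: lab for atom in S if len(atom) == 1 for v, lab in atom}

# Hypothetical example: max(0, 3) <= max(4, 7) for the single edge term.
vertices = ["s", "t"]
edges = [("s", "t")]
cost = {
    frozenset({("s", 0)}): 1, frozenset({("s", 1)}): 6,
    frozenset({("t", 0)}): 5, frozenset({("t", 1)}): 2,
    frozenset({("s", 0), ("t", 0)}): 0, frozenset({("s", 1), ("t", 1)}): 3,
    frozenset({("s", 0), ("t", 1)}): 4, frozenset({("s", 1), ("t", 0)}): 7,
}
labeling = greedy_submodular(vertices, edges, cost)
```

For this instance the four labelings have energies 5, 4, 7, and 6, and the sketch returns the labeling with \(s\mapsto 0\), \(t\mapsto 1\), which attains the minimum value 4.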


Computational Complexity

We now analyze the asymptotic computational complexity of Algorithm 2. First, let \(\eta :=|\mathcal {A}|=2|V|+4|\mathcal {E}|\). In image processing applications the graph \(\mathcal {G}\) is commonly sparse, in the sense that \(\mathcal {O}(|V|)=\mathcal {O}(|\mathcal {E}|)\). In this case, we have \(\mathcal {O}(\eta )=\mathcal {O}(|V|)\).
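The count \(\eta =2|V|+4|\mathcal {E}|\) can be checked directly by enumerating the atoms. The sketch below does so for a hypothetical \(3\times 3\) pixel grid with 4-connectivity; the grid and the function name `all_atoms` are assumptions chosen for illustration.

```python
def all_atoms(vertices, edges):
    """Enumerate the 2|V| unary and 4|E| binary atoms of a binary labeling problem."""
    unary = [frozenset({(v, i)}) for v in vertices for i in (0, 1)]
    binary = [frozenset({(s, i), (t, j)})
              for s, t in edges for i in (0, 1) for j in (0, 1)]
    return unary + binary

# Hypothetical 3x3 pixel grid with 4-connectivity: 9 vertices, 12 edges.
vertices = [(r, c) for r in range(3) for c in range(3)]
edges = ([((r, c), (r, c + 1)) for r in range(3) for c in range(2)] +
         [((r, c), (r + 1, c)) for r in range(2) for c in range(3)])
atoms = all_atoms(vertices, edges)
eta = len(atoms)
```

For this grid, \(\eta =2\cdot 9+4\cdot 12=66\), and all enumerated atoms are distinct.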

Creating the list \(\mathsf {H}\) requires us to sort all atoms in \(\mathcal {A}\). The sorting can be performed in \(\mathcal {O}(\eta \log \eta )\) time. In some cases, e.g., if all unary and binary terms are integer valued, it may be possible to perform the sorting in \(\mathcal {O}(\eta )\) time using, e.g., radix or bucket sort.
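For integer-valued costs, a bucket sort along the following lines is one possibility. The sketch assumes non-negative integer costs whose maximum is of the same order as the number of atoms, so that the running time is linear; placeholder strings stand in for atoms, and the name `bucket_sort_desc` is hypothetical.

```python
def bucket_sort_desc(cost):
    """Return the atoms by decreasing non-negative integer cost.

    Runs in O(len(cost) + max cost) time; ties keep their insertion order.
    """
    top = max(cost.values(), default=0)
    buckets = [[] for _ in range(top + 1)]
    for atom, c in cost.items():
        buckets[c].append(atom)
    return [atom for c in range(top, -1, -1) for atom in buckets[c]]

# Placeholder atoms with small integer costs.
cost = {"a": 3, "b": 0, "c": 3, "d": 1}
order = bucket_sort_desc(cost)
```

The resulting order lists "a" and "c" (cost 3) first, followed by "d" and then "b".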

We make the reasonable assumption that the following operations can all be performed in \(\mathcal {O}(1)\) time:

  • Remove an atom from \(\mathsf {H}\).

  • Remove an atom from \(\mathsf {A}[D]\).

  • Remove or insert elements in \(\mathsf {K}\).

  • Given an atom, find its corresponding edge or vertex.

  • Given a vertex, find all edges incident at that vertex.

  • Given an edge, find the vertices spanned by the edge.

The combined number of executions of the main loop, lines 3-12, and of the internal loop, lines 7–12, is equal to \(|{{\mathcal {A}}}|\), that is, \(\mathcal {O}(\eta )\). This is so, since any insertion of an atom into \(\mathsf {K}\) requires its prior removal from the list \(\mathsf {H}\). If the assumptions above are satisfied, it is easily seen that only \(\mathcal {O}(1)\) operations are needed between consecutive removals of an atom from \(\mathsf {H}\). Therefore, the amortized cost of the execution of the main loop is \(\mathcal {O}(\eta )\).

Thus, the total computational cost of the algorithm is bounded by the time required to sort \(\mathcal {O}(\eta )\) elements, i.e., at most \(\mathcal {O}(\eta \log \eta )\).

Proof of Correctness

Theorem 6

If all binary terms of the cost function \(\varPhi :{{\mathcal {D}}}\rightarrow [0,\infty )\) associated with graph \(\mathcal {G}=\langle V,\mathcal {E}\rangle \) are \(\infty \)-submodular, then \(\ell \) returned by Algorithm 2 is a labeling of V minimizing the objective function \(E_\infty \).

Let \(\mathsf {n}:=|V|+3|\mathcal {E}|\), the number of removals of an atom from \(\mathsf {A}\). For every \(D\in {{\mathcal {D}}}\) and \(k\in \{0,\ldots ,\mathsf {n}\}\) let \(\mathsf {A}_k[D]\) be equal to the value of \(\mathsf {A}[D]\) directly after the k-th removal of some atom(s) from \(\mathsf {A}\), which can happen only as a result of execution of either line 6 or line 10. (For \(k=0\) we mean, directly after the execution of line 2.) Let \({{\mathcal {A}}}_k=\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}_k[D]\).

Let \(1=k_1<\cdots <k_m\) be the list of all values of \(k\in \{1,\ldots ,\mathsf {n}\}\) such that \({{\mathcal {A}}}_k\) is a proper refinement of \({{\mathcal {A}}}_{k-1}\) resulting from the execution of line 6. Note that it is conceivable that the numbers \(k_j\) and \(k_{j+1}\) are consecutive: this happens when the execution of loop 8-12 directly after the execution of line 5 that created \({{\mathcal {A}}}_{k_j}\) removes no atoms from \({{\mathcal {A}}}_{k_j}\).

The proof of Theorem 6 is based on the following lemma, for which a proof is given in the Appendix.

Lemma 2

During the execution of Algorithm 2, the following properties hold for every \(k\le \mathsf {n}\).


(P0) For every edge \(D=\{v,w\}\), if \(\mathsf {A}_k[D]\) is missing either \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,1\rangle ,\langle w,1\rangle \}\), then it must be also missing \(\{\langle v,1\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\).

(P1) \(\mathsf {A}_k[D]\) contains at least one atom for every \(D\in {{\mathcal {D}}}\).

(P2) \({{\mathcal {A}}}_k\) is locally consistent.

(P3) \({{\mathcal {A}}}_k\) has no incompatible atoms directly before any execution of line 4.

Proof of Theorem 6

Besides Lemma 2, we still need to argue for two facts. First notice that the algorithm does not stop until all buckets \(\mathsf {A}_\mathsf {n}[D]\), \(D\in {{\mathcal {D}}}\), have precisely one element. Thus, since \({{\mathcal {A}}}_\mathsf {n}\) is locally consistent, \(\ell =\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) is indeed a function from V into \(\{0,1\}\).

To finish the proof, we need to show that \(\ell \) indeed minimizes energy \(E_\infty \). For this, first notice that at any time of the execution of the algorithm, any atom in \(\mathsf {H}\) is also in \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\). Indeed, these sets are equal immediately after the initialization and we remove from \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) only those atoms that have already been removed from \(\mathsf {H}\). Now, let \(L:V\rightarrow \{0,1\}\) be a labeling minimizing \(E_\infty \). We claim that the following property holds at any time during the execution of the algorithm:


(P) if \(\varPhi (A')>E_\infty (L)\) for some \(A'\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), then \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\).

Indeed, it certainly holds immediately after the initialization. This cannot be changed during the execution of line 6 when the assumption is satisfied, since then A considered there has just been removed from \(\mathsf {H}\supset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) and

$$\begin{aligned}&\varPhi (A) \ge \max _{H\in \mathsf {H}}\varPhi (H) \ge \max _{H\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]}\varPhi (H) \\&\quad \ge \varPhi (A')>E_\infty (L) =\max _{H\in {{\mathcal {A}}}[L]}\varPhi (H), \end{aligned}$$

so \(A\notin {{\mathcal {A}}}[L]\). Also, (P) is not affected by an execution of line 10, since the inclusion \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\) is not affected by it: no atom in \({{\mathcal {A}}}[L]\) is incompatible with \({{\mathcal {A}}}[L]\) so also with \( \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\). This concludes the proof of (P).

Now, by the property (P), after the termination of the main loop, we have either \({{\mathcal {A}}}[L]\subset \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\), in which case \(\ell =L\) has minimal \(E_\infty \) energy, or else

$$\begin{aligned} E_\infty (L)\ge \max _{H\in \bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]}\varPhi (H)= \max _{H\in {{\mathcal {A}}}[\ell ]}\varPhi (H)=E_\infty (\ell ) \end{aligned}$$

once again ensuring optimality of \(\ell \). \(\square \)

Theorem 7

If the atoms in \(\mathcal {A}\) have unique weights, then the labeling \(\ell \) returned by Algorithm 2 is the unique strict optimizer.


The uniqueness part of the theorem is already shown in Lemma 1. The rest of the argument is essentially identical to that used in the proof of Theorem 5. \(\square \)

NP-Hardness of Multi-label \(E_\infty \)-optimization

We will now show that, for a number of labels \(K>2\), the problem of finding a labeling that minimizes \(E_\infty \) is NP-hard in the general case.

Recall that a K-coloring of a graph is a mapping \(c:V \rightarrow \{1,2,\ldots , K\}\) such that \(c(s) \not = c(t)\) for every edge \(\{s,t\} \in \mathcal {E}\). The K-coloring problem consists of determining whether a given undirected graph admits a K-coloring. Recall also that the 3-coloring problem is already NP-complete [7, chapter 34].

To see that optimization of \(E_\infty \) is NP-hard for \(K>2\) labels, consider labelings with three labels and associate the following costs:

  • for every vertex v the cost of any label assignment is 0;

  • for any edge with distinct labeling of its vertices the cost is 0;

  • for any edge with the same labeling of its vertices the cost is 1.

For such assignments, the \(E_\infty \)-energy of a labeling is \(\le 0\) if, and only if, the labeling is a 3-coloring. The same argument can be repeated also for \(K>3\). Thus, the problem of \(E_\infty \)-optimization with \(K>2\) labels is indeed NP-hard.
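The reduction can be illustrated on two small graphs. The Python sketch below evaluates \(E_\infty \) under the cost assignment listed above and finds its minimum by exhaustive search; the search is only meant to illustrate the equivalence with 3-coloring on toy instances, and the function names are hypothetical.

```python
from itertools import product

def coloring_energy(labeling, edges):
    """E_inf for the reduction: unary costs are 0, and an edge costs 1
    exactly when both of its endpoints carry the same label."""
    return max([0] + [1 for s, t in edges if labeling[s] == labeling[t]])

def min_energy(vertices, edges, num_labels=3):
    """Minimum of E_inf over all labelings, by exhaustive enumeration."""
    return min(coloring_energy(dict(zip(vertices, labels)), edges)
               for labels in product(range(num_labels), repeat=len(vertices)))

# A triangle admits a 3-coloring; the complete graph on four vertices does not.
triangle_v = [0, 1, 2]
triangle_e = [(0, 1), (0, 2), (1, 2)]
k4_v = [0, 1, 2, 3]
k4_e = [(i, j) for i in range(4) for j in range(i + 1, 4)]

triangle_min = min_energy(triangle_v, triangle_e)
k4_min = min_energy(k4_v, k4_e)
```

The minimum energy is 0 for the triangle and 1 for the complete graph on four vertices, matching the fact that only the former is 3-colorable.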

Table 1 Summary of results: subclasses of the general max-norm optimization problem considered here, and algorithms for solving them


We have presented two algorithms for finding a binary vertex labeling of a graph that globally minimizes objective functions of the form \(E_\infty \). It is well known that for a limited subclass of such problems, globally optimal solutions can be found by computing an optimal spanning forest on a suitably constructed graph. Such optimal spanning forests can, in turn, be computed using very efficient, greedy algorithms. Despite the fact that this optimum spanning forest approach is commonly used in many image processing applications, the potential and limitations of this method in terms of more general optimization problems are, to the best of our knowledge, largely unexplored. The exact class of max-norm optimization problems that can be solved using efficient greedy algorithms, or even in polynomial time, has remained unknown. By the introduction of the two proposed algorithms, we show that the class of such problems that can be solved in (low-order) polynomial time is indeed larger than what was previously known. In Table 1, we provide a summary of the various subclasses of the general optimization problem considered in this paper, and algorithms for solving them.

An important observation here is the following: Binary labeling optimization problems with objective functions of the form \(E_1\) frequently occur in image processing and computer vision applications. The max-flow/min-cut approach proposed by Kolmogorov and Zabih [13] still remains one of the primary methods for solving such problems when all pairwise terms are submodular. When the local cost functionals include non-submodular terms, however, the same problem becomes NP-hard. As concluded in our discussion in Sect. 2.1, similar submodularity requirements hold also for the generalized objective functions \(E_p\) for any finite p. Practitioners looking to solve such optimization problems must therefore first verify that their local cost functional satisfies the appropriate submodularity conditions. If this is not the case, they must resort to approximate optimization methods that may or may not produce satisfactory results for a given problem instance. Here we show, by the introduction of Algorithm 1, that in the limit as p goes to infinity, the requirement for submodularity of the pairwise terms disappears. Indeed, Algorithm 1 returns, in low-order polynomial time, an \(E_\infty \)-minimal binary labeling for any local cost functional. Thus, even when the local costs are such that the problem of minimizing \(E_p\) is NP-hard for some or all finite p, a labeling minimizing \(E_\infty \) can be found in low-order polynomial time.

The motivation for our work comes from image processing applications, and the local cost functionals we consider naturally occur in many image processing problems. The two proposed algorithms, however, are formulated for general graphs and may thus also have applications to other applied problems in computer science. Structurally, both of the proposed algorithms resemble Kruskal’s algorithm [7, 14], and in this sense the proposed algorithms can be seen as generalizations of the optimum spanning forest approach to optimization.

Algorithm 1 has quadratic time complexity and is thus less efficient than Algorithm 2. It appears likely, however, that the time complexity of Algorithm 1 could be reduced further. Specifically, Algorithm 1 operates by solving a series of n 2-satisfiability problems. In the proposed algorithm each such problem is solved in isolation, but we observe that there is a high degree of similarity between consecutive problems: each 2-satisfiability problem differs from the previous one only by the introduction of one additional disjunction of two literals. Exploring whether this redundancy can be utilized to formulate a more efficient version of Algorithm 1 is an interesting direction for future work.

Another natural extension of the work presented here is to consider optimization with more than two labels. In Sect. 7, we showed that for more than two labels finding a labeling that is optimal according to \(E_\infty \) is NP-hard in the general case. Nevertheless, as can be seen in Table 1, there are special cases of multilabel max-norm problems that can be solved using Prim’s algorithm. Determining the class of multilabel problems that can be solved in low-order polynomial time is an interesting direction for future work.

At first glance, the restriction to binary labeling may appear very limiting. We note, however, that many successful methods for approximate multi-label optimization rely on iteratively minimizing binary labeling problems via move-making strategies [4]. Thus, the ability to find optimal solutions for problems with two labels potentially has a high relevance also for the multi-label case.


  1. Actually, the energy formula (1) is expressed in terms of the directed graph \(\hat{\mathcal {G}}=\langle V, {\hat{{{\mathcal {E}}}}}\rangle \). But, for any \(\{s,t\}\in {{\mathcal {E}}}\), we consider the value of \(\phi _{st}(\ell (s),\ell (t))\) as depending only on \(\ell \restriction \{s,t\}=\{\langle s,\ell (s)\rangle ,\langle t,\ell (t)\rangle \}\), that is, on the restriction of \(\ell \) to the undirected edge \(\{s,t\}\). (See Sect. 3, where we explicitly express \(E_\infty (\ell )\) in terms of the numbers \(\varPhi (\ell \restriction \{s,t\})=\phi _{st}(\ell (s),\ell (t))\) and \(\varPhi (\ell \restriction \{s\})=\phi _{s}(\ell (s))\).) So, the directedness of the graph is not really used in (1).

  2. Formally, the asymptotic time complexity is bounded by the time required to sort \(\mathcal {O}(|V|+|\mathcal {E}|)\) values. Here, |X| denotes the cardinality of the set X.

  3. Formally, this vector is identified with the function \(\phi _\ell \) defined in the next section.

  4. As an example, consider the two-label pairwise term \(\phi _{st}\) given by \(\phi _{st}(0, 0) = 3\), \(\phi _{st}(1, 1) = 0\), and \(\phi _{st}(0, 1) = \phi _{st}(1, 0) = 2\). It is easily verified that \(\phi _{st}\) is 1-submodular but not 2-submodular.

  5. We note that this algorithm is also sometimes referred to as the Jarník-Prim-Dijkstra algorithm, as it was independently discovered by these three authors [11, 12, 19].

  6. See also the 2010 paper by Ciesielski and Udupa [5] where strict optimization was earlier considered in a similar setting.

  7. Specifically, [6, theorem 5.3] states that for \(q>0\) large enough we have \({\mathcal P}^q(S,T)=\hat{{\mathcal {P}}}_{\max }(S,T)\), where parameters S and T indicate that the unary local cost maps ensure that for any optimal label \(\ell \) we have \(S\subset \ell ^{-1}(1)\) and \(T\subset \ell ^{-1}(0)\) (i.e., \(\psi _s(i)=\infty \) if, and only if, either \(i=0\) and \(s\in S\) or else \(i=1\) and \(s\in T\)), \({\mathcal P}^q(S,T)\) is the set of all labelings minimizing \(E_q\), while \(\hat{{\mathcal {P}}}_{\max }(S,T)\) is the set of all strictly optimal labelings.

  8. For example, if \(\mathcal {G}\) is a complete graph with three vertices \(V=\{a,b,c\}\) and \({{\mathcal {A}}}\) consists of all unary atoms and the binary atoms \(\{\langle a,i\rangle ,\langle b,i\rangle \}\), \(\{\langle a,i\rangle ,\langle c,i\rangle \}\), \(\{\langle b,i\rangle ,\langle c,1-i\rangle \}\) for \(i\in \{0,1\}\), then \({{\mathcal {A}}}\) is locally consistent, but not (globally) consistent.


  1. Abbas, A., Swoboda, P.: Bottleneck potentials in Markov random fields. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3175–3184 (2019)

  2. Allène, C., Audibert, J.Y., Couprie, M., Cousty, J., Keriven, R., et al.: Some links between min-cuts, optimal spanning forests and watersheds. Math. Morphol. Its Appl. Image Signal Process. 253–264 (2007)

  3. Aspvall, B., Plass, M.F., Tarjan, R.E.: A linear-time algorithm for testing the truth of certain quantified Boolean formulas. Inf. Process. Lett. 8(3), 121–123 (1979)

  4. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001)

  5. Ciesielski, K.C., Udupa, J.K.: Affinity functions in fuzzy connectedness based image segmentation I: equivalence of affinities. Comput. Vis. Image Underst. 114(1), 146–154 (2010)

  6. Ciesielski, K.C., Udupa, J.K., Falcão, A.X., Miranda, P.A.: Fuzzy connectedness image segmentation in graph cut formulation: a linear-time algorithm and a comparative analysis. J. Math. Imaging Vis. 44(3), 375–398 (2012)

  7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)

  8. Couprie, C., Grady, L., Najman, L., Talbot, H.: Power watershed: a unifying graph-based optimization framework. IEEE Trans. Pattern Anal. Mach. Intell. 33(7), 1384–1399 (2011)

  9. Cousty, J., Bertrand, G., Najman, L., Couprie, M.: Watershed cuts: minimum spanning forests and the drop of water principle. IEEE Trans. Pattern Anal. Mach. Intell. 31(8), 1362–1374 (2009)

  10. Cousty, J., Bertrand, G., Najman, L., Couprie, M.: Watershed cuts: thinnings, shortest path forests, and topological watersheds. IEEE Trans. Pattern Anal. Mach. Intell. 32(5), 925–939 (2009)

  11. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1(1), 269–271 (1959)

  12. Jarník, V.: O jistém problému minimálním (On a certain problem of minimization). Práce moravské přírodovědecké společnosti 6(4), 57–63 (1930)

  13. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)

  14. Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)

  15. Levi, Z., Zorin, D.: Strict minimizers for geometric optimization. ACM Trans. Gr. (TOG) 33(6), 185 (2014)

  16. Malmberg, F., Ciesielski, K.C., Strand, R.: Optimization of max-norm objective functions in image processing and computer vision. In: International Conference on Discrete Geometry for Computer Imagery, pp. 206–218. Springer (2019)

  17. Malmberg, F., Strand, R.: When can \(l_p\)-norm objective functions be minimized via graph cuts? In: International Workshop on Combinatorial Image Analysis. Springer (2018)

  18. Najman, L.: Extending the power watershed framework thanks to \(\gamma \)-convergence. SIAM J. Imaging Sci. 10(4), 2275–2292 (2017)

  19. Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Techn. J. 36(6), 1389–1401 (1957)

  20. Sinop, A.K., Grady, L.: A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE (2007)

  21. Wolf, S., Bailoni, A., Pape, C., Rahaman, N., Kreshuk, A., Köthe, U., Hamprecht, F.A.: The mutex watershed and its objective: efficient, parameter-free image partitioning. arXiv preprint arXiv:1904.12654 (2019)

  22. Wolf, S., Pape, C., Bailoni, A., Rahaman, N., Kreshuk, A., Köthe, U., Hamprecht, F.: The mutex watershed: efficient, parameter-free image partitioning. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 546–562 (2018)

  23. Wolf, S., Schott, L., Köthe, U., Hamprecht, F.: Learned watershed: end-to-end learning of seeded segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2011–2019 (2017)


Open access funding provided by Uppsala University. The authors would like to thank Robin Strand for valuable discussions on the ideas presented in this manuscript.

Author information



Corresponding author

Correspondence to Filip Malmberg.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Lemma 2

In this appendix, we provide a proof of Lemma 2. It is enough to prove that if for some \(\kappa \le \mathsf {n}\) the properties (P0)-(P3) hold for every \(k<\kappa \), then they also hold for \(\kappa \). Clearly, these properties hold immediately after the execution of line 2, that is, for \(\kappa =0\). So, we can assume that \(\kappa >0\). We need to show that (P0)-(P3) are preserved by each operation of the algorithm. More specifically, it suffices to show that they are preserved by the execution of lines 6 and 10, since the status of each of these properties can change only when an atom is removed from \(\mathsf {A}\) during their execution.

Proof of (P0)

Fix an edge \(D=\{v,w\}\) and assume that (P0) holds for this D and all \(k<\kappa \). Now, if \(\mathsf {A}_{\kappa -1}[D]\) has fewer than 4 elements, then by the inductive assumption it must already be missing either \(\{\langle v,1\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\), and so the same will be true for \(\mathsf {A}_{\kappa }[D]\), as needed. So, assume that \(\mathsf {A}_{\kappa -1}[D]\) still has all 4 elements. This means that these 4 elements are present in \(\mathsf {H}\) and, by (2) and the choice of the ordering of \(\mathsf {H}\), at least one of the atoms \(\{\langle v,1\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\) must precede in \(\mathsf {H}\) each of the atoms \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) and \(\{\langle v,1\rangle ,\langle w,1\rangle \}\). In particular, if \(\kappa =k_j\) for some j, then \(\mathsf {A}_{\kappa }[D]\) is obtained as a result of the execution of line 3 and the ordering of \(\mathsf {H}\) ensures that \(\mathsf {A}_{\kappa }[D]\) still satisfies (P0). So, assume that \(\kappa \ne k_j\) for every j; that is, that \(\mathsf {A}_{\kappa }[D]\) is obtained from \(\mathsf {A}_{\kappa -1}[D]\) by the execution of line 10. Since one of the atoms from \(\mathsf {A}_{\kappa -1}[D]\) was removed as a result of this execution, for one of the vertices of D, say v, the bucket \(\mathsf {A}_{\kappa -1}[\{v\}]\) must be missing one of its atoms, say \(\{\langle v,i\rangle \}\). But this means that \(\mathsf {A}_{\kappa -1}[D]\) must have been missing both \(\{\langle v,i\rangle ,\langle w,0\rangle \}\) and \(\{\langle v,i\rangle ,\langle w,1\rangle \}\), so indeed \(\mathsf {A}_{\kappa }[D]\) satisfies (P0). \(\square \)
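The role of the ordering of \(\mathsf {H}\) in this argument can be checked exhaustively on a single edge. The following Python sketch is an illustration only and not part of the algorithm. It assumes, following the way the proof uses it, that (P0) states that a bucket \(\mathsf {A}[D]\) with fewer than 4 atoms is missing at least one of the two mixed atoms. The encoding of an atom \(\{\langle v,i\rangle ,\langle w,j\rangle \}\) as the pair (i, j) and the names satisfies_p0 and p0_holds_along are hypothetical.

```python
from itertools import permutations

# Assumed encoding: the atom {<v,i>, <w,j>} of an edge D = {v, w} is the pair (i, j).
ALL_ATOMS = [(0, 0), (0, 1), (1, 0), (1, 1)]
MIXED = {(0, 1), (1, 0)}  # the two atoms giving v and w different labels


def satisfies_p0(bucket):
    """(P0) as used in the proof: a bucket with fewer than 4 atoms
    must be missing at least one of the two mixed atoms."""
    return len(bucket) == 4 or not MIXED <= bucket


def p0_holds_along(order):
    """Remove the atoms of one edge in the given order and report
    whether (P0) holds after every single removal."""
    bucket = set(ALL_ATOMS)
    for atom in order:
        bucket.discard(atom)
        if not satisfies_p0(bucket):
            return False
    return True


# Removal orders of the four atoms for which (P0) is never violated.
good_orders = [o for o in permutations(ALL_ATOMS) if p0_holds_along(o)]
```

Of the 24 possible removal orders, (P0) is maintained exactly by the 12 in which the first removed atom is mixed, which is the situation that the ordering of \(\mathsf {H}\) guarantees.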

Proof of (P1)-(P3)

This will be proved by simultaneous induction on \(\kappa \).

(P1) must be preserved by the execution of line 10, by the inductive assumption (P2) that \({{\mathcal {A}}}_{\kappa -1}\) is locally consistent. It also cannot be destroyed by the execution of line 6, since this is prevented by the condition of line 5. Thus, \(\mathsf {A}_{\kappa }[D]\) still has the property (P1).

To see (P3) we can assume that \(\kappa =k_j\) for some \(j>0\). Clearly (P3) holds for \(k=k_{j-1}\). Thus, we need only to show that the removal of an atom A in line 6 and the subsequent execution of loop 7–12 preserve (P3). Indeed, a potential incompatibility can occur only in relation to the vertices associated with the atoms removed from \(\bigcup _{D\in {{\mathcal {D}}}}\mathsf {A}[D]\). However, each time such an atom is removed, all adjacent atoms are inserted into the queue \(\mathsf {K}\) and the execution of the loop 7–12 does not end until all such potential incompatibilities are taken care of.
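The queue-driven clean-up described in this argument can be sketched in the style of a worklist (arc-consistency) propagation. The Python code below is a hedged illustration and not the pseudocode of the algorithm: it assumes that a vertex atom \(\{\langle v,i\rangle \}\) is incompatible with every edge atom that assigns v a label other than i, it represents buckets as sets of labels (for vertices) and sets of label pairs (for edges), and the function name propagate is hypothetical. It also does not reproduce the ordering conditions imposed on \(\mathsf {K}\).

```python
from collections import deque


def propagate(vertex_atoms, edge_atoms, start):
    """Worklist sketch of incompatibility removal.

    vertex_atoms: dict mapping a vertex (a string) to its set of allowed labels.
    edge_atoms:   dict mapping an edge (v, w) to its set of allowed pairs (lv, lw).
    start:        the vertex or edge whose bucket was just reduced.
    Both dictionaries are modified in place and returned.
    """
    queue = deque([start])
    while queue:
        x = queue.popleft()
        if isinstance(x, tuple):
            # x is an edge: drop vertex labels that no remaining edge atom uses.
            for pos, u in enumerate(x):
                supported = {atom[pos] for atom in edge_atoms[x]}
                if not vertex_atoms[u] <= supported:
                    vertex_atoms[u] &= supported
                    queue.append(u)
        else:
            # x is a vertex: drop edge atoms that use a label x no longer allows.
            for edge in list(edge_atoms):
                if x in edge:
                    pos = edge.index(x)
                    kept = {a for a in edge_atoms[edge] if a[pos] in vertex_atoms[x]}
                    if kept != edge_atoms[edge]:
                        edge_atoms[edge] = kept
                        queue.append(edge)
    return vertex_atoms, edge_atoms


# Path a - b - c whose edges have already lost both mixed atoms;
# the label of a has just been decided as 0.
vertices = {"a": {0}, "b": {0, 1}, "c": {0, 1}}
edges = {("a", "b"): {(0, 0), (1, 1)}, ("b", "c"): {(0, 0), (1, 1)}}
propagate(vertices, edges, "a")
```

In this small example the decision for a is pushed along the path, so that b and c also end with the single label 0 and each edge bucket retains only the atom (0, 0).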

The proof of the preservation of (P2) is more involved. Let j be the largest such that \(k_j\le \kappa \). First notice that if \(\kappa =k_j\), then (P2) holds. Indeed, by the inductive assumptions (P2) and (P3), \({{\mathcal {A}}}_{\kappa -1}\) is locally consistent and has no incompatible atoms. Since \({{\mathcal {A}}}_{\kappa }\ne {{\mathcal {A}}}_{\kappa -1}\), the bucket \(\mathsf {A}[D]\) must have contained two or more atoms prior to the removal of A in line 6. Since \({{\mathcal {A}}}_{\kappa -1}\) did not contain any incompatible atoms, \({{\mathcal {A}}}_{\kappa }={{\mathcal {A}}}_{\kappa -1}\setminus \{A\}\) must remain locally consistent. So, we can assume that \(\mu :=\kappa -k_j\) is nonzero. We will examine families \({{\mathcal {A}}}_{k_j},{{\mathcal {A}}}_{k_j+1}, \ldots , {{\mathcal {A}}}_{k_j+\mu }= {{\mathcal {A}}}_\kappa \).

Let \(A=A_0,\ldots ,A_\mu \) be the order in which the atoms were removed from \(\mathsf {K}\) during this execution of loop 8–12. Also, let \(x_0,\ldots ,x_\mu \) be the vertices/edges associated with the atoms \(A_0,\ldots ,A_\mu \), respectively. We will show, by induction on \(\nu \le \mu \), the following property (\(I_\nu \)), which in particular implies that \({{\mathcal {A}}}_{k_j+\nu }\) is locally consistent.

To state (\(I_\nu \)), first notice that if a vertex v is among \(x_0,\ldots ,x_{\nu -1}\), then \({{\mathcal {A}}}_{k_j+\nu }\) must contain precisely one of the two atoms \(\{\langle v,0\rangle \}\) and \(\{\langle v,1\rangle \}\). (By (P1), it must contain at least one of these atoms. It cannot contain both, since this would mean that no v-atom was removed so far and hence \(A_{k_j+\nu }\) could not have been removed from \({{\mathcal {A}}}_{k_j+\nu -1}\).) In particular, this means that there is an \(i_v\in \{0,1\}\) for which \({{\mathcal {A}}}_{k_j+\nu }\) already ensures that the final value of \(\ell (v)\) is \(i_v\). This means that \(\mathsf {A}_{k_j+\nu }[\{v\}]=\bigl \{\{\langle v,i_v\rangle \}\bigr \}\).

We will prove, by induction on \(\nu \le \mu \), that

(\(I_\nu \)):

\({{\mathcal {A}}}_{k_j+\nu }\) is locally consistent and if vertices v and w are among \(x_0,\ldots ,x_\nu \), then \(i_v=i_w\).

Of course, this will finish the proof of (P2).

Clearly, (\(I_0\)) holds, as we have already shown that \({{\mathcal {A}}}_{k_j}\) is locally consistent, and the other condition is vacuously satisfied. So, fix \(\nu \in \{1,\ldots ,\mu \}\) such that (\(I_\xi \)) holds for all \(\xi <\nu \). We will show that (\(I_{\nu }\)) holds as well.

For this, assume first that \(x_{\nu }\) is an edge \(\{v,w\}\). We need to show only that \({{\mathcal {A}}}_{k_j+\nu }\) remains locally consistent, the other part of (\(I_{\nu }\)) being ensured in this case by (\(I_{\nu -1}\)). Since \(x_{\nu }=\{v,w\}\), there must exist a \(j<\nu \) such that \(x_j\) is a vertex and \(x_j\in \{v,w\}\). For simplicity, we assume that \(x_j=v\) and that \(i_v=0\); the other cases being similar.

We need to show that \({{\mathcal {A}}}_{k_j+\nu }\), obtained from \({{\mathcal {A}}}_{k_j+\nu -1}\) by removing from it the atoms \(\{\langle v,1\rangle ,\langle w,0\rangle \}\) and \(\{\langle v,1\rangle ,\langle w,1\rangle \}\), cannot be locally inconsistent.

Note that such a removal from the locally consistent set \({{\mathcal {A}}}_{k_j+\nu -1}\) can potentially affect only the local consistency of \(\{v,w\}\) with respect to the vertices v and w. However, since \(\mathsf {A}_{k_j+\nu -1}[\{v\}]=\bigl \{ \{\langle v,0\rangle \}\bigr \}\), this is also equal to \(\mathsf {A}_{k_j+\nu }[\{v\}]\). Also, both \({{\mathcal {A}}}_{k_j+\nu -1}\) and \({{\mathcal {A}}}_{k_j+\nu }\) must contain either \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\). So, \({{\mathcal {A}}}_{k_j+\nu }\) cannot have a local inconsistency of \(\{v,w\}\) with v. Therefore, we must show only that \({{\mathcal {A}}}_{k_j+\nu }\) contains no local inconsistency between \(\{v,w\}\) and w.

To see this, first notice that there will be no such inconsistency when

$$\begin{aligned} \mathsf {A}_{k_j-1}[\{w\}]\subsetneq \bigl \{ \{\langle w,0\rangle \},\{\langle w,1\rangle \}\bigr \}. \end{aligned} \tag{9}$$

Indeed, then \(\mathsf {A}_{k_j-1}[\{w\}]=\bigl \{ \{\langle w,i\rangle \}\bigr \}\) for some \(i\in \{0,1\}\) and, by the property (P3), \({{\mathcal {A}}}_{k_j-1}\supset {{\mathcal {A}}}_{k_j+\mu }\) cannot contain atom \(\{\langle v,0\rangle ,\langle w,1-i\rangle \}\). Hence \({{\mathcal {A}}}_{k_j+\mu }\) must contain \(\{\langle v,0\rangle ,\langle w,i\rangle \}\) and local consistency is preserved.

To finish the argument consider the following three cases.

\(\mathsf {A}_{k_j+\nu }[\{w\}]=\bigl \{\{\langle w,0\rangle \},\{\langle w,1\rangle \}\bigr \}\): Then \({{\mathcal {A}}}_{k_j+\nu }\) is indeed locally consistent, since it contains either \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,0\rangle ,\langle w,1\rangle \}\).

\(\mathsf {A}_{k_j+\nu }[\{w\}]=\bigl \{ \{\langle w,1\rangle \}\bigr \}\): Then also \(\mathsf {A}_{k_j+\nu -1}[\{w\}]=\bigl \{\{\langle w,1\rangle \}\bigr \}\) and w cannot be among \(x_0,\ldots ,x_{\nu -1}\), since this would contradict the second part of (\(I_{\nu -1}\)). In particular, (9) holds and so local consistency is preserved.

\(\mathsf {A}_{k_j+\nu }[\{w\}]=\bigl \{ \{\langle w,0\rangle \}\bigr \}\): We can assume that (9) does not hold. Then there exists \(p\in \{0,\ldots ,\nu -1\}\) such that \(x_p=w\). Therefore, \({{\mathcal {A}}}_{k_j+p}\supset {{\mathcal {A}}}_{k_j+\nu }\) cannot contain \(\{\langle v,0\rangle ,\langle w,1\rangle \}\). So, \({{\mathcal {A}}}_{k_j+\nu }\) must contain \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) and local consistency is preserved.

Before we proceed further, note that for every \(\nu \le \mu \),

(\(J_\nu \)):

for every vertex v there is at most one edge \(D=\{v,w\}\) such that \(\mathsf {A}_{k_j+\nu }[\{v\}]\) contains an atom incompatible with all atoms in \(\mathsf {A}_{k_j+\nu }[D]\).

Indeed, by (P3), this clearly holds for \(\nu =0\). Also, if \(x_\nu \) is an edge, then the ordering conditions we imposed on the queue \(\mathsf {K}\) ensure that the atoms of no other edge can be added to \(\mathsf {K}\) and subsequently modified, before each vertex (adjacent to \(x_\nu \)) that can have incompatible atoms with that for \(x_\nu \) is added to \(\mathsf {K}\) and subsequently modified, so that the potential incompatibilities are removed.
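Property (\(J_\nu \)) can be made concrete with a small check on explicit buckets. The Python sketch below is illustrative only. It assumes the same notion of incompatibility as in the statement of (\(J_\nu \)), read as follows: a vertex atom \(\{\langle v,i\rangle \}\) is incompatible with an edge atom when that edge atom assigns v a label different from i. The data layout and the names edges_with_unsupported_atom and satisfies_j are hypothetical.

```python
def edges_with_unsupported_atom(v, vertex_atoms, edge_atoms):
    """Return the edges D containing v for which the bucket of v holds
    an atom incompatible with all atoms in the bucket of D."""
    bad = []
    for edge, atoms in edge_atoms.items():
        if v not in edge:
            continue
        pos = edge.index(v)
        # Labels of v that at least one remaining atom of this edge still uses.
        supported = {atom[pos] for atom in atoms}
        if vertex_atoms[v] - supported:
            bad.append(edge)
    return bad


def satisfies_j(vertex_atoms, edge_atoms):
    """(J): every vertex has at most one such edge."""
    return all(
        len(edges_with_unsupported_atom(v, vertex_atoms, edge_atoms)) <= 1
        for v in vertex_atoms
    )


# Path a - b - c. Only the edge (a, b) has been reduced so far,
# so b has exactly one edge that no longer supports its label 1.
vertices = {"a": {0}, "b": {0, 1}, "c": {0, 1}}
edges_ok = {("a", "b"): {(0, 0)}, ("b", "c"): {(0, 0), (1, 1)}}

# If (b, c) were also reduced before b is processed, b would have two such edges.
edges_bad = {("a", "b"): {(0, 0)}, ("b", "c"): {(0, 0)}}
```

On these inputs, satisfies_j accepts the first configuration and rejects the second, which is the kind of situation the ordering conditions on \(\mathsf {K}\) are meant to exclude.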

Finally, consider \(x_{\nu }\) being a vertex v. Then we must have had \(\mathsf {A}_{k_j+\nu -1}[\{v\}]= \big \{ \{\langle v,0\rangle \}, \{\langle v,1\rangle \}\big \}\). Moreover, there must exist a \(p<\nu \) such that \(x_p\) is an edge \(D=\{v,w\}\) and \(\mathsf {A}_{k_j+p}[D]\subsetneq \mathsf {A}_{k_j+p-1}[D]\). Also, by (\(J_\nu \)), such p is unique. Therefore, \({{\mathcal {A}}}_{k_j+\nu }\) must be locally consistent, since the only potential local inconsistency in \({{\mathcal {A}}}_{k_j+\nu }\) could be between v and \(\{v,w\}\). But our choice of \(\mathsf {A}_{k_j+\nu }[\{v\}]\subset \mathsf {A}_{k_j+\nu -1}[\{v\}]=\big \{ \{\langle v,0\rangle \}, \{\langle v,1\rangle \}\big \}\) ensures that such an inconsistency cannot occur.

Notice also that the second part of (\(I_\nu \)) holds as well. Indeed, this is vacuously satisfied when there is no vertex among \(x_0,\ldots ,x_{\nu -1}\). So, assume that such a vertex exists. Then w, the second vertex of the above chosen edge \(x_p=D=\{v,w\}\), must be among such \(x_0,\ldots ,x_{\nu -1}\). Indeed, if \(p=0\) then we must have \(\nu =2\) and \(x_1=w\). Since \(i_w=0\), we must have \(\mathsf {A}_{k_j}[D]\subset \bigl \{ \{\langle v,0\rangle ,\langle w,0\rangle \}, \{\langle v,1\rangle ,\langle w,0\rangle \} \bigr \}\). Also, as \(\mathsf {A}_{k_j+2}[\{v\}]\subsetneq \mathsf {A}_{k_j+1}[\{v\}]\), the bucket \(\mathsf {A}_{k_j+1}[D]=\mathsf {A}_{k_j}[D]\) must contain precisely one of the atoms \(\{\langle v,0\rangle ,\langle w,0\rangle \}\) or \(\{\langle v,1\rangle ,\langle w,0\rangle \}\). However, \(\mathsf {A}_{k_j}[D]\) cannot be equal to the set \(\big \{ \{\langle v,1\rangle ,\langle w,0\rangle \}\big \}\), since, by (P0), this would mean that \(\mathsf {A}_{k_j-1}[D]=\bigl \{\{\langle v,0\rangle ,\langle w,1\rangle \},\{\langle v,1\rangle ,\langle w,1\rangle \}\bigr \}\). But this contradicts (P3). So, \(\mathsf {A}_{k_j+1}[D]=\bigl \{ \{\langle v,0\rangle ,\langle w,0\rangle \}\bigr \}\), and indeed \(i_v=0\).

Finally, assume that \(p>0\). Then \(w=x_q\) for some \(q\in \{0,\ldots ,p-1\}\) and so \(\mathsf {A}_{k_j+q}[\{w\}]=\bigl \{\{\langle w,0\rangle \}\bigr \}\). Thus, \(\mathsf {A}_{k_j+p}[D]\subset \bigl \{\{\langle v,0\rangle ,\langle w,0\rangle \},\{\langle v,1\rangle ,\langle w,0\rangle \}\bigr \}\) and \(\mathsf {A}_{k_j+\nu -1}[D]\) must contain precisely one of these atoms to ensure that the inclusion \(\mathsf {A}_{k_j+\nu }[\{v\}]\subsetneq \mathsf {A}_{k_j+\nu -1}[\{v\}]\) holds. We need to show that the equality \(\mathsf {A}_{k_j+p}[D]=\bigl \{\{\langle v,1\rangle ,\langle w,0\rangle \}\bigr \}\) is impossible. Indeed, this would imply that \(\mathsf {A}_{k_j+q-1}[D]\subset \bigl \{\{\langle v,1\rangle ,\langle w,0\rangle \},\{\langle v,0\rangle ,\langle w,1\rangle \}, \{\langle v,1\rangle ,\langle w,1\rangle \} \bigr \}\) and, using the property (P0), also that \(\mathsf {A}_{k_j+q-1}[D]\subset \bigl \{\{\langle v,1\rangle ,\langle w,0\rangle \},\{\langle v,1\rangle ,\langle w,1\rangle \}\bigr \}\). However, this means that \({{\mathcal {A}}}_{k_j+q-1}\) already decided the value of \(\ell (v)\) as 1. Since the value of \(\ell (w)\) was previously decided, the same reasoning as for (\(J_\nu \)) shows that v should already appear among \(x_0,\ldots ,x_{q}\), while \(q<\nu \) contradicts this. This finishes the proof of (P1)-(P3). \(\square \)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Malmberg, F., Ciesielski, K.C. Two Polynomial Time Graph Labeling Algorithms Optimizing Max-Norm-Based Objective Functions. J Math Imaging Vis 62, 737–750 (2020).


  • Energy minimization
  • Pixel labeling
  • Minimum cut
  • NP-hard

Mathematics Subject Classification

  • 68Q25
  • 68W40
  • 68R10