Minimizing Energies with Hierarchical Costs
DOI: 10.1007/s11263-012-0531-x
Cite this article as: Delong, A., Gorelick, L., Veksler, O. et al. Int J Comput Vis (2012) 100: 38.
Abstract
Computer vision is full of problems elegantly expressed in terms of energy minimization. We characterize a class of energies with hierarchical costs and propose a novel hierarchical fusion algorithm. Hierarchical costs are natural for modeling an array of difficult problems. For example, in semantic segmentation one could rule out unlikely object combinations via hierarchical context. In geometric model estimation, one could penalize the number of unique model families in a solution, not just the number of models—a kind of hierarchical MDL criterion. Hierarchical fusion uses the well-known α-expansion algorithm as a subroutine, and offers a much better approximation bound in important cases.
Keywords
Energy minimization · Hierarchical models · Graph cuts · Markov random fields (MRFs) · Segmentation
1 Introduction
Energy minimization is of strong practical and theoretical importance to computer vision. An energy expresses our criteria for a good solution—low energies are good, high energies are bad—independent of any algorithm. Algorithms are however hugely important in practice. Even for low-level vision problems we are confronted by energies that are computationally hard (often NP-hard) to minimize. As a consequence, a significant portion of computer vision research is dedicated to identifying energies that are useful and yet reasonably tractable. Our work is of precisely this nature.
Computer vision is full of ‘labeling’ problems cast as energy minimization. For example, the data to be labeled could be pixels, interest points, point correspondences, or mesh data such as from a range scanner. Depending on the application, the labels could be either semantic (object classes, types of tissue) or describe geometry/appearance (depth, orientation, shape, texture).
There are many labeling problems for which the labels naturally form groups. In computer vision, a recent trend is the use of ‘context’ to resolve ambiguities in object recognition (e.g. Choi et al. 2010; Ladický et al. 2010a; Zhou et al. 2011). The idea is that certain groups of labels are self-consistent because they tend to appear together, e.g. the {car,road,sky} labels all belong to the “outdoors” context, while {table,chair,wall} all belong to the “indoors” context. In computer graphics, one may wish to automatically classify the faces of a 3D mesh into semantic parts for the benefit of artists and animators (Kalogerakis et al. 2010). The part labels arm, tail, and wheel naturally belong to different groups based on their context (humanoid, quadruped, vehicle). In operations research, facility location can be cast as a labeling problem, and hierarchical variants have been studied (Svitkina and Tardos 2006; Sahin and Süral 2007). All of these disparate labeling problems are similar from an optimization point of view.
When labels are explicitly grouped in a hierarchy, the costs in the energy are naturally structured. In this work, we characterize a class of energies as having hierarchical costs. If an energy satisfies our “h-metric” and “h-subset” conditions, then we can often minimize it much more effectively. We provide a novel hierarchical fusion (h-fusion) algorithm to minimize our class of energies. Our algorithm generalizes the well-known α-expansion algorithm (Boykov et al. 2001), yet offers better empirical performance and a better approximation bound. The improved theoretical guarantees are important because, in practice, α-expansion can easily get stuck in poor local minima for this useful class of energies; to the best of our knowledge, our h-fusion algorithm is state of the art. Like the original fusion algorithm (Lempitsky et al. 2010) ours is highly parallelizable.
With respect to our energy itself, the most relevant work is the “label costs” of Delong et al. (2012). Our notion of hierarchical costs is a special case of their energy, yet it is important to explicitly characterize this subclass because, as we show, it permits a much better minimization algorithm. With respect to our algorithm, by far the most closely related work is the r-hst metrics of Kumar and Koller (2009). We review both these works in some detail.
2 Review of Related Work
First we review energies with “label costs” as described in Delong et al. (2012). Let the set \(\mathcal{P}\) index the data that needs to be labeled, and let \(\mathcal{L}\) be the set of possible labels. A labeling is any complete assignment \(f = (f_{p})_{p \in \mathcal{P}}\) where variable \(f_{p} \in\mathcal{L}\) designates the label assigned at index p. For example if \(\mathcal{P}= \{p,q\}\) and \(\mathcal{L}=\{\ell_{1},\ell _{2}\}\) then a valid labeling might be f=(ℓ_{2},ℓ_{1}) where f_{p}=ℓ_{2},f_{q}=ℓ_{1}.
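A toy sketch of such a labeling problem and its energy E=D+V+H in Python may make the notation concrete. All sites, labels, and costs below are illustrative, and exhaustive search stands in for the minimization algorithms discussed later:

```python
from itertools import product

# Toy instance: sites P = {p, q}, labels L = {l1, l2} (illustrative names).
P = ['p', 'q']
L = ['l1', 'l2']

# Data costs D_p(l), Potts smoothness V on the single pair (p, q), and
# per-label costs H({l}) -- all numbers are made up for illustration.
D = {('p', 'l1'): 3, ('p', 'l2'): 1,
     ('q', 'l1'): 0, ('q', 'l2'): 2}

def V(a, b):
    return 0 if a == b else 1

H = {'l1': 2, 'l2': 2}

def energy(f):
    """E(f) = D + V + H for a labeling f given as a dict: site -> label."""
    data = sum(D[(p, f[p])] for p in P)
    smooth = V(f['p'], f['q'])
    label = sum(h for l, h in H.items() if l in set(f.values()))
    return data + smooth + label

# Brute force over all |L|^|P| labelings finds a global optimum; this is
# only feasible for toy instances, hence the approximation algorithms below.
best = min((dict(zip(P, fs)) for fs in product(L, repeat=len(P))), key=energy)
```

For instance, the labeling f=(ℓ2,ℓ1) from the text corresponds to `{'p': 'l2', 'q': 'l1'}` here.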
Table 1
Paper | V | H | Algorithms | Applications
---|---|---|---|---
Zhu and Yuille (1996) | Semi-metric | Per-label | Region merging | Unsupervised segmentation |
Torr (1998) | × | Per-label | Expectation maximization+pruning | Model selection, motion estimation |
Boykov et al. (2001) | Metric, semi-metric | × | α-expansion, αβ-swap | Stereo, denoising |
Kolmogorov (2006) | Arbitrary | × | Tree-reweighted message passing | Stereo |
Li (2007) | × | Per-label | LP relaxation+rounding | Motion estimation |
Lazic et al. (2009) | × | Per-label | Belief propagation | Motion estimation |
Kumar and Koller (2009) | r-hst metric | × | Hierarchical graph cuts | Denoising, scene registration |
Delong et al. (2011) | Metric, semi-metric | Any subsets | α-Expansion, αβ-swap, greedy FL | Homography detection, motion estimation, unsupervised segmentation |
Barinova et al. (2010) | × | Per-label | Greedy facility location (FL) | Object detection |
Ladický et al. (2010a) | Metric, semi-metric | Parsimonious^{∗} | α-Expansion, αβ-swap | Object recognition |
This work | h-metric | h-Subsets | h-Fusion w/α-expansion | Unsupervised segmentation |
Definition 1
These assumptions were originally outlined by Boykov et al. (2001) as sufficient conditions for their α-expansion and αβ-swap algorithms. These conditions arise because of the inherent limitations of graph cut methods (Boykov and Jolly 2001; Kolmogorov and Zabih 2004). Because our algorithmic approach is also based on graph cuts, we shall see a similar kind of limitation arise in Sect. 4.1.
We note that, with the exception of Zhu and Yuille (1996), all the works listed in Table 1 are of a discrete nature where \(\mathcal{P}\) is a finite set. A number of variational formulations of E have recently been developed with continuous analogues of D+V (e.g. Pock et al. 2008, 2009; Olsson et al. 2009) and of D+V+H (Yuan and Boykov 2010). Our main ideas also apply to such continuous formulations, but we focus on the discrete setting.
Energies of the form (1) are NP-hard to minimize in all but a few special cases. Even D+V is NP-hard to minimize for \(|\mathcal{L}|\geq3\) by reduction from 3-Terminal-Cut (Boykov et al. 2001); in fact this case is max-SNP-hard (Cunningham and Tang 1999), meaning there is some ϵ>0 for which no polynomial-time (1+ϵ)-approximation algorithm can exist (i.e. no polynomial-time approximation scheme (PTAS)). The D+H case is NP-hard by a straightforward reduction from Set-Cover using only per-label costs H(ℓ). A hardness result for approximating Set-Cover by Feige (1998) implies that D+H cannot be approximated within a ratio of \((1-\epsilon)\ln|\mathcal{P}|\) in polynomial time unless NP ⊆ DTIME[n^{O(loglogn)}], i.e. unless NP were only slightly super-polynomial, which is currently deemed unlikely. This observation will help to put the approximation bound for our algorithm into perspective.
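The Set-Cover reduction mentioned above can be made concrete: each candidate set becomes a label with unit per-label cost, and data costs forbid assigning an element to a set that does not contain it. A minimal sketch under these assumptions (set names and the universe are illustrative):

```python
INF = float('inf')

def set_cover_as_DH(universe, sets):
    """Encode Set-Cover as a D+H energy: one label per candidate set,
    D_p(l) = 0 if element p lies in set l (else infinity), and a unit
    per-label cost H(l) = 1 for every label used."""
    D = {(p, l): (0 if p in S else INF) for p in universe for l, S in sets.items()}

    def energy(f):  # f assigns each element p a covering set f[p]
        return sum(D[(p, f[p])] for p in universe) + len(set(f.values()))

    return energy

universe = {1, 2, 3, 4}
sets = {'A': {1, 2}, 'B': {3, 4}, 'C': {1, 2, 3}}
E = set_cover_as_DH(universe, sets)
# Covering with A and B uses two labels and zero data cost, so E = 2;
# a minimizer of this D+H energy is exactly a minimum set cover.
```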
Observation 1
Feige’s hardness result is evidence that no polynomial-time algorithm can minimize energy (1) within a constant ratio of the optimum.
From an algorithmic standpoint, the most similar work to ours is a recent paper by Kumar and Koller (2009). Their aim is to efficiently minimize energies of the form D+V. They consider the class of r-hierarchically well-separated tree (r-hst) metrics (Bartal 1998), a special case of the metrics defined above; we discuss them in Sect. 8, but for now it is enough to know that each has an associated constant r>1. Kumar and Koller give an algorithm that, for a particular r-hst metric, provides a \(\frac{2r}{r-1}\)-approximation to the globally optimal labeling. Although this coefficient is very large for r≈1, the approximation depends only on V (not on \(|\mathcal{P}|\) or \(|\mathcal{L}|\)). In some cases this ratio is better than the well-known bound for the α-expansion algorithm (Boykov et al. 2001) and the \(O(\lg|\mathcal{L}|\lg\lg|\mathcal{L}|)\) bound for linear programming relaxation (Kleinberg and Tardos 2002).
Kumar and Koller describe their algorithmic process as hierarchical graph cuts. This does not refer to computing a graph cut in a hierarchical manner, but rather to minimizing an energy D+V via a hierarchical sequence of standard graph cuts. They show that the r-hst metric assumption on V is sufficient to apply their algorithm. We aim to minimize energies of the form D+V+H and, motivated by the difficult H term, have independently developed an algorithm we call hierarchical fusion (h-fusion). However, at the highest conceptual level our algorithm is the same as theirs—it is only our class of energies and our sequence of subproblems that is different, each of which we solve with the extended α-expansion of Delong et al. (2012). The h-fusion process will be explained in Sect. 5.
Also worth mentioning is a recent work by Felzenszwalb et al. (2010) concerning energies of the form E=D+V. By making the strong assumption that both D and V are tree metrics, they can compute a global optimum. However, most applications do not satisfy the metric assumption on data costs D; our work makes no such assumption. We discuss tree metrics in Sect. 4.1.
Our contributions
- (a)
we define h-metric smoothness costs V, a wider class than tree metrics yet still sufficient for our h-fusion algorithm to apply,
- (b)
we define h-subset label costs H, a sufficient condition to apply h-fusion with high-order label costs,
- (c)
we prove that the approximation bound of h-fusion is much better than α-expansion in important cases, and
- (d)
we provide worst-case examples to show that our theoretical bound is tight in some reasonable sense.
The remainder of this paper is organized as follows. Section 3 reviews the α-expansion algorithm in some detail. Section 4 then introduces our notion of hierarchical costs, a useful subclass of energy (1). Section 5 describes our h-fusion algorithm, and Sect. 6 derives its approximation bound. Section 7 gives some experiments to suggest how our energies and algorithm work. Finally, Sect. 8 discusses other applications, relations to facility location, and extensions.
3 Review of α-Expansion
The algorithm we introduce in this paper uses the well-known α-expansion algorithm (Boykov et al. 2001) as a key subroutine. The algorithm was designed for energies of the form D+V, though we employ an extension to D+V+H by Delong et al. (2012). Our approximation bound is therefore intricately linked with α-expansion and its limitations, so we review the algorithm here. Readers familiar with α-expansion may skip ahead to Sect. 4.
3.1 How α-Expansion Works
3.2 Graph Cuts and the Limits of α-Expansion
From Table 1 we see that α-expansion is applicable if V is a metric (Definition 1). We briefly review how this limitation arises, as it will be relevant to our new h-fusion algorithm.
Observation 2
The metric assumption is sufficient for (6) to hold, and so α-expansion is applicable if V is a metric.
Rother et al. (2005) showed that, by assuming a non-arbitrary initial labeling, α-expansion can be applied to a wider class of energies: for each \(\beta,\gamma\in\mathcal{L}\), either V must satisfy (6) for all \(\alpha\in\mathcal{L}\), or V(β,γ)=∞. Unfortunately, α-expansion offers no approximation guarantees for this extended class (Theorem 1 below). In this paper we define a class of smoothness costs V called h-metrics, and it too can be extended to non-metric infinities this way. However, we aim to quantify approximation bounds for our algorithm so, for simplicity, we will not include such infinities in our definition of h-metrics.
Delong et al. (2012) showed that α-expansion is also applicable to energies with label costs as long as H(L)≥0 for each label subset \(L \subset\mathcal{L}\). The expansion step requires a binary energy of the form E′(x)=D′(x)+V′(x)+H′(x) where H′ defines very high-order potentials over x, unlike V′ which defines only ‘quadratic’ (pairwise) potentials. We will use their construction in our main subroutine.
3.3 Approximation Bounds of α-Expansion
Local search with expansion moves is guaranteed to terminate at a local minimum \(\hat{f}\) that is within a constant factor of the global optimum f^{∗} (Veksler 1999; Boykov et al. 2001). The actual bound is \(E(\hat{f}) \leq2c E(f^{*})\) where c≥1 is some constant that depends on V. If c is small, then we can expect α-expansion to do at least a reasonable job. If c is large, the bound is meaningless and we have even more reason to try other algorithms (e.g. Kolmogorov 2006).
Understanding the approximation bound of α-expansion will be helpful for understanding our generalized bound in Sect. 6. The following holds for any energy E=D+V with^{1}D_{p}(⋅)≥0 and metric V(⋅,⋅).
Theorem 1
(Veksler 1999)
Delong et al. (2012) showed that incorporating label costs H(⋅) into α-expansion can worsen the above bound. If arbitrary label costs H(L)≥0 are assumed on arbitrary subsets \(L \subset\mathcal{L}\) then the bound is as follows.
Theorem 2
(Delong et al. 2012)
This tells us that, if arbitrary label costs are assumed, standard α-expansion is no longer a constant-ratio approximation algorithm (recall Observation 1) and furthermore the bound gets worse (c_{2} gets larger) if costs are defined on large subsets L.
4 Energies with Hierarchical Costs
We wish to minimize an energy of the general form (1), but we assume the labels are grouped in some kind of hierarchy. Depending on the application, the grouping will likely be either semantic (hierarchy of object labels) or geometric (families of geometric models). One option for minimizing such energies is to ignore the grouping and simply apply α-expansion. However, Theorem 2 suggests caution, because α-expansion finds poor local minima when, for example, strong high-order label costs are involved.
Merely declaring the labels to be ‘grouped’ does not in itself change energy E(f) (we still have \(f_{p} \in\mathcal{L}\)) nor is the standard α-expansion algorithm ‘aware’ of a label hierarchy. However, in Sects. 4.1–4.2 we describe energies for which a ‘good’ tree can be defined so that our h-fusion algorithm (Sect. 5) is provably better than α-expansion. In fact if one defines a flat tree (\(\mathcal{S}= \{\,\}\)) then our algorithm and approximation bounds all reduce to those of Boykov et al. (2001) and Delong et al. (2012) as a special case.
Sections 4.1 and 4.2 develop the key definitions: the class of h-metrics for smoothness costs V, and the class of h-subsets for label costs H. These definitions are directly motivated by our h-fusion algorithm and how its computation is organized. If an energy satisfies our h-metric and h-subset assumptions, then it can be minimized by h-fusion (Sect. 5).
4.1 Hierarchical Smoothness Costs (h-Potts, h-Metrics)
Definition 2
(Delong et al. 2011)
Given tree structure π, for each node \(i \in\mathcal{T}\) let w_{i}≥0 be its transition cost so that V(α,β)=w_{lca(α,β)} for all \(\alpha,\beta\in\mathcal{L}\) and w_{i}=0 for each leaf \(i \in\mathcal{L}\). We then say that (V,π) forms a hierarchical Potts (h-Potts) potential.
For example, Delong et al. (2011) use a two-level tree where w_{r} is the transition cost between ‘super-labels’ and each w_{i} for \(i \in\mathcal{S}\) is the transition cost between ‘sub-labels’ in group i. They show that if w_{i}≤2w_{r} then V is metric and standard α-expansion can be applied. For our h-fusion algorithm to apply, a simple sufficient condition is that w_{i}≤w_{π(i)} for all \(i \in\mathcal{L}\cup\mathcal{S}\).
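The two-level h-Potts construction can be sketched directly: transition costs sit at internal tree nodes, and V(α,β) reads off the weight at the lowest common ancestor. The tree, label names, and weights below are illustrative:

```python
# Two-level label tree in the style of Delong et al. (2011): root 'r' with
# super-label groups 's1', 's2', each holding two leaf labels (toy names).
parent = {'a1': 's1', 'a2': 's1', 'b1': 's2', 'b2': 's2',
          's1': 'r', 's2': 'r'}
w = {'r': 2, 's1': 1, 's2': 1}   # transition costs; leaves implicitly 0

def ancestors(i):
    path = [i]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def lca(a, b):
    anc = set(ancestors(a))
    for i in ancestors(b):
        if i in anc:
            return i

def V(a, b):
    """h-Potts potential: V(a, b) = w at the lowest common ancestor of a, b."""
    return 0 if a == b else w[lca(a, b)]

# Sufficient conditions from the text, for this two-level tree:
is_metric  = all(w[s] <= 2 * w['r'] for s in ('s1', 's2'))   # alpha-expansion
is_hfusion = all(w[s] <= w['r']     for s in ('s1', 's2'))   # h-fusion
```

Here w_i = 1 ≤ w_r = 2, so both the metric condition and the h-fusion condition hold for this toy tree.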
Now we define h-metrics, a class of hierarchical smoothness costs where V is not necessarily parameterized by w_{i}. As we shall see in Sect. 5, the h-metric assumption is necessary for our specialized algorithm.
Definition 3
4.2 Hierarchical Label Costs (h-Subsets)
We have already defined a notion of ‘hierarchical’ smoothness costs (h-metrics), and we now do the same for label costs. As we shall see, if an energy E(f) has hierarchical costs with respect to some tree, then E(f) can be minimized by our h-fusion algorithm on that tree (Sect. 5).
Definition 4
Definition 5
Given a tree π we say that (H,π) form hierarchical label costs if \(H(L)>0 \Rightarrow L \in\mathcal{H}\), i.e. if label costs appear only on the h-subsets.
Note that for a flat tree, the set \(\mathcal{H}= 2^{\mathcal{L}}\) and so all subsets are considered ‘hierarchical’ in this degenerate case.
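Since the body of Definition 4 is elided above, the sketch below adopts one plausible reading consistent with the surrounding text (the proof of Theorem 3 and the flat-tree remark): an h-subset is a union of leaf sets of sibling subtrees. Under that assumption:

```python
from itertools import combinations

def leaves_under(node, children):
    """Leaf labels in the subtree rooted at node."""
    if node not in children:
        return frozenset([node])
    out = set()
    for c in children[node]:
        out |= leaves_under(c, children)
    return frozenset(out)

def h_subsets(children, root='r'):
    """All unions of sibling subtrees' leaf sets -- one plausible reading of
    Definition 4 (its body is not reproduced in the text above)."""
    H = set()
    stack = [root]
    while stack:
        j = stack.pop()
        kids = children.get(j, [])
        stack += kids
        sub = [leaves_under(i, children) for i in kids]
        for k in range(1, len(sub) + 1):
            for combo in combinations(sub, k):
                H.add(frozenset().union(*combo))
    return H

# Flat tree: every nonempty subset of labels is an h-subset (degenerate case).
flat = {'r': ['l1', 'l2', 'l3']}

# Two-level tree: whole groups are h-subsets, straddling subsets are not.
two = {'r': ['s1', 's2'], 's1': ['a', 'b'], 's2': ['c', 'd']}
H2 = h_subsets(two)
```

For the two-level tree, {a,b} (a complete group) is an h-subset, while {a,c} (straddling two groups without covering either) is not.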
5 Hierarchical Fusion Algorithm (h-Fusion)
Recall that the α-expansion algorithm (Sect. 3, Boykov et al. 2001) minimizes a multi-label energy by constructing a sequence of binary energies. Our h-fusion algorithm constructs a hierarchical sequence of multi-label energies, each of which is solved by running α-expansion as a subroutine. These intermediate multi-label energies are designed to ‘stitch’ or to ‘fuse’ labelings that were computed earlier in the sequence. As we shall see, this procedure provides better optimality guarantees than α-expansion for a wide class of energies, particularly those with strong label costs.
- handle energies with label costs (E=D+V+H),
- characterize the subclass of energies (h-metrics, h-subsets) for which h-fusion is applicable, and
- prove approximation bounds that generalize and improve upon those of α-expansion.
The key question for h-Fusion is how to set up E′ (line 5) so that it encodes our original energy E over all possible fusions, i.e. over all labelings in \(\mathcal{M}(\{ \hat{f}^{i}\}_{i \in\mathcal{I}(j)})\). Given a set of labelings \(\{ \hat{f}^{i} \}_{i \in\mathcal{I}(j)}\) there is a one-to-one correspondence between mappings \(g : \mathcal{P} \rightarrow\mathcal{I}(j)\) and labelings \(f \in \mathcal{M}(\{\hat{f}^{i}\}_{i \in\mathcal{I}(j)})\). We let f(g) be the labeling \(f_{p} = \hat{f}_{p}^{g_{p}}\) corresponding to g. We can then design an unconstrained energy E′ such that E′(g)=E(f(g)) for all g.
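The one-to-one correspondence between assignments g and fused labelings f(g) can be written out directly; the child labelings and site names below are illustrative:

```python
# Fusing two child labelings via a site-to-child assignment g.
# Child labelings fhat^i over sites p, q, s (illustrative values):
child = {
    'i1': {'p': 'a1', 'q': 'a1', 's': 'a2'},   # labels from subtree i1
    'i2': {'p': 'b1', 'q': 'b2', 's': 'b1'},   # labels from subtree i2
}

def fuse(g):
    """f(g): each site p takes the label its assigned child gave it,
    i.e. f_p = fhat^{g_p}_p."""
    return {p: child[i][p] for p, i in g.items()}

# Every g : P -> {i1, i2} yields one labeling in the fusion space
# M({fhat^i}), and every such labeling arises from exactly one g.
f = fuse({'p': 'i1', 'q': 'i2', 's': 'i1'})
# f == {'p': 'a1', 'q': 'b2', 's': 'a2'}
```

Designing E′ so that E′(g)=E(f(g)) then lets α-expansion over g minimize the original energy over the fusion space.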
The correctness of D′ and V′ is self-evident; the algorithm of Kumar and Koller (2009) includes lines 1–2, but on a more restrictive class of metrics. However, our work was mainly motivated by label costs, and it is not obvious how lines 4–6 encode an original label cost H(L) as a local label cost \(H_{P_{I}}'(I)\). We now verify correctness, first by a simple example and then by proving it in general.
Example 1
Theorem 3
If energyEhas hierarchical label costs (H,π) thenE′(g)=E(f(g)) for all\(g : \mathcal{P} \rightarrow\mathcal{I}(j)\).
Proof
- 1.
If \(L \cap\mathcal{L}_{i} = \emptyset\) for all \(i \in \mathcal{I}(j)\), then \(\hat{f}^{g_{p}}_{p} \notin L\) for every p, so the cost H(L) is never applied in subtree j. By definition I=∅, ensuring δ_{L}(f(g))=δ_{∅}(g_{P})=0, which is correct.
- 2.
If \(L \subset\mathcal{L}_{i}\) for some \(i \in\mathcal {I}(j)\), then by definition I={i} and \(P = \{ p : \hat{f}^{i}_{p} \in L \}\). Since \(g_{p} \neq i \Rightarrow\hat{f}^{g_{p}}_{p} \notin L\), the labeling f(g) contains a label in L if and only if g_{p}=i for some p∈P. Therefore δ_{L}(f(g))=δ_{i}(g_{P}) holds in this case.
- 3.
If \(\mathcal{L}_{i} \subseteq L \subset\mathcal{L}_{j}\) for some \(i \in\mathcal{I}(j)\), then clearly \(P = \mathcal{P}\). Since \(L \in\mathcal{H}\) we must also have \(L = \bigcup_{i \in I} \mathcal{L}_{i}\), and so f(g) uses a label in L if and only if g uses a label in I. Therefore δ_{L}(f(g))=δ_{I}(g) holds in this case.
- 4.
If \(L \supseteq\mathcal{L}_{j}\) then \(I = \mathcal{I}(j)\) and \(P=\mathcal{P}\), so H(L) can be added to E′ as a constant or simply ignored. □
Looking at the proof of Theorem 3 we can see that the structure of h-subsets is especially needed for the third case to hold. If we allow a subset \(L \notin\mathcal{H}\), then α-expansion could not be applied to the resulting E′ because the internal binary steps would be non-submodular and potentially NP-hard. The purpose of Example 2 is to demonstrate why H(L)>0 for arbitrary L can be problematic.
Example 2
Finally, we establish that h-metrics give a precise characterization of smoothness costs V that h-fusion can handle.
Theorem 4
Theh-fusion algorithm is applicable toVusing treeπif and only if (V,π) forms anh-metric.
Proof
6 Approximation Bounds of h-Fusion
Our goal is to derive a generalized bound for h-fusion with arbitrary tree π and arbitrary label costs (i.e.\(\mathcal{H}= 2^{\mathcal{L}}\) in (1)). Like the α-expansion bound in Theorem 2, the quality of our new bound will involve some c and c_{2} that depend on the particular energy. As we shall see, these two coefficients can be much smaller for our algorithm. We begin by defining some useful quantities for expressing the h-fusion bound.
Definition 6
In other words, \(V^{\max}_{i}\) is the maximum cost for any pair of labels in the subtree of node i, and \(V^{\min}_{i}\) is the minimum cost for two labels from different subtrees descended from i. For example, in Fig. 3a we have \(V ^{\max}_{A} = 2, V^{\min}_{A}=1\) and \(V^{\max}_{B}=4,V ^{\min }_{B}=2\). For the root node r, this example has \(V^{\max}_{r}=4\) and \(V^{\min}_{r} = 3\).
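These quantities can be computed directly from a table of pairwise costs. Since Fig. 3 is not reproduced here, the tree and V values below are hypothetical, chosen only to illustrate the computation of \(V^{\max}_{i}\) and \(V^{\min}_{i}\):

```python
from itertools import combinations

# Hypothetical two-group label tree and pairwise costs (illustrative only;
# these are NOT the values of Fig. 3).
children = {'r': ['A', 'B'], 'A': ['a1', 'a2'], 'B': ['b1', 'b2']}
Vtab = {frozenset(p): c for p, c in [
    (('a1', 'a2'), 1), (('b1', 'b2'), 2),
    (('a1', 'b1'), 3), (('a1', 'b2'), 3),
    (('a2', 'b1'), 4), (('a2', 'b2'), 3)]}

def leaves(i):
    return [i] if i not in children else [l for c in children[i] for l in leaves(c)]

def v_max(i):
    """Max cost over pairs of labels inside the subtree of node i."""
    return max((Vtab[frozenset((a, b))] for a, b in combinations(leaves(i), 2)),
               default=0)

def v_min(i):
    """Min cost over pairs of labels drawn from *different* child subtrees
    of internal node i."""
    pairs = [(a, b) for x, y in combinations(children[i], 2)
             for a in leaves(x) for b in leaves(y)]
    return min(Vtab[frozenset((a, b))] for a, b in pairs)
```

For this toy table, \(V^{\max}_{r}=4\) (the worst pair anywhere) while \(V^{\min}_{r}=3\) (the cheapest pair crossing the A/B boundary).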
Observation 3
Ifπdefines a flat tree, thencin Definition 7 reduces to quantity (16) from theα-expansion bound.
We now consider label costs in h-fusion and generalize the related coefficient c_{2}. The cardinality of set \(I \subset\mathcal{I}(j)\) on line 4 of ConstructFusionEnergy is an important quantity affecting c_{2} for h-fusion. In general, the smaller |I| the better the bound. (Note that we use ⊂ to mean ⊊ throughout this paper.)
Definition 8
The fact that I(L) will always be the union of some siblings in the tree follows from the assumption that L is an h-subset.
Observation 4
Ifπdefines a flat tree, thenc_{2}in Definition 8 reduces to the same quantity forα-expansion in Theorem 2.
To see how h-fusion can beat α-expansion at minimizing energies with label costs, consider the following worst-case example for α-expansion.
Example 3
Using our definitions of c and c_{2} we state the main theorem of this work: an improvement upon the bound of Delong et al. (2012). For the purposes of the bound we assume D_{p}(⋅)≥0 and that V is semi-metric.
Theorem 5
Proof
See Appendix B. □
In the presence of arbitrary label costs, this is still not a constant-ratio approximation bound, but we can construct a worst-case example to show that our bound is indeed tight (see Delong 2011).
7 Application: Hierarchical Color Segmentation
We use hierarchical color segmentation as a simple, illustrative example because it allows us to visualize the effects of hierarchical smoothness and label costs. Given an image we wish to group pixels with similar color. We treat segmentation as labeling where each label represents a color; the labels essentially re-colorize the image. However, we explicitly divide the possible colors into groups, and seek a pixel labeling that relies only on a few groups of colors. For example, a natural way to group colors is by hue, and the goal is then to re-color the image using as few hues as possible while staying reasonably faithful to the original image.
In order to limit the number of hues used in re-colorization we introduce group costs in addition to regular label costs. A cost is associated with each group of colors L and is represented by a label subset cost H(L)>0. It is paid whenever any of the colors in the group is used in the labeling. For the smoothness costs V we use hierarchical Potts smoothness terms between and within color groups to encourage smooth re-colorization. If I_{p} is an image pixel, the data cost D_{p}(ℓ) is proportional to squared distance between I_{p} and the color represented by label ℓ.
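A minimal sketch of this data term follows; the proportionality constant `scale` and the toy colors are assumptions for illustration:

```python
def data_cost(pixel, label_color, scale=1.0):
    """D_p(l) proportional to the squared distance between pixel color I_p
    and the color represented by label l (a sketch of the paper's data
    term; `scale` is an assumed weighting constant)."""
    return scale * sum((a - b) ** 2 for a, b in zip(pixel, label_color))

# Two color labels (black and blue) and a toy RGB pixel:
palette = {'black': (0, 0, 0), 'blue': (0, 0, 255)}
pixel = (10, 0, 0)
costs = {l: data_cost(pixel, c) for l, c in palette.items()}
# The pixel is nearly black, so the 'black' label gets the lowest data cost.
```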
We thereby formulate the re-colorization problem as a hierarchical energy, and compare α-expansion with h-fusion on it. In all the re-colorization experiments our color hierarchy consists of 121 groups of colors, each containing 20 different shades varying from dark to bright, for 2420 labels in total. We then report qualitative (the resulting re-colorizations) and quantitative (running time and energy value) comparisons. In all experiments we set w_{i}=1 and w_{r}=2. Each invocation of α-expansion performed two cycles only (a cycle expands on each label exactly once); this limitation applied to all instances of α-expansion within h-fusion as well. (Allowing α-expansion to converge takes much longer but only decreases the energy by <0.01 % for both algorithms.)
The plot in Fig. 7 provides a quantitative comparison between α-expansion and h-fusion in terms of running time and energy values. The blue line shows the energy attained by α-expansion as a function of time. The h-fusion algorithm begins by optimizing a set of sub-energies corresponding to child-labelings. Each child-labeling is restricted to one sub-tree of labels and essentially re-colors the image with the colors from that group only (for example, one child-labeling re-colors the image with shades of red, another with shades of green, and so on). The sub-energies are independent and can be optimized either sequentially or in parallel. When sub-energies are optimized sequentially we represent each sub-energy with a pink diamond and plot them as a function of cumulative time. After all sub-energies are optimized, the h-fusion algorithm fuses the resulting child-labelings by running α-expansion, starting from the child-labeling with the minimal energy (again limited to two cycles). The energy of h-fusion is shown by the red line and attains a lower value than regular α-expansion.
Unlike α-expansion, the running time of h-fusion can be dramatically improved by minimizing the sub-energies in parallel, as illustrated by the green line in the plot of Fig. 7. In our application the parallel version of h-fusion is faster by a factor of 10–15 compared to sequential h-fusion. At any point in our experiments, the energy curve of parallel h-fusion lies well below that of α-expansion, and the algorithm terminates 20–30 times faster. In theory this speed-up factor should grow linearly with the number of siblings at each level of the label hierarchy. The speed-up arises because running one expansion cycle within h-fusion is more efficient than with regular α-expansion: the number of candidate labels for h-fusion corresponds to the number of groups in the hierarchy (121 in our case), whereas for α-expansion it corresponds to the number of leaves in the hierarchy (2420 colors in our case).
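The parallel scheme can be sketched with a thread pool mapped over the independent sub-energies. Here `solve_group` is a hypothetical stand-in for running α-expansion restricted to one color group's labels (it just picks each pixel's cheapest label, purely for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_group(group_costs):
    """Toy stand-in for minimizing one child sub-energy: assign each pixel
    its cheapest label within the group (ignores smoothness, illustration
    only -- the paper uses alpha-expansion here)."""
    return {p: min(costs, key=costs.get) for p, costs in group_costs.items()}

# Two independent color groups over two pixels (illustrative costs):
groups = [
    {'p1': {'red1': 0, 'red2': 5}, 'p2': {'red1': 4, 'red2': 1}},
    {'p1': {'grn1': 2, 'grn2': 9}, 'p2': {'grn1': 3, 'grn2': 0}},
]

# Because the sub-energies share no variables, they can be solved in
# parallel; the results are then fused by a final alpha-expansion pass.
with ThreadPoolExecutor() as pool:
    child_labelings = list(pool.map(solve_group, groups))
```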
8 Discussion
The main results of this paper are a characterization of hierarchical costs (h-metrics and h-subsets), the h-fusion algorithm itself, and a significant improvement on the approximation bound of α-expansion. These results are theoretical, but we foresee a number of applications for such energies.
Applications of Hierarchical Costs
We presented hierarchical color segmentation as the simplest possible example that illustrates (a) the nature of energies with hierarchical costs, and (b) the qualitative and quantitative benefits of h-fusion for such energies. However, computer vision is full of problems for which hierarchical costs are natural.
Furthermore, hierarchical costs can be useful for detecting patterns, for compression, and for learning a database of inter-dependent patches from images (Gorelick et al. 2011). Outside vision, Sefer and Kingsford (2011) showed that the r-hst metrics of Kumar and Koller are effective at identifying protein function; our work could extend their results.
Relation to r-hst Metrics
Recall that, at a high level, the h-fusion process shown in Fig. 5 is the same as that used by Kumar and Koller (2009). Given a metric V, they find the set of r-hst metrics that best approximates V and try to minimize an energy of the form E=D+V using a bottom-up fusion process. The main idea of an r-hst metric is as follows. Assume we are given a tree with distances d(i,j) defined on each edge from child i to parent j. Assume that the distance from j to all its children is uniform, i.e. d(i,j)=d(i′,j) for all \(i,i' \in\mathcal{I}(j)\). Further assume that the parent-to-child distance gets cheaper by a factor of r as we descend the tree, i.e. \(\frac{d(i,j)}{d(k,i)} \ge r\) for each child k of i and some constant r>1. The total distance between two leaf nodes α and β is the cumulative sum of edge distances along the path from α to β in the tree. If the ‘costs’ of a pairwise potential V(α,β) correspond to such a distance function for all α,β, then V is said to be an r-hst metric.
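A small sketch of checking the r-hst conditions on a toy weighted tree may help; the tree, edge distances, and the value of r are all illustrative:

```python
# Toy weighted tree: edge (child -> parent) distances, chosen to satisfy
# the r-hst conditions with r = 2 (illustrative only).
parent = {'a': 's', 'b': 's', 'c': 't', 'd': 't', 's': 'r', 't': 'r'}
dist = {'a': 1, 'b': 1, 'c': 1, 'd': 1, 's': 2, 't': 2}  # d(i, parent(i))

def path_to_root(i):
    out = []
    while i in parent:
        out.append(i)
        i = parent[i]
    return out

def tree_distance(a, b):
    """Cumulative sum of edge distances along the path from leaf a to b."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    return (sum(dist[i] for i in pa if i not in common)
            + sum(dist[i] for i in pb if i not in common))

def is_r_hst(r):
    """Check: siblings sit at a uniform distance from their parent, and each
    parent-to-child distance is at least r times the next one down."""
    for i, j in parent.items():
        sibs = [k for k, p in parent.items() if p == j]
        if any(dist[k] != dist[i] for k in sibs):
            return False
        for k, p in parent.items():
            if p == i and dist[i] < r * dist[k]:
                return False
    return True
```

Setting V(α,β) to `tree_distance(α, β)` for all leaf pairs then yields a 2-hst metric on this toy tree.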
Our concept of an h-metric is expressed directly in terms of constraints on V(⋅,⋅), not on edges or distances traversed in the tree. Furthermore, r-hst metrics are a strict subset of h-metrics (see Appendix A).
Generalizing Facility Location
There exist variants of UFL that allow for a hierarchy of facilities, e.g. Svitkina and Tardos (2006) and Sahin and Süral (2007). This generalization allows for more realistic modeling of complex interdependencies between facilities themselves. Some of these works derive constant-factor approximation bounds for hierarchical facility location, e.g. Kantor and Peleg (2009), but all such works assume metric client costs where the costs D_{p}(⋅) are computed as distances from a particular center. Without this assumption, Feige’s hardness result still holds. Strategies for optimizing hierarchical UFL include linear programming relaxation, primal-dual algorithms and, very recently, message passing algorithms (Givoni et al. 2011).
We can encode a kind of hierarchical facility cost with our framework as follows. Suppose facilities ℓ_{1} and ℓ_{2} require the services of facility ℓ_{3}, which costs 50 to open. A label cost H({ℓ_{1},ℓ_{2},ℓ_{3}}):=50 correctly accounts for the shared dependency of ℓ_{1} and ℓ_{2} on ℓ_{3}. If we furthermore have a facility ℓ_{4} that depends on both ℓ_{3} and some facility ℓ_{5} (cost 80), then our label costs should instead be H({ℓ_{1},ℓ_{2},ℓ_{3},ℓ_{4}}):=50 and H({ℓ_{4},ℓ_{5}}):=80.
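The example's label costs can be written down and evaluated directly; a minimal sketch using the label names and costs from the example above:

```python
# Label costs encoding the shared facility dependencies from the example:
# l1, l2, l4 depend on l3 (opening cost 50); l4 also depends on l5 (cost 80).
H = {frozenset({'l1', 'l2', 'l3', 'l4'}): 50,
     frozenset({'l4', 'l5'}): 80}

def label_cost(used):
    """Pay H(L) exactly once if any label in L appears in the labeling."""
    return sum(c for L, c in H.items() if L & used)

# Opening only l5 costs 80; opening l1 (which needs l3) costs 50;
# opening l4 triggers both shared dependencies: 50 + 80.
```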
Furthermore, our h-fusion algorithm can handle smoothness costs V, which to the best of our knowledge are novel for UFL. In the UFL setting, V(f_{p},f_{q}) can encode an explicit preference that clients p and q be serviced by the same facility. When clients are social, there are many scenarios where such a preference makes sense. When client costs D are metric (e.g. Euclidean distance) then this preference is implicitly encoded in D. However, when the client costs are not metric, such as clients connected by an irregular network despite being physically close, then our smoothness costs V may be useful for modeling such problems.
Improving the Bound
Recall from Observation 1 that for the Set-Cover problem the best we can hope for is a \(\ln|\mathcal{P}|\)-approximation. Yet one can formulate Set-Cover using an energy of the form (18), so minimizing energy E=D+V+H is at least as hard. However, Hochbaum (1982) gave a simple greedy algorithm for Set-Cover and proved that it yields precisely a \(\ln |\mathcal{P}|\)-approximation, the best possible according to Feige (1998). If label costs are arbitrary in (18), then α-expansion’s bound is also arbitrarily bad. So, there is a huge gap between what α-expansion can achieve on (18) versus what Hochbaum’s greedy algorithm can guarantee. For energies of the form E=D+H, it may be possible to extend Hochbaum’s algorithm and use it as a subroutine within h-fusion (rather than using α-expansion). One may ask if h-fusion could inherit a better approximation bound in that case. We also do not know if her approach can be applied in the presence of smoothness costs V.
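The greedy strategy referenced above can be sketched in a few lines. This is the standard greedy Set-Cover heuristic of the kind discussed (repeatedly pick the set covering the most uncovered elements); the instance below is illustrative:

```python
def greedy_set_cover(universe, sets):
    """Standard greedy Set-Cover: repeatedly pick the set covering the most
    uncovered elements. Achieves a ln|universe| approximation ratio (a
    sketch of the greedy strategy discussed above)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(sets[s] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("universe not coverable")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

sets = {'A': {1, 2, 3}, 'B': {3, 4}, 'C': {4, 5}, 'D': {1, 5}}
cover = greedy_set_cover({1, 2, 3, 4, 5}, sets)
# Greedy picks 'A' (covers 3 new elements), then 'C' (covers 2).
```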
Relation to Genetic Algorithms
Within our framework, the inner α-expansion subroutine performs a sequence of fusion moves as proposed by Lempitsky et al. (2010). We point out that a binary fusion move is essentially an optimized crossover operation, already used with some success in genetic algorithms (Aggarwal et al. 1997; Meyers and Orlin 2007). A standard concern for genetic algorithms is how to maintain population diversity so that, when two chromosomes (labelings) are crossed, there is a chance that the descendant will be better. Our h-fusion process forces a kind of population diversity based on a tree: the labelings in our multi-label fusion each contain labels from different subtrees. It is interesting that this structured diversity gives a provably better approximation bound in our case.
Note that α-expansion itself does not require D_{p}(⋅)≥0; this assumption is only needed for analysis of worst-case bounds.
A tree is irreducible if all its internal nodes have at least two children, i.e. there are no ‘redundant’ parent nodes and so for each i there exists some γ,ζ such that lca(γ,ζ)=i.
Due to our assumption that V is semi-metric and so V(ℓ,ℓ)=0, we can simply sum over all \(pq \in \mathcal{A}_{j}\) instead of only where \(f^{*}_{p} \neq f^{*}_{q}\).
Acknowledgements
We wish to thank the anonymous reviewers for careful reading and helpful comments. This work was supported by NSERC Discovery Grant R3584A02, the Canadian Foundation for Innovation (CFI), and the Early Researcher Award (ERA) program.