Abstract
In this work, a method based on optimum cuts in graphs is proposed for unsupervised image segmentation, that can be tailored to different objects, according to their boundary polarity, by extending the Oriented Image Foresting Transform (OIFT). The proposed method, named UOIFT, encompasses as a particular case the single-linkage algorithm by minimum spanning tree (MST), establishing important theoretical contributions, and gives superior segmentation results compared to other approaches commonly used in the literature, usually requiring a lower number of image partitions to isolate the desired regions of interest. The method is supported by new theoretical results involving the usage of non-monotonic-incremental cost functions in directed graphs. The results are demonstrated using a region adjacency graph of superpixels in medical and natural images.
1 Introduction
Unsupervised segmentation is an important problem in computer vision, since perceptual grouping plays a powerful role in human visual perception [25]. In this context, the method must decide which image regions are relevant without user guidance, based on color and texture similarity or local contrast.
The unsupervised over-segmentation of an image into compact regions of similar, connected pixels produces what are commonly called superpixels [1, 22]. Superpixels can greatly reduce the computational time of computer vision algorithms by replacing the rigid structure of the pixel grid [1]. In graph-based methods, they allow the fast creation of a Region Adjacency Graph (RAG), drastically reducing the number of graph elements compared to the pixel-level graph (Figs. 1a-b).
Several graph-based methods have been proposed for unsupervised segmentation, including watersheds [3], mean cut [24], ratio cut [23], normalized cuts [4, 19], and minimum spanning tree (MST) based methods [7, 9, 10, 11, 12, 26]. For instance, Felzenszwalb and Huttenlocher proposed an efficient segmentation algorithm that evaluates a predicate for measuring the evidence for a boundary between two regions, which produces segmentations satisfying global properties, although based on greedy decisions [9]. Other methods include the usage of component trees [20, 21], which can also be combined with watersheds, allowing the selection of catchment basins according to their extinction values.
Seed-based methods for region-based image segmentation are known to provide satisfactory results for several applications, being usually easy to extend to multi-dimensional images. In this work, we extend a seed-based method, named Oriented Image Foresting Transform (OIFT) [15, 17], to perform unsupervised image segmentation, leading to a new method based on optimum cuts in graphs, named UOIFT, that can be tailored to different objects, according to their boundary polarity. OIFT has been demonstrated to be an effective and efficient solution for the segmentation of a given target object based on user provided seeds, allowing the incorporation of several high-level constraints, including shape constraints [16, 18] and connectivity priors [14].
The proposed method is based on the Image Foresting Transform (IFT) [8] algorithm, which has linearithmic implementations and is much faster than other methods based on cuts in graphs [4, 19, 23, 24]. Unlike [13], our method exploits non-monotonic-incremental cost functions in directed graphs.
The proposed method encompasses as a particular case the single-linkage algorithm by MST, establishing important theoretical contributions, and requires a lower number of image partitions to isolate the desired regions of interest as compared to other approaches commonly used in the literature.
Figures 1c–h present the central idea of this work, which is to explore the boundary polarity in the unsupervised segmentation of images in directed graphs. Figure 1a shows a synthetic image containing dark and bright regions to be segmented into five different regions. Regular unsupervised methods based on undirected graphs, such as watersheds, cannot distinguish the different types of boundary polarity, giving as output a mixture of bright and dark regions, as shown in Figs. 1c-d. Our proposed method can favor a particular polarity, giving the results shown in Figs. 1e-f or Figs. 1g-h.
2 Graph Concepts
We consider a weighted digraph G as a triple \(\langle \varvec{\mathcal{V}}, \varvec{\mathcal{A}}, \omega \rangle \), where \(\varvec{\mathcal{V}}\) is a nonempty set of vertices or nodes, \(\varvec{\mathcal{A}}\) is a set of ordered pairs of distinct vertices called arcs or directed edges, and \(\omega : \varvec{\mathcal{A}} \rightarrow \mathbb {R}\) represents the weights associated to the arcs.
An image can be interpreted as a weighted digraph \(G=\langle \varvec{\mathcal{V}},\varvec{\mathcal{A}},\omega \rangle \), whose nodes \(\varvec{\mathcal{V}}\) are the image pixels (or superpixels) in its image domain and whose arcs are the ordered pairs \(\langle s,t \rangle \in {\varvec{\mathcal{A}}}\) of neighboring pixels (or superpixels), e.g., 4-neighborhood in case of 2D images. The digraph G is symmetric if for any of its arcs \( \langle s,t \rangle \in \varvec{\mathcal{A}}\), the pair \( \langle t,s \rangle \) is also an arc of G, but we can have \(\omega (\langle s,t \rangle ) \ne \omega (\langle t,s \rangle )\). The transpose \(G^T\) of G is the unique weighted digraph on the same set of vertices \(\varvec{\mathcal{V}}\) with all arcs reversed compared to the corresponding arcs in G.
For a given image graph \(G=\langle \varvec{\mathcal{V}},\varvec{\mathcal{A}}, \omega \rangle \), a path \({\pi =\langle t_1,t_2,\ldots ,t_n \rangle }\) is a sequence of adjacent nodes (i.e., \(\langle t_i,t_{i+1} \rangle \in \varvec{\mathcal{A}}\), \(i=1,2,\ldots ,n-1\)) with no repeated vertices (\(t_i \ne t_j\) for \(i \ne j\)). A path \({\pi _t=\langle t_1,t_2,\ldots ,t_n = t \rangle }\) is a path with terminus at a node t. When we want to explicitly indicate the origin of the path, the notation \(\pi _{s \leadsto t} = \langle t_1 = s,t_2,\ldots ,t_n = t \rangle \) may also be used, where s stands for the origin and t for the destination node. A path is trivial when \(\pi _t=\langle t \rangle \). A path \(\pi _t=\pi _s\cdot \langle s,t\rangle \) indicates the extension of a path \(\pi _s\) by an arc \( \langle s,t \rangle \). The notation \(\varPi (G)\) is used to indicate the set of all possible paths in a graph G.
A predecessor map is a function P that assigns to each node t in \({\varvec{\mathcal{V}}}\) either some other adjacent node in \({\varvec{\mathcal{V}}}\), or a distinctive marker nil not in \({\varvec{\mathcal{V}}}\) — in which case t is said to be a root of the map. A spanning forest is a predecessor map which contains no cycles — i.e., one which takes every node to nil in a finite number of iterations. For any node \(t\in {\varvec{\mathcal{V}}}\), a spanning forest P defines a path \(\pi ^{P}_t\) recursively as \(\langle t \rangle \) if \(P(t) = nil\), and \(\pi ^{P}_s\cdot \langle s,t\rangle \) if \(P(t)=s\ne nil\).
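The recursive definition of \(\pi ^{P}_t\) can be illustrated with a minimal sketch. The dictionary representation of P and the function name `path_from_forest` are assumptions made for illustration, with Python's `None` standing in for nil.

```python
from typing import Dict, Hashable, List, Optional


def path_from_forest(pred: Dict[Hashable, Optional[Hashable]], t: Hashable) -> List[Hashable]:
    """Return the path from the root of t to t by following predecessors backwards."""
    path: List[Hashable] = []
    seen = set()
    node: Optional[Hashable] = t
    while node is not None:
        if node in seen:
            # A spanning forest must reach nil in a finite number of steps.
            raise ValueError("predecessor map contains a cycle")
        seen.add(node)
        path.append(node)
        node = pred[node]
    path.reverse()  # nodes were collected from t back to the root
    return path


# Toy predecessor map (hypothetical): 'a' and 'd' are roots.
pred = {"a": None, "b": "a", "c": "b", "d": None}
path_to_c = path_from_forest(pred, "c")
path_to_d = path_from_forest(pred, "d")
```

Here `path_to_c` is the non-trivial path rooted at `a`, while `path_to_d` is the trivial path of a root.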
A connectivity function \(f: \varPi (G)\rightarrow \mathbb {R}\) computes a value \(f(\pi _t)\) for any path \(\pi _t\), usually based on arc weights. A path \(\pi _t\) is optimum if \(f(\pi _t) \le f(\tau _t)\) for any other path \(\tau _t\) in G. The optimum-path value \(V_{opt}(t)\) is uniquely defined by \(V_{opt}(t) = \min _{\pi _t \in \varPi (G)} \{ f(\pi _t) \}\). An optimum-path forest P is a spanning forest where all paths \(\pi ^{P}_t\) for \(t \in {\varvec{\mathcal{V}}}\) are optimum.
The cost of a trivial path \(\pi _t=\langle t \rangle \) is usually given by a handicap value H(t). For example, \(H(t) = 0\) for all \(t \in \varvec{\mathcal{S}}\) and \(H(t) = \infty \) otherwise, where \(\varvec{\mathcal{S}}\) is a seed set. The costs for non-trivial paths follow a path-extension rule. For example:
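The three path-extension rules named in the next paragraph are commonly written as follows in the IFT literature. This is a sketch under that assumption; the form given for \(f_{\omega }\), where the cost of a path is the weight of its last arc, is the least certain of the three.

\[
\begin{aligned}
f_{\max }(\pi _s\cdot \langle s,t\rangle ) &= \max \{ f_{\max }(\pi _s),\ \omega (\langle s,t \rangle ) \} \\
f_{\varSigma }(\pi _s\cdot \langle s,t\rangle ) &= f_{\varSigma }(\pi _s) + \omega (\langle s,t \rangle ) \\
f_{\omega }(\pi _s\cdot \langle s,t\rangle ) &= \omega (\langle s,t \rangle )
\end{aligned}
\]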
The max-arc path-cost function \(f_{\max }\) and the additive path-cost function \(f_{\varSigma }\) with \(\omega (\langle s,t \rangle ) \geqslant 0\) are Monotonic-Incremental cost functions (MI), while \(f_{\omega }\) indicates a non-monotonic-incremental cost function.
The image foresting transform (IFT) [8] (Algorithm 1) computes the path-cost map V, which is precisely \(V_{opt}\) in the case of MI functions [6]. It is also optimized in handling infinite costs, by storing in \(\varvec{\mathcal{Q}}\) only the nodes with finite-cost path, assuming without loss of generality that \(V_{opt}(t) < +\infty \) for all \(t \in {\varvec{\mathcal{V}}}\).
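A minimal sketch of an IFT-style propagation is given below, assuming the max-arc function \(f_{\max }\) with handicap 0 on the seeds and \(\infty \) elsewhere. The name `ift_fmax`, the dictionary-of-dictionaries graph and the lazy-deletion heap are illustrative assumptions, not the authors' implementation. As in the text, only nodes that have received a finite cost are ever placed in the queue.

```python
import heapq
import math
from typing import Dict, Hashable, Iterable, Optional, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # graph[s][t] = weight of arc <s, t>


def ift_fmax(graph: Graph, seeds: Iterable[Hashable]) -> Tuple[dict, dict]:
    """Dijkstra-like propagation with f_max; returns (cost map V, predecessor map P)."""
    cost = {v: math.inf for v in graph}
    pred: Dict[Hashable, Optional[Hashable]] = {v: None for v in graph}
    heap = []
    tick = 0  # tie-breaker so that nodes themselves are never compared
    for s in seeds:
        cost[s] = 0.0  # handicap H(s) = 0 for seeds
        heapq.heappush(heap, (0.0, tick, s))
        tick += 1
    done = set()
    while heap:
        _, _, s = heapq.heappop(heap)
        if s in done:
            continue  # stale queue entry
        done.add(s)
        for t, w in graph[s].items():
            new_cost = max(cost[s], w)  # path-extension rule of f_max
            if t not in done and new_cost < cost[t]:
                cost[t] = new_cost
                pred[t] = s
                heapq.heappush(heap, (new_cost, tick, t))
                tick += 1
    return cost, pred


# Toy symmetric digraph with asymmetric weights (hypothetical values).
graph = {
    "a": {"b": 2.0, "c": 5.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 4.0, "b": 3.0},
}
V, P = ift_fmax(graph, ["a"])
```

In this toy graph node `c` is reached more cheaply through `b` than by the direct arc from `a`, so its predecessor is `b`.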
3 Efficient Optimum Cuts in Graphs
For a given partition of the graph nodes in two sets \(\varvec{X}\) and \(\varvec{\mathcal{V}} \setminus \varvec{X}\), let \(\mathcal{C}(\varvec{X}) = \{ \langle s,t \rangle \in \varvec{\mathcal{A}} \mid s \in \varvec{X} ~\text{ and }~ t \notin \varvec{X} \}\) denote the set of arcs in its cut from \(\varvec{X}\) to \(\varvec{\mathcal{V}} \setminus \varvec{X}\). Consider the following energy formulation:
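A form consistent with the OIFT literature [15, 17], and assumed here, takes the energy of a cut to be the weight of its weakest arc:

\[
E(\varvec{X}) = \min _{\langle s,t \rangle \in \mathcal{C}(\varvec{X})} \omega (\langle s,t \rangle ) \qquad (4)
\]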
Let \(\mathcal{U}(x, y) = \{ \varvec{X} \subset \varvec{\mathcal{V}} \mid x \in \varvec{X} ~\text{ and }~ y \in \varvec{\mathcal{V}} \setminus \varvec{X} \}\) denote the universe of all possible partitions separating the nodes x and y, where y represents the background. By using x and y as internal and external seeds, respectively, the OIFT algorithm [17] computes an optimum partition \(\varvec{X}_{opt} \in \mathcal{U}(x, y)\) by maximizing the above energy (Eq. 4) in a symmetric directed graph, that is, \(E(\varvec{X}_{opt}) = \max _{\varvec{X} \in \mathcal{U}(x, y)} E(\varvec{X})\). OIFT is built upon the IFT framework by considering the following path function in a symmetric digraph:
where, in this work, we use \(\varvec{\mathcal{S}_1} = \{x\}\) and \(\varvec{\mathcal{S}_0} = \{y\}\). The set \(\varvec{X}_{opt} \in \mathcal{U}(x, y)\) by OIFT is defined from the forest P computed by Algorithm 1 with this path function, by taking the pixels that were conquered by paths rooted in \(\varvec{\mathcal{S}_1} = \{x\}\) [15].
For the purpose of unsupervised segmentation, for a given reference point r in the background, we would like to find a node \(t^{\prime } \in \varvec{\mathcal{V}} \setminus \{r\}\), resulting in a partition of maximum energy among all results in \(\bigcup _{t \in \varvec{\mathcal{V}} \setminus \{r\}} \mathcal{U}(t, r)\). Fortunately, \(t^{\prime }\) can be efficiently obtained by taking \(t^{\prime } = \arg \max _{t \in \varvec{\mathcal{V}} \setminus \{r\}} V(t)\), where V is the cost map by IFT using \(f_{max}\) with \(\varvec{\mathcal{S}}=\{r\}\) in the transpose graph, according to Lemma 1 from [5]. This result can be equally obtained by taking as V the cost map by IFT using \(f_{\omega }\) with \(\varvec{\mathcal{S}}=\{r\}\) in the transpose graph, but this latter approach has the advantage that it allows us to rank the nodes according to their non-increasing order of values, such that the next cut with maximum energy can be easily selected (Figs. 1d, f, h). In this way we can create a hierarchy of partitions according to the proposed algorithm (Algorithm 2).
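The selection of \(t^{\prime }\) can be checked on a toy digraph. The sketch below assumes that the energy of a cut is the minimum weight among the arcs leaving \(\varvec{X}\), and compares the \(f_{\max }\) cost map computed in the transpose graph against a brute-force search over all partitions separating each node from r. All function names and weights are hypothetical.

```python
import heapq
import math
from itertools import combinations


def transpose(graph):
    """Reverse every arc: the weight of <s, t> in G becomes the weight of <t, s>."""
    gt = {v: {} for v in graph}
    for s, nbrs in graph.items():
        for t, w in nbrs.items():
            gt[t][s] = w
    return gt


def ift_fmax(graph, seed):
    """Cost map of a Dijkstra-like propagation with the max-arc function."""
    cost = {v: math.inf for v in graph}
    cost[seed] = 0.0
    heap, done, tick = [(0.0, 0, seed)], set(), 0
    while heap:
        _, _, s = heapq.heappop(heap)
        if s in done:
            continue
        done.add(s)
        for t, w in graph[s].items():
            c = max(cost[s], w)
            if t not in done and c < cost[t]:
                cost[t] = c
                tick += 1
                heapq.heappush(heap, (c, tick, t))
    return cost


def cut_energy(graph, X):
    """Assumed energy: weakest arc going from X to its complement."""
    return min((w for s in X for t, w in graph[s].items() if t not in X),
               default=math.inf)


def best_energy_separating(graph, x, r):
    """Brute force over every X with x in X and r outside X."""
    others = [v for v in graph if v not in (x, r)]
    return max(cut_energy(graph, {x, *extra})
               for k in range(len(others) + 1)
               for extra in combinations(others, k))


graph = {
    "r": {"a": 1.0, "b": 2.0},
    "a": {"r": 3.0, "b": 6.0, "c": 2.0},
    "b": {"r": 5.0, "a": 4.0, "c": 7.0},
    "c": {"a": 8.0, "b": 9.0},
}
V = ift_fmax(transpose(graph), "r")
t_prime = max((v for v in graph if v != "r"), key=V.get)
```

The brute-force search is exponential and serves only as a check; the cost map itself needs a single propagation from r.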
Algorithm 2 generates a hierarchical segmentation by successive binary divisions, leading at the end to a segmentation with k partitions. Each IFT execution has linearithmic complexity in the number of involved nodes. Since UOIFT is based on multiple OIFT executions (at each iteration being applied to smaller graphs), we considered a Region Adjacency Graph (RAG), where the regions are the superpixels computed by IFT-SLIC [2, 22] of size \(10\times 10\) pixels, rather than using the pixels directly (Fig. 1b). The initial reference node for the background was taken to be the first top/left superpixel in the image. In order to exploit the boundary polarity, we consider the following arc weight assignment:
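A form consistent with the OIFT literature [15, 17] and with the description that follows is sketched here; the exact factors \((1+\alpha )\) and \((1-\alpha )\) are an assumption:

\[
\omega (\langle s,t \rangle ) =
\begin{cases}
|I(t)-I(s)| \times (1+\alpha ) & \text{if } I(s) > I(t), \\
|I(t)-I(s)| \times (1-\alpha ) & \text{otherwise,}
\end{cases}
\]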
where the weights \(\omega (\langle s,t \rangle )\) combine an undirected dissimilarity measure \(|I(t)-I(s)|\) between neighboring superpixels s and t with a multiplicative orientation factor controlled by \(\alpha \in [-1,1]\), such that \(\alpha < 0\) favors the segmentation of dark objects in a brighter background (Fig. 1g) and \(\alpha > 0\) favors the opposite orientation (Fig. 1e), and I(t) is the mean intensity inside superpixel t.
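As a minimal sketch of such an oriented weighting on a RAG, the code below assumes the factor \((1+\alpha )\) for arcs going from a brighter to a darker superpixel and \((1-\alpha )\) otherwise, a convention taken from the OIFT literature rather than confirmed by this text. The function name and the toy intensities are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def oriented_weights(mean_intensity: Dict[Hashable, float],
                     adjacency: Iterable[Tuple[Hashable, Hashable]],
                     alpha: float) -> Dict[Hashable, Dict[Hashable, float]]:
    """Build a symmetric digraph whose two arcs per neighbor pair may differ in weight."""
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    graph: Dict[Hashable, Dict[Hashable, float]] = {s: {} for s in mean_intensity}
    for s, t in adjacency:  # each unordered neighbor pair listed once
        diff = abs(mean_intensity[t] - mean_intensity[s])
        for u, v in ((s, t), (t, s)):
            # Assumed orientation factor: bright-to-dark arcs scale by (1 + alpha).
            factor = 1.0 + alpha if mean_intensity[u] > mean_intensity[v] else 1.0 - alpha
            graph[u][v] = diff * factor
    return graph


intensity = {"dark": 10.0, "bright": 110.0}
pairs = [("dark", "bright")]
g_pos = oriented_weights(intensity, pairs, alpha=0.5)
g_neg = oriented_weights(intensity, pairs, alpha=-0.5)
g_zero = oriented_weights(intensity, pairs, alpha=0.0)
```

With a negative \(\alpha \) the arc leaving the dark superpixel receives the larger weight, which is what favors dark objects when the weakest arc leaving the object is maximized; with \(\alpha = 0\) both arcs carry the same weight and the graph is effectively undirected.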
We conducted experiments comparing the proposed unsupervised segmentation by OIFT with other graph-based methods. In the following, MST denotes the clustering of the previously described RAG nodes, obtained by successive removals of edges of maximum weight from the minimum spanning tree, where \(\omega (\langle s,t \rangle ) = |I(t)-I(s)|\), which is related to the nearest-neighbor (single-linkage) algorithm. FH denotes the unsupervised approach by Felzenszwalb and Huttenlocher [9], which computes a predicate for measuring the evidence for a boundary between two regions based on the minimum spanning tree computed in the RAG. EF+WS indicates the IFT-based watershed transform [3], after a volume extinction filter [20] set to preserve k leaves of the Min-tree, in order to consider only the most relevant catchment basins of a morphological gradient by a disk of radius 1. We used the code for the extinction filter available in the iamxt toolbox [21]. Note that Algorithm 2 encompasses as a particular case the single-linkage algorithm (MST) for \(\alpha = 0.0\), since its first step corresponds to an MST computation for \(\alpha = 0.0\) and each \(V(t_i)\) in its second step corresponds to an edge of maximum weight in the MST.
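The MST baseline described above can be sketched as follows, assuming Kruskal's algorithm with a union-find structure and an undirected edge list of `(weight, s, t)` triples. This is an illustrative implementation of single-linkage clustering, not the code used in the experiments.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[float, Hashable, Hashable]


def _find(parent: Dict[Hashable, Hashable], v: Hashable) -> Hashable:
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def single_linkage_mst(nodes: Iterable[Hashable], edges: Iterable[Edge], k: int) -> dict:
    """Cluster nodes into k groups by cutting the k - 1 heaviest MST edges."""
    nodes = list(nodes)
    parent = {v: v for v in nodes}
    mst: List[Edge] = []
    for w, s, t in sorted(edges, key=lambda e: e[0]):  # Kruskal: lightest edges first
        rs, rt = _find(parent, s), _find(parent, t)
        if rs != rt:
            parent[rs] = rt
            mst.append((w, s, t))
    # mst is already in non-decreasing weight order; drop its k - 1 heaviest edges.
    kept = mst[: max(len(mst) - (k - 1), 0)]
    parent = {v: v for v in nodes}
    for _, s, t in kept:
        parent[_find(parent, s)] = _find(parent, t)
    return {v: _find(parent, v) for v in nodes}  # label = representative node


nodes = ["a", "b", "c", "d"]
edges = [(1.0, "a", "b"), (5.0, "b", "c"), (2.0, "c", "d"), (9.0, "a", "d")]
labels = single_linkage_mst(nodes, edges, k=2)
```

For this toy graph the heaviest MST edge is the one between `b` and `c`, so the two clusters are `{a, b}` and `{c, d}`.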
We performed experiments using 40 slice images from real MR images of the foot to segment the talus bone (Fig. 2) and 40 slice images from CT cervical spine studies of 10 subjects to segment the spinal-vertebra. We computed the mean accuracy curve of all the methods for different values of k (Fig. 3). For each value of k, we computed the Dice similarity coefficient between the ground truth and the best union of segmented regions leading to the object. Since the method by Felzenszwalb and Huttenlocher only provides indirect control over the number of generated regions, in our plot, we are showing for FH the mean number of regions obtained for each value of its input parameter. The results indicate that UOIFT requires a lower value of k compared to the other approaches to generate the talus bone and the spinal-vertebra for different values of \(\alpha \), due to its boundary polarity information, demonstrating the robustness of UOIFT.
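The evaluation step can be illustrated with a short sketch of the Dice similarity coefficient applied to a union of regions. The way the union is chosen here, keeping every region with more than half of its pixels inside the ground truth, is a simple heuristic assumed for illustration; the text does not state how the best union was searched for.

```python
from typing import Dict, Hashable, Set


def dice(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Dice similarity coefficient between two pixel sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def majority_union(regions: Dict[Hashable, Set[Hashable]],
                   truth: Set[Hashable]) -> Set[Hashable]:
    """Heuristic union: keep regions lying mostly inside the ground truth."""
    union: Set[Hashable] = set()
    for pixels in regions.values():
        if 2 * len(pixels & truth) > len(pixels):
            union |= pixels
    return union


# Toy example with pixels identified by integers (hypothetical data).
truth = {0, 1, 2, 3}
regions = {"A": {0, 1}, "B": {2, 3, 4}, "C": {5, 6}}
union = majority_union(regions, truth)
score = dice(union, truth)
```

In this example regions `A` and `B` are kept, the union contains one pixel outside the ground truth, and the score is therefore slightly below 1.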
Regarding the computational time, for an image of \(256\times 256\) pixels, computing 625 superpixels by IFT-SLIC takes 203.4 ms and the final clustering into 300 regions by UOIFT in the RAG takes only 13.15 ms, on an Intel Core i3-5005U CPU @ 2.00 GHz\(\times 4\). As future work, we intend to extend UOIFT to consider more sophisticated predicates based on the following works [7, 10, 11, 12, 26].
References
1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Susstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
2. Alexandre, E., Chowdhury, A., Falcão, A., Miranda, P.: IFT-SLIC: a general framework for superpixel generation based on simple linear iterative clustering and image foresting transform. In: 28th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 337–344 (2015)
3. Audigier, R., Lotufo, R.: Seed-relative segmentation robustness of watershed and fuzzy connectedness approaches. In: Proceedings of the XX Brazilian Symposium on Computer Graphics and Image Processing, pp. 61–68, October 2007
4. Carballido-Gamio, J., Belongie, S., Majumdar, S.: Normalized cuts in 3D for spinal MRI segmentation. IEEE Trans. Med. Imaging 23(1), 36–44 (2004)
5. Ccacyahuillca Bejar, H.H., Miranda, P.A.: Oriented relative fuzzy connectedness: theory, algorithms, and its applications in hybrid image segmentation methods. EURASIP J. Image Video Process. 2015(1), 21 (2015)
6. Ciesielski, K.C., Falcão, A.X., Miranda, P.A.V.: Path-value functions for which Dijkstra’s algorithm returns optimal mapping. J. Math. Imaging Vis. 60(7), 1025–1036 (2018)
7. Cousty, J., Najman, L., Kenmochi, Y., Guimarães, S.: Hierarchical segmentations with graphs: quasi-flat zones, minimum spanning trees, and saliency maps. J. Math. Imaging Vis. 60(4), 479–502 (2018)
8. Falcão, A., Stolfi, J., Lotufo, R.: The image foresting transform: theory, algorithms, and applications. IEEE Trans. PAMI 26(1), 19–29 (2004)
9. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
10. Feng, W., Xiang, H., Zhu, Y.: An improved graph-based image segmentation algorithm and its GPU acceleration. In: 2011 Workshop on Digital Media and Digital Content Management, pp. 237–241, May 2011
11. Guimarães, S., Kenmochi, Y., Cousty, J., Jr., Z.P., Najman, L.: Hierarchizing graph-based image segmentation algorithms relying on region dissimilarity. Math. Morphol. Theory Appl. 2(1), 55–75 (2017)
12. Guimarães, S.J.F., Cousty, J., Kenmochi, Y., Najman, L.: A hierarchical image segmentation algorithm based on an observation scale. In: Gimelfarb, G. et al. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2012. Lecture Notes in Computer Science, vol. 7626. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34166-3_13
13. Krähenbühl, P., Koltun, V.: Geodesic object proposals. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 725–739. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_47
14. Mansilla, L.A.C., Miranda, P.A.V., Cappabianco, F.A.M.: Oriented image foresting transform segmentation with connectivity constraints. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 2554–2558, September 2016
15. Mansilla, L., Miranda, P.: Image segmentation by oriented image foresting transform: handling ties and colored images. In: 18th International Conference on Digital Signal Processing (DSP), pp. 1–6. IEEE, Santorini, July 2013
16. Mansilla, L.A.C., Miranda, P.A.V.: Image segmentation by oriented image foresting transform with geodesic star convexity. In: Wilson, R., Hancock, E., Bors, A., Smith, W. (eds.) CAIP 2013. LNCS, vol. 8047, pp. 572–579. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40261-6_69
17. Miranda, P., Mansilla, L.: Oriented image foresting transform segmentation by seed competition. IEEE Trans. Image Process. 23(1), 389–398 (2014)
18. de Moraes Braz, C., Miranda, P.A.V.: Image segmentation by image foresting transform with geodesic band constraints. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 4333–4337, October 2014
19. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
20. Silva, A.G., Lotufo, R.D.A.: Efficient computation of new extinction values from extended component tree. Pattern Recogn. Lett. 32(1), 79–90 (2011)
21. Souza, R., Rittner, L., Machado, R., Lotufo, R.: iamxt: Max-tree toolbox for image processing and analysis. SoftwareX 6, 81–84 (2017)
22. Vargas-Muñoz, J.E., Chowdhury, A.S., Barreto-Alexandre, E., Galvão, F.L., Miranda, P.A.V., Falcão, A.X.: An iterative spanning forest framework for superpixel segmentation abs/1801.10041 (2018). http://arxiv.org/abs/1801.10041
23. Wang, S., Siskind, J.: Image segmentation with ratio cut. IEEE Trans. Pattern Anal. Mach. Intell. 25(6), 675–690 (2003)
24. Wang, S., Siskind, J.: Image segmentation with minimum mean cut. In: International Conference on Computer Vision (ICCV), vol. 1, pp. 517–525 (2001)
25. Wu, Z., Leahy, R.: An optimal graph theoretic approach to data clustering: theory and its applications to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101–1113 (1993)
26. Zhang, M., Alhajj, R.: Improving the graph-based image segmentation method. In: 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2006), pp. 617–624, November 2006
Acknowledgements
Thanks to CNPq (308985/2015-0, 486988/2013-9, FINEP1266/13), FAPESP (2014/12236-1, 2016/21591-5), NAP eScience and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) - Finance Code 001 for funding.
© 2019 Springer Nature Switzerland AG
Cite this paper
Bejar, H.H.C., Mansilla, L.A.C., Miranda, P.A.V. (2019). Efficient Unsupervised Image Segmentation by Optimum Cuts in Graphs. In: Vera-Rodriguez, R., Fierrez, J., Morales, A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science(), vol 11401. Springer, Cham. https://doi.org/10.1007/978-3-030-13469-3_42
DOI: https://doi.org/10.1007/978-3-030-13469-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13468-6
Online ISBN: 978-3-030-13469-3
eBook Packages: Computer Science, Computer Science (R0)