1 Introduction

Image segmentation is the process in which the pixels of an image are clustered into homogeneous regions according to certain characteristics. In many areas of research, segmenting specific regions of an image is essential for studying particular zones of interest; for this reason, various segmentation methods have been proposed [17]. Some of these methods use mathematical morphology [22], and the applications of image segmentation range from thermography [21] to medicine [8]. Recently, the segmentation of green areas, defined as the zones covered by vegetation, has gained increasing interest as a research topic [5].

Numerous research works on color image segmentation have been presented. For example, a color image segmentation method using morphological clustering based on 2D histograms was proposed in [11], with the disadvantage that the images to be fused must be adjusted before this process. In [9], the authors introduced a method based on 2D-histogram multi-thresholding in the RGB (Red, Green, and Blue) color space, taking the three histograms RG, RB, and GB in order to reduce the number of distinct values and then fusing the resulting images. The main drawback of this method is the use of three histograms with many possible values for the fusion task. Another method performs image segmentation by means of orthogonal series [24]; it is a fast algorithm, but it produces segmentation errors in small zones. Regarding the detection of vegetation zones, this subject has become a recurrent research topic. For instance, in [12], a method based on the k-means clustering algorithm was introduced to detect individual trees in green areas using the NDVI (Normalized Difference Vegetation Index); its disadvantage is the need for a specific kind of camera. Another example is the classification of farmland images based on color features, which works with neural networks and different color spaces [16]; this algorithm focuses only on green color zones. Finally, the method used in [18] to detect green areas based on Hue thresholding [4] has the disadvantage of being semi-automatic. As can be seen, color image segmentation and green area detection have become research topics of great interest. Although the aforementioned methods are functional, an algorithm is needed that works reliably with any commercial camera, regardless of the variation in the green tones of vegetation.

In this paper, an automatic vegetation zone detection based on a color image segmentation method that uses bi-variable histograms of hue-based color spaces [2] is proposed. The robustness of this method to the variation of colors that vegetation can acquire throughout the seasons of the year enables the detection of vegetation zones with accurate results.

2 Theoretical Framework

2.1 Color Spaces

A color space is a mathematical representation of a set of colors. The three most popular color models are RGB, YIQ (used in video systems), and CMYK (used in color printing). However, none of these color spaces is directly related to the intuitive notions of color: hue, saturation, and brightness. Perceptual color spaces, in contrast, simplify programming, processing, and end-user manipulation. All the aforementioned color spaces can be derived from the RGB information supplied by devices such as cameras [10].

The RGB color scheme encodes colors as combinations of the three primary colors: red, green, and blue. This scheme is widely used for the transmission, representation, and storage of color images on both analogue devices, such as television sets, and digital devices. For this reason, many image processing and graphics programs use the RGB scheme as their internal representation for color images, and most language libraries use it as their standard image representation. RGB is an additive color system, which means that all colors are created by adding the primary colors. The RGB values are non-negative, with range [0, Cmax], where \(Cmax= 255\) [10]; the space can be represented as a cube.

When humans see a colored object, they tend to describe it by its hue, saturation, and brightness. Hue is a color attribute that describes a pure color, whereas saturation gives a measure of the degree to which a pure color is diluted by white light. Hue is expressed as an angle around a hexagon, usually taking red as the reference at 0\(^{\circ }\). Equation 1 shows the formulas to convert from RGB to HSV [20], where \(max= max(R,G,B)\) and \(min=min(R,G,B)\).

$$\begin{aligned} \begin{aligned} V= max \\ S= \frac{max-min}{max} \\ H = {\left\{ \begin{array}{ll} \frac{G-B}{max-min} &{} \quad \text {if } R= max\\ \frac{B-R}{max-min}+2 &{} \quad \text {if } G= max\\ \frac{R-G}{max-min}+4 &{} \quad \text {if } B= max\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(1)
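For illustration, Eq. 1 can be sketched in Python as follows. The function name is hypothetical (the paper provides no code), and the hue is returned on the 0–6 sextant scale of the equation; multiply by 60 for degrees.

```python
def rgb_to_hsv_eq1(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to HSV following Eq. 1.

    Illustrative sketch: H is on the 0-6 sextant scale of the equation;
    for achromatic pixels (max == min) the hue is undefined and 0 is used.
    """
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx                                   # V = max
    s = 0.0 if mx == 0 else (mx - mn) / mx   # S = (max - min) / max
    if mx == mn:                             # achromatic: hue undefined
        h = 0.0
    elif mx == r:
        h = (g - b) / (mx - mn)
    elif mx == g:
        h = (b - r) / (mx - mn) + 2
    else:
        h = (r - g) / (mx - mn) + 4
    return h % 6, s, v
```

For example, pure red maps to hue sextant 0 and pure green to sextant 2, both fully saturated.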

The HSI color space decouples the intensity component from the color-carrying information (hue and saturation) in an image. As a result, the HSI model is an ideal tool for developing image processing algorithms based on color descriptions that are natural and intuitive to humans [13]. Equation 2 shows the formulas to convert from RGB to HSI.

$$\begin{aligned} \begin{aligned} I= \left( \frac{1}{3}\right) (R+G+B) \\ S= 1 - \left( \frac{3}{R+G+B}\right) min(R,G,B) \\ H= \arccos {\left[ \frac{\left( \frac{1}{2}\right) [(R-G)+(R-B)]}{[(R-G)^2+(R-B)(G-B)]^{\frac{1}{2}}}\right] } \end{aligned} \end{aligned}$$
(2)
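A sketch of Eq. 2 follows; the function name is hypothetical. The argument of the arccosine is clamped to [−1, 1] against rounding error, hue falls back to 0 for achromatic pixels, and the standard reflection for B > G extends the hue past 180\(^{\circ }\).

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to HSI following Eq. 2.

    Illustrative sketch: H is returned in radians, 0 for achromatic pixels.
    """
    total = r + g + b
    i = total / 3.0                                    # I = (R+G+B)/3
    s = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:                                       # gray: hue undefined
        h = 0.0
    else:
        h = math.acos(max(-1.0, min(1.0, num / den)))  # clamp for safety
        if b > g:                                      # reflect past 180 deg
            h = 2 * math.pi - h
    return h, s, i
```

Pure red yields H = 0 with full saturation, while pure blue yields a hue of 240\(^{\circ }\) (4π/3 radians).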

IHSL (Improved Hue, Saturation, and Luminance) is an improvement of the HSL color space designed to overcome its limitations. The most important problem is the instability that appears in the saturation, not only in the HSL color space but also in most other hue-based color spaces; the IHSL color space avoids these saturation instabilities at high and low levels of luminance [1]. Equation 3 shows the formulas to convert from RGB to IHSL:

$$\begin{aligned} \begin{aligned} L= 0.2126R+ 0.7152G + 0.0722B \\ S= max- min \\ H = {\left\{ \begin{array}{ll} \frac{G-B}{max-min} &{} \quad \text {if } R= max\\ \frac{B-R}{max -min}+2 &{} \quad \text {if } G= max\\ \frac{R-G}{max-min}+4 &{} \quad \text {if } B= max\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(3)
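Equation 3 can be sketched in the same style; the function name is hypothetical. The hue is the same sextant formula as in Eq. 1, while the luminance uses the weighted sum and the saturation is simply max − min, independent of luminance.

```python
def rgb_to_ihsl(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to IHSL following Eq. 3.

    Illustrative sketch: H is on the 0-6 sextant scale, 0 if achromatic.
    """
    mx, mn = max(r, g, b), min(r, g, b)
    l = 0.2126 * r + 0.7152 * g + 0.0722 * b  # luma-weighted luminance
    s = mx - mn                               # saturation, luminance-free
    if mx == mn:
        h = 0.0
    elif mx == r:
        h = (g - b) / (mx - mn)
    elif mx == g:
        h = (b - r) / (mx - mn) + 2
    else:
        h = (r - g) / (mx - mn) + 4
    return h % 6, s, l
```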

2.2 Morphological Filter

A morphological filter \(\psi \) is an increasing and idempotent transformation; the former property implies that for two images f and g such that \(f \le g\), \(\psi (f) \le \psi (g)\), while the latter states that \(\psi [ \psi (f)]= \psi (f)\) for any image f. The opening (\(\gamma _{\lambda B}\)) and the closing (\( \varphi _{\lambda B}\)) are the basic morphological filters by a structuring element \(\lambda B\), where B is the basic structuring element that contains the origin and \(\lambda \) is a homothetic parameter. The structuring element can have different shapes. Both filters are expressed in Eq. 4, where \(\varepsilon _{\lambda B}\) and \(\delta _{\lambda B}\) are the erosion and the dilation, respectively, defined as \(\varepsilon _{\lambda B} (f)(x)= min\{f(y)\): \(y \in \lambda B\}\) and \(\delta _{\lambda B} (f)(x)=max \{f(y)\): \( y\in \lambda B \}\), with max and min the maximum and minimum values [22]:

$$\begin{aligned} \begin{aligned} \gamma _{\lambda B}= \delta _{\lambda B} [\varepsilon _{\lambda B} (f)] \\ \varphi _{\lambda B} =\varepsilon _{\lambda B}[\delta _{\lambda B} (f)] \end{aligned} \end{aligned}$$
(4)
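Assuming a flat square structuring element of side 2λ + 1 in place of λB, Eq. 4 can be sketched with SciPy's grayscale morphology; the helper names are illustrative, not from the paper.

```python
import numpy as np
from scipy import ndimage

def opening(f, lam=1):
    """Eq. 4: opening = dilation of the erosion, with a flat square
    structuring element of side 2*lam + 1 standing in for lambda*B."""
    size = 2 * lam + 1
    return ndimage.grey_dilation(ndimage.grey_erosion(f, size=size), size=size)

def closing(f, lam=1):
    """Eq. 4: closing = erosion of the dilation (dual of the opening)."""
    size = 2 * lam + 1
    return ndimage.grey_erosion(ndimage.grey_dilation(f, size=size), size=size)
```

The opening removes bright structures narrower than the structuring element and is anti-extensive (output ≤ input), while the closing fills dark ones and is extensive.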

On the other hand, the opening (\(\hat{\gamma }_{\lambda B}\)) and the closing (\(\hat{\varphi }_{\lambda B}\)) by reconstruction are built from the geodesic dilation, defined as \(\delta _{f}^1 (g)= f\wedge \delta _{B}(g)\) with \(f\ge g\), and the geodesic erosion, defined as \(\varepsilon _{f}^1 (g)= f\vee \varepsilon _{B} (g)\) with \(f\le g\). These operators are iterated until stability is reached in order to obtain the reconstruction (R) and the dual reconstruction \((R^*)\), respectively [19]. Both operators are shown in Eq. 5.

$$\begin{aligned} \begin{aligned} \hat{\gamma }_{\lambda B}= \lim _{n \rightarrow \infty }\delta _{f}^n[\varepsilon _{\lambda B} (f)]=R[f,\varepsilon _{\lambda B}(f)], \\ \hat{\varphi }_{\lambda B}= \lim _{n \rightarrow \infty }\varepsilon _{f}^n[\delta _{\lambda B} (f)]=R^*[f,\delta _{\lambda B}(f)] \end{aligned} \end{aligned}$$
(5)
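The reconstruction of Eq. 5 can be sketched by iterating the geodesic dilation until stability; the function names are illustrative and a flat 3 × 3 structuring element stands in for B.

```python
import numpy as np
from scipy import ndimage

def reconstruct(marker, mask):
    """R[mask, marker] of Eq. 5: iterate the geodesic dilation
    delta_f^1(g) = f ^ delta_B(g) until stability. Requires marker <= mask."""
    g = np.asarray(marker, dtype=float)
    while True:
        nxt = np.minimum(ndimage.grey_dilation(g, size=3), mask)
        if np.array_equal(nxt, g):      # stability reached
            return g
        g = nxt

def opening_by_reconstruction(f, lam=1):
    """Erode, then rebuild only the structures that survived the erosion."""
    return reconstruct(ndimage.grey_erosion(f, size=2 * lam + 1), f)
```

Unlike the plain opening, the opening by reconstruction restores the exact shape of every structure that survives the erosion, while still removing structures smaller than the structuring element.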

The alternating sequential filters are also increasing and idempotent transformations. They are formed by the composition of morphological openings and closings as shown in Eq. 6, with \(\lambda _{1}\le \lambda _{2}\le ...\le \lambda _{n}\) [19].

$$\begin{aligned} \begin{aligned} \psi _{n}(f)= \varphi _{\lambda _{n}}\gamma _{\lambda _{n}}...\varphi _{\lambda _{2}}\gamma _{\lambda _{2}}\varphi _{\lambda _{1}}\gamma _{\lambda _{1}}(f) \\ \psi _{n}^*(f)=\gamma _{\lambda _{n}}\varphi _{\lambda _{n}}...\gamma _{\lambda _{2}}\varphi _{\lambda _{2}}\gamma _{\lambda _{1}}\varphi _{\lambda _{1}}(f) \end{aligned} \end{aligned}$$
(6)
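The composition of Eq. 6 can be sketched directly with SciPy's grayscale opening and closing; the function name is hypothetical and a flat square structuring element of side 2λ + 1 stands in for λB.

```python
import numpy as np
from scipy import ndimage

def asf(f, n):
    """psi_n(f) of Eq. 6: alternate opening then closing at increasing
    sizes lambda = 1..n (flat square structuring elements)."""
    for lam in range(1, n + 1):
        size = 2 * lam + 1
        f = ndimage.grey_opening(f, size=size)   # gamma_lambda
        f = ndimage.grey_closing(f, size=size)   # varphi_lambda
    return f
```

Swapping the order of the two calls inside the loop gives the dual filter \(\psi _{n}^*\).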

2.3 Watershed with Dynamics

This operator interprets an image as a topographic surface, where the gray level at a point indicates its height. A flooding is simulated, starting from the minima of the image; when the waters coming from neighboring minima are about to meet, a dam is built. The set of all dams constitutes the watershed [2]. In order to select the main maxima, we use a morphological tool known as dynamics, or contrast extinction value, introduced by Grimaud [7]. This measure maps each maximum to a value given by its contrast. The contrast of a maximum is the minimum descent necessary to move from that maximum to another, higher maximum; the contrast of the highest maximum is defined as the difference between the maximum and the minimum of the function.
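The definition of dynamics can be illustrated on a 1-D signal. The following brute-force sketch (hypothetical helper, assuming NumPy; not the efficient flooding algorithm of [7]) computes, for each local maximum, the minimal descent toward a strictly higher maximum.

```python
import numpy as np

def dynamics_of_maxima(sig):
    """Dynamics of each strict local maximum of a 1-D signal.

    Illustrative O(n^2) version: for each peak, the contrast is the peak
    height minus the lowest point on the path to the nearest higher peak;
    the global maximum gets max - min of the whole signal.
    """
    sig = np.asarray(sig, dtype=float)
    n = len(sig)
    peaks = [i for i in range(n)
             if (i == 0 or sig[i] > sig[i - 1])
             and (i == n - 1 or sig[i] > sig[i + 1])]
    dyn = {}
    for p in peaks:
        higher = [q for q in peaks if sig[q] > sig[p]]
        if not higher:                       # global maximum
            dyn[p] = sig[p] - sig.min()
        else:                                # minimal descent to a higher peak
            dyn[p] = min(sig[p] - sig[min(p, q):max(p, q) + 1].min()
                         for q in higher)
    return dyn
```

For the signal [0, 3, 1, 5, 0], the peak of height 5 is the global maximum (dynamics 5), while the peak of height 3 must descend to the valley of height 1 to reach it (dynamics 2).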

3 Methodology

The proposed methodology of the paper is shown in Fig. 1.

Fig. 1.
figure 1

Proposed methodology

Image i, shown in Fig. 3(a), is coded in the RGB color space; therefore, a color space conversion is performed by applying Eq. 1, 2, or 3. In this case, the HSI color space is selected and each channel (\(H_{g_{hsi}}\), \(S_{g_{hsi}}\), \(I_{g_{hsi}}\)) is stored as an individual image, yielding three grayscale images as shown in Fig. 3(b), (c) and (d). In Fig. 2, the proposed methodology for color segmentation is displayed.

Fig. 2.
figure 2

Color segmentation methodology

Once the color space conversion of the picture is carried out, the chromatic histogram is computed by accumulating pixel counts at the coordinates given by the channel values, taking the value of H as the row index and the value of S as the column index: \(Hist_{hs}(H_{g_{hsi}},S_{g_{hsi}})= Hist_{hs}(H_{g_{hsi}},S_{g_{hsi}}) + 1\).
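The accumulation step can be sketched as follows; the function name is hypothetical, NumPy is assumed, and the channels are assumed already quantized to integers in [0, bins).

```python
import numpy as np

def chromatic_histogram(h_chan, s_chan, bins=256):
    """2-D chromatic histogram: H values index the rows, S values the
    columns, i.e. Hist[H, S] += 1 for every pixel.

    Illustrative sketch; channels must be integer arrays in [0, bins).
    """
    hist = np.zeros((bins, bins), dtype=np.int64)
    # np.add.at performs unbuffered accumulation, so repeated (H, S)
    # coordinates are each counted (plain fancy-index += would not be).
    np.add.at(hist, (h_chan.ravel(), s_chan.ravel()), 1)
    return hist
```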

Fig. 3.
figure 3

Example image (a), and its channels H (b), S (c) and I (d)

Since the values in \(Hist_{hs}\) can exceed 255, they cannot be represented directly in an RGB color image; therefore, Eq. 7 is applied to normalize the range of values to [0, 255] [2], obtaining the chromatic histogram shown in Fig. 4(a):

$$\begin{aligned} \ f_{hs}(x)=log\left[ \frac{Hist_{hs}(x)}{max(Hist_{hs}(x))} \right] \end{aligned}$$
(7)

The next step is to carry out the color segmentation seg(i) using the computed histogram. Since the histogram contains a large number of isolated spots, each spot representing a color in the image i, it is processed with morphological operators. The following sequence of transformations is applied: (a) first, a closing with \(\lambda = 2\) is applied; (b) the image is complemented; (c) an alternating filter by reconstruction \(AF_{R}= \hat{\varphi }_{2}\hat{\gamma }_{2} \hat{\varphi }_{1}\hat{\gamma }_{1}\) is then applied; (d) once the histogram is filtered, the dynamics are used to compute the main minima; (e) the watershed is then determined using the dynamics computed before; (f) finally, an alternating sequential filter is applied and the labeling of regions is computed. Each step is shown in Fig. 4(b), (c), (d), (e), (f) and (g), respectively.

Fig. 4.
figure 4

Chromatic histogram (a), closing \(\lambda = 2\) (b), complement (c), alternating sequential filter by reconstruction (d), dynamics of histogram (e), watershed (f) and slopes (g)

All the preceding operators are applied in order to reduce the discontinuities of colors in the image. The labeling of regions in the watershed yields the color segmentation, defined by \(seg(x,y) = vrt(H_{g_{hsi}},S_{g_{hsi}})\). The computed image is shown in Fig. 5(a); the false color image (\(F_{c}\)) is then computed by assigning to each gray value the average of the colors in the RGB channels of image i (see Fig. 5(b)). Observe that the number of colors has been reduced in the segmented image seg(i).

The detection of vegetation G(i) is done by using the hue channel (H), which ranges from 0 to 360\(^{\circ }\). A value of 1 is assigned where H lies in the range from 80\(^{\circ }\) to 160\(^{\circ }\); colors outside this range are assigned 0, thus producing a picture that keeps only the green colors (in this case, values of 1 are replaced by the corresponding pixels of image i). To reduce the noise, a morphological reconstruction R is applied, and then a closing by reconstruction \(\hat{\varphi }_{n}\) is used to close small holes in the green areas. The result is shown in Fig. 5(c).
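The hue-band masking step can be sketched as follows; the function name is hypothetical and the band limits are taken inclusive, which the text does not specify.

```python
import numpy as np

def green_mask(h_deg, low=80.0, high=160.0):
    """Binary vegetation mask from a hue channel in degrees: 1 where
    low <= H <= high (the paper's 80-160 degree green band), 0 elsewhere.

    Illustrative sketch; band limits are assumed inclusive.
    """
    h_deg = np.asarray(h_deg, dtype=float)
    return ((h_deg >= low) & (h_deg <= high)).astype(np.uint8)
```

Multiplying the mask by each RGB channel of image i then keeps the original pixels inside the green areas and zeroes everything else.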

Fig. 5.
figure 5

Color segmentation (a), False color (b) and green areas detection (c) (Color figure online)

Finally, the automatic green area detection is compared with a manual segmentation m(i) in order to determine the quality of the vegetation detection [14, 15]. This is performed by computing the Local Consistency Error (LCE) [23] of Eq. 8, where N is the number of pixels of the image i, seg(i) is the automatic segmentation of i, and m(i) corresponds to the manual segmentation.

$$\begin{aligned} \ LCE(seg,m)= \frac{1}{N} \sum max(seg(i), m(i)) \end{aligned}$$
(8)

The value of LCE represents the coincidence between the automatic segmentation and the manual segmentation. It ranges over [0, 1], where 0 indicates no similitude between m(i) and seg(i); as LCE approaches 1, the similitude increases up to a perfect match, so the closer LCE is to 1, the more accurate the segmentation.
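A literal sketch of Eq. 8 as printed follows; the function name is hypothetical and both segmentations are assumed to be binary masks of the same shape.

```python
import numpy as np

def lce(seg, m):
    """Eq. 8 computed literally: the mean over all N pixels of the
    pixelwise maximum of the automatic segmentation seg(i) and the
    manual segmentation m(i), both assumed binary.

    Illustrative sketch of the formula exactly as printed in the text.
    """
    seg = np.asarray(seg, dtype=float)
    m = np.asarray(m, dtype=float)
    return float(np.maximum(seg, m).sum() / seg.size)
```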

4 Results

In order to validate this methodology, an aerial picture is used. The image in Fig. 6(a), of dimensions 2880 \(\times \) 1620 pixels, is converted to the HSI color space. Figure 6(b) corresponds to channel H, whereas Fig. 6(c) and (d) correspond to channels S and I, respectively.

Fig. 6.
figure 6

Sample image (a) and HSI channels (b), (c) and (d)

After the color space conversion is done, the chromatic histogram is computed, obtaining the image shown in Fig. 7. Next, the morphological operators mentioned above are applied in order to obtain an image without color discontinuities. Figure 7 shows the chromatic histogram (Fig. 7(a)), the dynamics of the minima (Fig. 7(b)), and the labeled slopes (Fig. 7(c)).

Fig. 7.
figure 7

Chromatic histogram (a), dynamic minimum of histogram (b), labelled slopes (c)

The segmentation is obtained by assigning each gray value of vrt(i) (Fig. 8(a)) to its corresponding coordinate in \(seg(H_{g_{hsi}},S_{g_{hsi}})\). The segmented image and its false color seg(i) are shown in Fig. 8(a) and (b), respectively. It can be observed that this method groups the green areas into a smaller number of green tones, enabling their easy detection. Once the color segmentation and the false color seg(i) are computed, the green area detection is applied, as shown in Fig. 8(c) and (d).

Fig. 8.
figure 8

Image segmentation (a) and its false color (b), binary green area detection (c), green areas with actual pixels (d) (Color figure online)

In Fig. 8, one observes that the algorithm enables the segmentation of the different tones present in the vegetation of the image shown in Fig. 6(a). To analyze the results of this case study, the LCE is computed in order to measure quantitatively the detection of vegetation, using the manual segmentation as a reference [3]. Figure 9 shows the manual segmentation m(i). The computed value of LCE is 0.985, which means that the green area segmentation is accurate with respect to the manual segmentation; therefore, this method is capable of segmenting green areas despite the differences of color in the vegetation.

Fig. 9.
figure 9

Manual Segmentation

Fig. 10.
figure 10

Sample aerial images: (a) shadows in the green areas, (b) vegetation with non-uniform color tone, and (c) vegetation under normal conditions (Color figure online)

In order to validate this method further, three more cases with particular characteristics were selected from the rest of the sample images; Fig. 10 illustrates the sample images selected from the database of the present work. Different hue-based color spaces (HSV, HSI, and IHSL) were tested with the proposed segmentation method and compared with the k-means clustering segmentation [6] used in [12] for green area segmentation. The Hue threshold method [4] used for green areas in [18] was also tested for comparison. LCE values were computed for each case, and the results obtained are shown in Table 1.

Table 1. Results of LCE in proposed method using different Hue-based color spaces, k-means clustering and Hue modification

From the results shown in Table 1, one observes that, in general, the proposed method offers better results than k-means clustering and false color in green area segmentation (the best results are in bold). In the case of the HSI and HSV color spaces, both have average LCE values with a small difference between them (0.03); therefore, either color space can be selected to apply this algorithm.

5 Conclusion

In the present work, an algorithm for green area segmentation that uses hue-based color spaces has been presented. It has been shown that the algorithm is able to segment vegetation over a large range of colors, grouping them into a single region. This characteristic is important since the color of vegetation can vary depending on the season of the year or the weather. The LCE analysis demonstrated that the results of the algorithm are close to the manual segmentation and better than those of k-means clustering and false color when green areas are considered. An advantage of this system is that no additional sensor or image acquisition system is required, which represents a lower cost; it can be applied with a standard camera alone.