1 Introduction

The optimal harvesting date and the predicted yield are valuable information when farming open-field tomatoes, as they make harvest planning and work at the processing plant much easier. Monitoring tomatoes during their early stages of growth is also useful to assess plant stress or abnormal development. Satellite data and crop growth modeling are generally used to estimate the yield of a large region [10, 13]. However, satellite data are affected by adverse climatic conditions (clouds, etc.), which results in inaccurate predictions [10]. Crop growth modeling, which integrates information about the cultivated plant, soil and weather conditions, considers the ideal case in which no plant is infected. Recent studies have concentrated on combining these two approaches [19]. Nevertheless, these methods depend on the quality of the parameters involved (vegetation indices, soil and weather information), and they are not accurate enough to detect abnormal development.

In this work, we present a different approach where we intend to monitor the growth of tomatoes and measure their size in an open field. For this purpose, two cameras are installed in the field and two images are captured at regular intervals. In order to avoid a complete 3D reconstruction, we assume that a tomato can be approximated by a sphere in the 3D space, which projects into an ellipse in the image plane. Hence, the first part of our system aims at detecting and segmenting the tomatoes in both images, using elliptic approximations. Then, the second part aims at estimating the sphere radius, using the camera parameters. An estimate of the yield is obtained from this information. In this paper, we focus on the segmentation procedure only.

Computer vision algorithms have been applied in the agricultural domain in order to replace human operators with an automated system. They have been used to grade and sort agricultural products [4, 7, 11], to detect weeds in a field [2, 9, 18], and to model the growth of fruits and predict the yield [1, 14]. In [1], the yield of an apple orchard is estimated using only the density of flowers. In [14], only 5 images captured at different stages of apple maturation are studied in order to predict the yield. These methods [1, 14] are limited to a controlled environment (apple orchards) where complex scenarios such as occlusions are not considered. Moreover, in [1] the observed scene is modified by placing a black cloth behind the tree in order to simplify the image processing tasks. To the best of our knowledge, however, no previous work has studied the growth of a fruit or vegetable cultivated in open fields from images captured during the entire agricultural season.

Since a tomato grows little during a given day, only one image per day is analyzed in this work, which yields a series of approximately 20–30 images. One of the difficulties of the segmentation is occlusion: most of the tomatoes are partially hidden by other tomatoes and/or leaves (Fig. 1). Moreover, color information is of little use, as tomatoes turn red only at the end of ripening. Another difficulty is the very low contrast in some cases, due to shadows.

Fig. 1.

Two successive images of the tomato \(S=7\).

In this work, the segmentation should be as automatic as possible. However, we assume that an operator validates each obtained segmentation. If the result is poor, the operator rejects it. Indeed, given the difficulties, the segmentation is a very challenging task, and a manual validation is preferable. This approach enables us to use the segmentation done in the \(i^{th}\) image (if validated) as a reference for the segmentation of the same tomato in the \((i+1)^{th}\) image.

In order to segment the tomatoes, we use a parametric active contour model, which allows us to introduce a priori knowledge on the shape of the object to be segmented, thus making the segmentation more robust to noise and occlusion. Using an elliptic shape constraint is consistent with our prior assumption.

The main steps of the segmentation algorithm are as follows. First, gradient information is used to find candidate contour points, and several elliptic approximations are proposed using the RANSAC algorithm. Second, region information is added, which enables us to select the best ellipse for the initialization of the active contour and to find the regions of potential occlusion. Third, the active contour with elliptic constraint is applied. Finally, four ellipse estimates are computed, and the operator only has to select the best one as the final segmentation.

The original features of the proposed algorithm include the approximation of the tomatoes by ellipses and the restriction of the image energy computation to the non-occluded regions. These features make it possible to cope with occlusions and with local loss of contours and edges.

We present the active contour model with shape constraint in Sect. 2 and the different steps of the segmentation algorithm in Sect. 3. Section 4 discusses the experimental results. A brief discussion of the second part of the system, which aims at estimating the radius of the tomatoes, is presented in Sect. 5. This paper extends our preliminary work in [15].

2 Active Contour with an Elliptical Shape Prior

The parametric active contour model, or snake, was originally introduced in [8] to detect the boundary of an object in an image. This algorithm iteratively deforms the contour from its initial position towards the edges of the object by minimizing an energy functional. The energy functional associated with the contour v is usually composed of three terms:

$$\begin{aligned} \mathbf E _{T}(v) = \mathbf E _{Int}(v) + \mathbf E _{Im}(v) +\mathbf E _{Ext}(v), \end{aligned}$$
(1)

where \(\mathbf E _{Int}(v)\) is the internal energy controlling the smoothness of the curve and \(\mathbf E _{Im}(v)\) is the energy derived from image data. The external energy \(\mathbf E _{Ext}(v)\) can express contextual information, such as shape information. The authors of [6] used Legendre moments to define an affine-invariant shape prior in a region-based active contour. In our case, the region information is significant but not stable enough, due to the presence of leaves (occlusions) and of other tomatoes with a similar intensity profile. In [3], Fourier descriptors are used to align the active contour with a reference curve of suitable shape and orientation. In our work, the tomato in each image is assumed to have an elliptical shape, which is included as a constraint in a parametric active contour model.

Let us define the reference ellipse as \(z_{e}\). This ellipse is estimated from the evolving contour z. Both curves are expressed in polar coordinates with the origin at the center of \(z_{e}\):

$$\begin{aligned} z_{e}(\theta ) = r_{e}(\theta )e^{j\theta }, z(\theta ) = r(\theta )e^{j\theta }, \theta \in [0,2 \pi ]. \end{aligned}$$
(2)
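When the polar origin is the center of the ellipse, the radial function \(r_{e}(\theta )\) of Eq. 2 has a closed form: \(r_{e}(\theta ) = ab/\sqrt{(b\cos (\theta -\varphi ))^{2}+(a\sin (\theta -\varphi ))^{2}}\). A minimal NumPy sketch, assuming the ellipse parameters \(a, b, \varphi \) are known (the function name `ellipse_polar_radius` is ours, for illustration only):

```python
import numpy as np

def ellipse_polar_radius(theta, a, b, phi):
    """Distance from the ellipse centre to the ellipse along direction theta.

    Semi-axes a, b and rotation phi; the polar origin is the ellipse centre,
    as for z_e(theta) = r_e(theta) * exp(j * theta).
    """
    t = np.asarray(theta, dtype=float) - phi  # angle in the ellipse's own frame
    return a * b / np.sqrt((b * np.cos(t)) ** 2 + (a * np.sin(t)) ** 2)
```

For example, sampling `theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)` and forming `r_e * np.exp(1j * theta)` gives \(z_{e}\) relative to its center.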

Our energy functional with an elliptic shape regularization is defined as:

$$\begin{aligned} \mathbf E _{T}(r,r_{e})=\int _{0}^{2\pi } \frac{\alpha }{2} \vert r'(\theta ) \vert ^{2}\mathrm {d} \theta + \int _{0}^{2\pi }E_{Im} (r(\theta )e^{j\theta })\mathrm {d} \theta + \frac{\psi }{2}\int _{0}^{2\pi } \vert r(\theta )-r_{e}(\theta ) \vert ^{2} \mathrm {d}\theta . \end{aligned}$$
(3)

In the above equation, the first term is the internal energy, which controls the variations of r and keeps the curve regular. The second term is a classical image energy computed from the gradient vector flow [17]. The last term constrains the evolving contour to stay close to the reference ellipse. The parameter \(\alpha \) controls the smoothness of the curve, and \(\psi \) controls the influence of the shape prior on the total energy. Note that, instead of modifying a 2-D vector \(v(s) = (x(s),y(s))\) as in the classical active contour model, only the scalar radius \(r(\theta )\) is modified for each value of \(\theta \). Moreover, the shape constraint makes the usual second-derivative term of the internal energy unnecessary, so this term is not included in the proposed energy functional.

The minimum of \(\mathbf E _{T}\) is obtained by alternating two steps. First, a least square estimate of the ellipse \(z_{e}\) is computed from the initial contour \(z_{0}\). Then, the evolving contour z is computed by minimizing \(\mathbf E _{T}\) with \(z_{e}\) held fixed. The parameters of the least square ellipse \(z_{e}\) are regularly re-estimated from the evolving contour z so obtained, and this two-step process is repeated in order to reach the minimum of \(\mathbf E _{T}\).
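The text does not specify which least square ellipse fit is used. One common choice, sketched here as an assumption, is the algebraic conic fit: minimize \(\Vert Dc\Vert \) subject to \(\Vert c\Vert =1\), where each row of D is \([x^{2}, xy, y^{2}, x, y, 1]\), then convert the conic to \([xc,yc,a,b,\varphi ]\). This fit is not ellipse-specific (an ellipse-specific direct fit would be an alternative), so the hypothetical `fit_ellipse_lsq` below returns `None` when the best conic is not an ellipse.

```python
import numpy as np

def fit_ellipse_lsq(x, y):
    """Algebraic least-squares ellipse fit; returns [xc, yc, a, b, phi] or None."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Normalise coordinates (similarity transform) for numerical conditioning.
    mx, my = x.mean(), y.mean()
    s = np.sqrt(((x - mx) ** 2 + (y - my) ** 2).mean())
    u, v = (x - mx) / s, (y - my) / s
    D = np.column_stack([u * u, u * v, v * v, u, v, np.ones_like(u)])
    # Minimise ||D c|| with ||c|| = 1: right singular vector of the smallest singular value.
    A, B, C, Dx, Ey, F = np.linalg.svd(D, full_matrices=False)[2][-1]
    if B * B - 4.0 * A * C >= 0.0:
        return None                          # parabola or hyperbola, not an ellipse
    # Centre: point where the gradient of the conic vanishes.
    uc, vc = np.linalg.solve([[2.0 * A, B], [B, 2.0 * C]], [-Dx, -Ey])
    f0 = F + 0.5 * (Dx * uc + Ey * vc)       # conic value at the centre
    lam, vec = np.linalg.eigh([[A, 0.5 * B], [0.5 * B, C]])
    axes2 = -f0 / lam                        # squared semi-axes (invariant to the sign of c)
    if np.any(axes2 <= 0.0):
        return None                          # imaginary ellipse
    k = int(np.argmax(axes2))                # eigenvector column of the major axis
    phi = np.arctan2(vec[1, k], vec[0, k]) % np.pi
    return np.array([mx + s * uc, my + s * vc,
                     s * np.sqrt(axes2[k]), s * np.sqrt(axes2[1 - k]), phi])
```

The angle is returned in \([0,\pi )\), since an ellipse is unchanged by a rotation of \(\pi \).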

The minimization of \(\mathbf E _{T}\) with respect to r is equivalent to solving the following Euler equation:

$$\begin{aligned} -\alpha r''(\theta )+ \nabla E_{Im}(\theta ) \cdot n(\theta ) + \psi (r(\theta )-r_{e}(\theta ))=0, \end{aligned}$$
(4)

where \(n(\theta ) = \left[ \cos \theta , \sin \theta \right] ^{T}\).

To find a solution of this equation iteratively, we introduce a time variable, and the resulting evolution equation is discretized using finite differences, as for classical active contours.
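A minimal sketch of one such discretization, assuming K equally spaced angles and a semi-implicit scheme in the spirit of [8]: the internal and shape terms of Eq. 4 are treated implicitly and the image term explicitly, so each time step solves \((1+\Delta t\,\psi ) r^{t+1} - \Delta t\,\alpha L r^{t+1} = r^{t} + \Delta t\,\psi r_{e} - \Delta t\, f(r^{t})\), where L is the periodic second-difference operator and \(f = \nabla E_{Im}\cdot n\). The callable `image_force` and the step `dt` are illustrative assumptions; in practice f would be sampled from the gradient vector flow field [17].

```python
import numpy as np

def evolve_polar_snake(r0, r_e, image_force, alpha=1.0, psi=0.5, dt=0.5, n_iter=200):
    """Semi-implicit time stepping of Eq. (4) for a contour r(theta).

    image_force(r, theta) must return grad(E_Im) . n(theta) at each contour
    point; r_e is the reference ellipse sampled at the same angles.
    """
    r = np.asarray(r0, dtype=float).copy()
    r_e = np.asarray(r_e, dtype=float)
    K = r.size
    theta = np.linspace(0.0, 2.0 * np.pi, K, endpoint=False)
    dtheta = 2.0 * np.pi / K
    # Periodic second-difference operator approximating d^2/dtheta^2.
    I = np.eye(K)
    L = (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1) - 2.0 * I) / dtheta ** 2
    # Internal and shape terms implicit, image term explicit.
    M = (1.0 + dt * psi) * I - dt * alpha * L
    for _ in range(n_iter):
        rhs = r + dt * psi * r_e - dt * image_force(r, theta)
        r = np.linalg.solve(M, rhs)
    return r
```

Setting the force to zero at angles whose contour point lies in an occluded region is one way to ignore image information there.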

3 Detailed Algorithm

In this section, we present an algorithm which allows us to follow the growth of a tomato, which has been manually segmented in the first image (\(i=1\)).

Let us denote by \(im^{i+1}\) the \((i+1)^{th}\) image of the tomato S. In the rest of this paper, an ellipse centered at \((xc,yc)\), with semi-major and semi-minor axis lengths a and b, respectively, and rotation angle \(\varphi \), is represented as \(Ell= [xc,yc,a,b,\varphi ]\). The tomato approximated by an ellipse in \(im^{i}\) is represented as \(Ell^{i}= [xc^{i},yc^{i},a^{i},b^{i},\varphi ^{i}]\). In our sequential approach, the computation of the contour in the \((i+1)^{th}\) image is based both on the information in \(im^{i+1}\) and on the contour of the tomato in the \(i^{th}\) image. Temporal regularization (assuming little growth and movement of the tomato during a day) and spatial regularization (tomato modeled as a sphere in the 3D space) are used throughout the segmentation procedure.

3.1 Pre-processing

As mentioned above, the color information is not of much use. However, the edges of tomatoes are more prominent in the red component of the image, and hence only this component is considered. The original image is cropped around the position \((xc^{i},yc^{i})\), resulting in a smaller image \((imS_{c}^{i+1})\). The contrast is enhanced by a contrast stretching transformation.
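The pre-processing step might look as follows. This is a sketch under assumptions: the image is an H×W×3 RGB array with the red channel first, the crop is a square of hypothetical half-size `half_size`, and the contrast stretching is a linear percentile stretch to [0, 1] (the exact transformation is not specified in the text).

```python
import numpy as np

def preprocess(rgb, center, half_size, low_pct=1.0, high_pct=99.0):
    """Red channel, square crop around center=(x, y), linear contrast stretch.

    Returns the stretched crop in [0, 1] and the (x0, y0) offset of its
    top-left corner in the original image.
    """
    red = np.asarray(rgb)[..., 0].astype(float)   # edges are most visible in red
    h, w = red.shape
    xc, yc = int(round(center[0])), int(round(center[1]))
    x0, x1 = max(xc - half_size, 0), min(xc + half_size + 1, w)
    y0, y1 = max(yc - half_size, 0), min(yc + half_size + 1, h)
    crop = red[y0:y1, x0:x1]
    lo, hi = np.percentile(crop, [low_pct, high_pct])
    if hi <= lo:                                  # flat patch: nothing to stretch
        return np.zeros_like(crop), (x0, y0)
    return np.clip((crop - lo) / (hi - lo), 0.0, 1.0), (x0, y0)
```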

3.2 Updating the Tomato Position

Due to its increasing weight, the tomato tends to fall towards the ground (Fig. 2). Its position in \(imS^{i+1}_{c}\) is calculated using pattern matching. The bright areas, which may correspond to the tomato, are extracted by convolving the cropped image with a binary mask representing a white disk of radius \(\chi r^{i}\), where \(r^{i} = \frac{a^{i}+b^{i}}{2}\) and \(\chi \) is a constant determined empirically (\(\chi =1.25\) in our experiments). The local maxima \(C^{i+1}_{c}=\lbrace (x_{k},y_{k}), k=1,\dots ,k_{n} \rbrace \) of the result are then extracted. Among these \(k_{n}\) points, the point \(C_{m}=(x_{m},y_{m})\) closest to \((xc^{i},yc^{i})\) is selected as the new location of the tomato center (Fig. 2(b)).

A new cropped image \(imS^{i+1} \) is then extracted from \(im^{i+1}\), centered at \(C_{m}=(x_{m},y_{m})\). The size of this new image is adapted to the size of the tomato (derived from \(a^{i}\) and \(b^{i}\)) so that we restrict the region to be analyzed as much as possible, thus reducing the computation cost of the next steps. The contrast stretching transformation is applied to \(imS^{i+1}\).
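A NumPy-only sketch of the position update: the cropped image is convolved with a disk of radius \(\chi r^{i}\) (via zero-padded FFTs), 8-neighbor local maxima are extracted, and the maximum closest to the previous center is returned. The `min_frac` threshold, which discards spurious maxima in the flat background, is our assumption and not part of the described method; `update_center` is a hypothetical name.

```python
import numpy as np

def update_center(img, prev_center, r_prev, chi=1.25, min_frac=0.5):
    """New tomato centre (x, y): disk-filter maximum closest to prev_center."""
    img = np.asarray(img, dtype=float)
    radius = chi * r_prev
    R = int(np.ceil(radius))
    yy, xx = np.mgrid[-R:R + 1, -R:R + 1]
    disk = (xx ** 2 + yy ** 2 <= radius ** 2).astype(float)  # white disk mask
    H, W = img.shape
    shape = (H + 2 * R, W + 2 * R)                            # full linear convolution size
    full = np.fft.irfft2(np.fft.rfft2(img, shape) * np.fft.rfft2(disk, shape), shape)
    resp = full[R:R + H, R:R + W]                             # 'same'-size response
    # 8-neighbour local maxima above a fraction of the global maximum.
    p = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    is_max = resp > min_frac * resp.max()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                is_max &= resp >= p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    ys, xs = np.nonzero(is_max)
    d2 = (xs - prev_center[0]) ** 2 + (ys - prev_center[1]) ** 2
    k = int(np.argmin(d2))
    return int(xs[k]), int(ys[k])
```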

Fig. 2.

Updating the position of the tomato: previous position \((xc^{i},yc^{i})\) in red, candidate positions \(C^{i+1}_{c}\) in magenta and blue, and new position \(C_{m}\) in blue (Color figure online).

3.3 Elliptic Approximations

In order to obtain an initial contour for the active contour model, we first compute \(l_{n}\) points which may lie on the boundary of the tomato. From these \(l_{n}\) points, a RANSAC estimate is used to obtain several candidate ellipses. Finally, one of these ellipses is selected as the initial contour based on additional region information and size regularization.

Let us take \(C_{m}\) as the origin of the polar coordinate system. Then we select \(l_{n}\) points \(P_{l}= p_{l}e^{j\theta _{l}}\), where \(l=1,...,l_{n}\), \(0<\theta _{l}<2 \pi \), that satisfy the following three conditions:

$$\begin{aligned}&0.5 r^{i}< p_{l}<1.5r^{i} \end{aligned}$$
(5)
$$\begin{aligned}&\vert \arg ( \nabla imS^{i+1}(P_{l})) -\theta _{l}\vert \le \frac{\pi }{8} \end{aligned}$$
(6)
$$\begin{aligned}&\vert \nabla imS^{i+1}(P_{l}) \vert > \eta \end{aligned}$$
(7)

where \(\nabla imS^{i+1}(P_{l})\) is the gradient of \(imS^{i+1}\) at \(P_{l}\) and \(\eta \) is a threshold determined experimentally (\(\eta =0.2\)). These conditions select points with a strong gradient (Eq. 7), located in an annulus around the previous radius (Eq. 5), whose gradient direction lies within \(\frac{\pi }{8}\) of the vector normal to the circle of radius \(r^{i}\) centered at \(C_{m}\) (Eq. 6). As shown in Fig. 4(a), most points lying on the boundary of the tomato are correctly detected, along with some additional points lying on the leaves.
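The selection of Eqs. 5–7 can be vectorized over the whole cropped image. A minimal sketch, assuming a float image in [0, 1] and central-difference gradients from `np.gradient`. Eq. 6 is implemented literally (gradient direction compared with \(\theta _{l}\)); whether the gradient points outward or inward depends on the contrast polarity of the tomato, so the hypothetical `flip` flag compares with \(\theta _{l}+\pi \) instead.

```python
import numpy as np

def candidate_boundary_points(img, center, r_prev, eta=0.2, max_angle=np.pi / 8, flip=False):
    """Pixels P_l = p_l e^{j theta_l} (origin at C_m) satisfying Eqs. (5)-(7)."""
    arr = np.asarray(img, dtype=float)
    gy, gx = np.gradient(arr)                        # derivatives along rows (y), columns (x)
    yy, xx = np.mgrid[0:arr.shape[0], 0:arr.shape[1]]
    dx, dy = xx - center[0], yy - center[1]
    p = np.hypot(dx, dy)                             # radial distance p_l
    theta = np.arctan2(dy, dx)                       # polar angle theta_l
    ref = theta + (np.pi if flip else 0.0)
    # Angle between gradient and radial direction, wrapped to [-pi, pi].
    diff = np.angle(np.exp(1j * (np.arctan2(gy, gx) - ref)))
    mask = ((p > 0.5 * r_prev) & (p < 1.5 * r_prev)  # Eq. (5)
            & (np.abs(diff) <= max_angle)            # Eq. (6)
            & (np.hypot(gx, gy) > eta))              # Eq. (7)
    ys, xs = np.nonzero(mask)
    return xs, ys
```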

A least square estimate of an ellipse calculated from all \(l_{n}\) points might result in a contour far away from the actual boundary because of the detection of irrelevant points. Therefore, we use a RANSAC [5] estimate based on an elliptic model in order to compute several candidate ellipses. Note that spatial (tomato modeled as an ellipse) and temporal (parameters of the model) regularization are used in this step to increase the robustness of the segmentation procedure.

Under normal circumstances, the size and the orientation of the tomato in \(imS^{i+1}\) are supposed to be close to the ones in \(imS^{i}\). This information is incorporated in the RANSAC estimation and only the ellipses whose parameters satisfy the following conditions are considered:

$$\begin{aligned}&-0.1 < \frac{a^{i+1}-a^{i}}{a^{i}} <0.2, -0.1 < \frac{b^{i+1}-b^{i}}{b^{i}} <0.2 \end{aligned}$$
(8)
$$\begin{aligned}&-0.1 < \frac{SA^{i+1}-SA^{i}}{SA^{i}} <0.25 \end{aligned}$$
(9)
$$\begin{aligned}&\vert \frac{Ecc^{i+1}-Ecc^{i}}{Ecc^{i}} \vert <0.1 \end{aligned}$$
(10)
$$\begin{aligned}&\vert \frac{\varphi ^{i+1}-\varphi ^{i}}{\varphi ^{i}} \vert <0.2 \end{aligned}$$
(11)

where \(SA^{i+1}\) and \(SA^{i}\) denote the areas of the ellipses in \(imS^{i+1}\) and \(imS^{i}\), respectively, and \(Ecc^{i+1}\) and \(Ecc^{i}\) denote their eccentricities, defined here as \(Ecc= \frac{a}{b}\).

Negative variations for a and b (Eq. 8) are possible because of the movement of the tomato with respect to the camera or because of the variation in the orientation, as tomatoes are actually not perfect spherical objects. Equation 9 restricts the apparent size of the tomato while Eq. 10 restricts the admissible values for eccentricity, thus controlling the apparent shape of the tomato.
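The admissibility test of Eqs. 8–11 is a direct transcription. A minimal sketch, assuming ellipses stored as \([xc,yc,a,b,\varphi ]\), the area \(SA=\pi ab\), and \(Ecc=a/b\) as defined above (`admissible` is a hypothetical name). Eq. 11 is transcribed literally, so it is undefined when \(\varphi ^{i}=0\); comparing angle differences instead would be a modification not described here.

```python
import numpy as np

def admissible(new, prev):
    """Check Eqs. (8)-(11) for a candidate Ell^{i+1} against Ell^i."""
    _, _, a1, b1, phi1 = new
    _, _, a0, b0, phi0 = prev
    rel = lambda v1, v0: (v1 - v0) / v0                       # relative variation
    area1, area0 = np.pi * a1 * b1, np.pi * a0 * b0            # SA = pi * a * b
    return bool(-0.1 < rel(a1, a0) < 0.2 and -0.1 < rel(b1, b0) < 0.2   # Eq. (8)
                and -0.1 < rel(area1, area0) < 0.25                     # Eq. (9)
                and abs(rel(a1 / b1, a0 / b0)) < 0.1                    # Eq. (10), Ecc = a/b
                and abs(rel(phi1, phi0)) < 0.2)                         # Eq. (11)
```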

The threshold values in Eqs. 8–11 were determined by studying the parameters of the ellipses obtained from the manual segmentation of five tomatoes. For example, Fig. 3 shows the relative evolution of the semi-major axis length a of the ellipses. Most of the measurements lie within the limits defined above. Note that the asymmetry between the lower and upper bounds in Eqs. 8–9 is due to the fact that tomatoes are expected to grow during the agricultural season.

Fig. 3.

Evolution of a. The abscissa represents the image number (i), and the ordinate represents \(\frac{a^{i+1}-a^{i}}{a^{i}}\). The solid horizontal red lines show the selected threshold values (Color figure online).

From the N ellipses computed using the RANSAC algorithm, a total of \(N_{a}\) ellipses, with \(N_{a}<N\), are retained, corresponding to the \(N_{a}\) ellipses with the largest number of inliers (Fig. 4(b)).
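A generic RANSAC loop that keeps the \(N_{a}\) best hypotheses, rather than only the best one, can be sketched as below. In the setting of Sect. 3.3, `fit` would fit an ellipse to a minimal sample of five points and `is_admissible` would implement Eqs. 8–11; to keep the sketch short and self-contained, a three-point circle model (`fit_circle` and `circle_residual`, both hypothetical) stands in for the ellipse model. The iteration count, tolerance and seed are illustrative.

```python
import numpy as np

def ransac_top_models(points, fit, residual, sample_size, n_iter, tol,
                      n_keep, is_admissible=lambda m: True, seed=0):
    """Return the n_keep admissible hypotheses with most inliers, as (count, model)."""
    rng = np.random.default_rng(seed)
    scored = []
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), size=sample_size, replace=False)]
        model = fit(sample)
        if model is None or not is_admissible(model):
            continue                                      # reject inadmissible hypotheses
        n_inliers = int(np.sum(residual(model, points) < tol))
        scored.append((n_inliers, model))
    scored.sort(key=lambda item: item[0], reverse=True)   # most inliers first
    return scored[:n_keep]

def fit_circle(p):
    """Circle x^2 + y^2 + D x + E y + F = 0 through the sample (stand-in model)."""
    A = np.column_stack([p[:, 0], p[:, 1], np.ones(len(p))])
    D, E, F = np.linalg.lstsq(A, -(p[:, 0] ** 2 + p[:, 1] ** 2), rcond=None)[0]
    cx, cy = -D / 2.0, -E / 2.0
    r2 = cx * cx + cy * cy - F
    return None if r2 <= 0 else np.array([cx, cy, np.sqrt(r2)])

def circle_residual(m, p):
    """Absolute radial distance of each point to circle m = [cx, cy, r]."""
    return np.abs(np.hypot(p[:, 0] - m[0], p[:, 1] - m[1]) - m[2])
```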

3.4 Adding Region Information

A region growing algorithm is applied in order to add region information and determine the best initialization for the active contour among the \(N_{a}\) ellipses \({Ell}^{i+1}_{u}\), where \(u=1,...,N_{a}\). Moreover, potential occlusions are also derived from this information.

Let us denote by \(\omega _{u}\) the binary image representing the region inside the ellipse \({Ell}^{i+1}_{u}\). We apply a classical region growing algorithm starting from \(\omega _{seed}\) and limiting the growing to \(\omega _{limit}\), where:

$$\begin{aligned} \omega _{seed} = \bigcap _{u=1}^{N_{a}} \omega _{u}, \; \; \;\omega _{limit}= \bigcup _{u=1}^{N_{a}}\omega _{u}. \end{aligned}$$
(12)

The final region is denoted by \(\omega _{t}\) (Fig. 4(c)).
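The growing criterion of the "classical" region growing is not specified. A minimal sketch, assuming 4-connectivity and a hypothetical homogeneity rule (a pixel is added if its intensity is within `tol` of the mean intensity of \(\omega _{seed}\)); the growth never leaves \(\omega _{limit}\):

```python
import numpy as np
from collections import deque

def grow_region(img, seed, limit, tol=0.1):
    """Region growing from boolean mask `seed`, confined to boolean mask `limit`."""
    img = np.asarray(img, dtype=float)
    region = seed & limit
    mean = img[region].mean()                    # reference intensity of the seed
    queue = deque(zip(*np.nonzero(region)))
    H, W = img.shape
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-connectivity
            ny, nx = y + dy, x + dx
            if (0 <= ny < H and 0 <= nx < W and limit[ny, nx] and not region[ny, nx]
                    and abs(img[ny, nx] - mean) <= tol):
                region[ny, nx] = True
                queue.append((ny, nx))
    return region                                # omega_t
```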

We define \(\tau _{m}\) as:

$$\begin{aligned} \tau _{m} = \min _{u=1,\dots ,N_{a}} \tau (u), \end{aligned}$$
(13)

with

$$\begin{aligned} \tau (u)= \frac{ \; \vert \omega _{u} \cap (1-\omega _{t})\vert + \; \vert (1-\omega _{u}) \cap \omega _{t}\vert }{\vert \omega _{u} \cap \omega _{t}\vert } , \end{aligned}$$
(14)

where \(\vert A \vert \) represents the cardinality of a set A. The ratio \(\tau (u)\) measures the consistency between the segmentation obtained through the contour analysis \(\omega _{u}\) and the region analysis \(\omega _{t}\). It reaches a minimum (zero) when \(\omega _{u}\) and \(\omega _{t}\) match perfectly.

Let us denote by \(a_{u}^{i+1}\) and \(b^{i+1}_{u}\) the semi-axis lengths of the candidate ellipse \(Ell_{u}^{i+1},u\in [1,N_{a}]\). Among the candidates satisfying \(\tau (u) \le 1.1 \;\tau _{m}\), we select the ellipse \(Ell_{v}^{i+1}\) (Fig. 4(d)) that minimizes \((a^{i+1}_{u}-a^{i})^{2}+(b^{i+1}_{u}-b^{i})^{2}\). The initial contour is thus obtained by combining two different segmentation methods, one based on boundary information and the other on region information. The selected ellipse \(Ell_{v}^{i+1}\) is chosen among those for which both results are consistent, which improves robustness to occlusions. Moreover, an additional regularization condition imposes that the size and shape of the ellipse in \(imS^{i+1}\) be close to those in \(imS^{i}\).
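Eqs. 13–14 and the selection rule translate directly into mask arithmetic. A minimal sketch, assuming the candidate regions \(\omega _{u}\) and \(\omega _{t}\) are boolean arrays of the same shape (`select_initial_ellipse` is a hypothetical name):

```python
import numpy as np

def select_initial_ellipse(masks, omega_t, axes, prev_axes, slack=1.1):
    """Index v of the selected candidate, and tau(u) for every candidate.

    masks: boolean omega_u; axes: (a_u, b_u) per candidate; prev_axes: (a^i, b^i).
    """
    tau = []
    for w in masks:
        mismatch = np.sum(w & ~omega_t) + np.sum(~w & omega_t)   # numerator of Eq. (14)
        overlap = np.sum(w & omega_t)
        tau.append(mismatch / overlap if overlap else np.inf)
    tau = np.array(tau)
    tau_m = tau.min()                                            # Eq. (13)
    ok = np.flatnonzero(tau <= slack * tau_m)                    # consistent candidates
    a0, b0 = prev_axes
    cost = [(axes[u][0] - a0) ** 2 + (axes[u][1] - b0) ** 2 for u in ok]
    return int(ok[int(np.argmin(cost))]), tau
```

The consistency test acts as a hard filter, and the size regularization only ranks the candidates that pass it.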

The next step aims at finding the regions where occlusions could disturb the behavior of the active contour. For example, the region in which the tomato is attached to the plant has an intensity different from that of the tomato.

Let \({Ell}_{te}\) denote the ellipse which covers the convex hull of \(\omega _{t}\) and which minimizes the number of pixels inside the ellipse \({Ell}_{te}\) and not belonging to the region \(\omega _{t}\) (Fig. 4(e)). Let \(\omega _{te}\) be the region inside \({Ell}_{te}\). Then, the region of occlusion \(\omega _{oc}\) can be computed as \(\omega _{oc} =\omega _{te} \cap \omega _{t}^{c}\).

Using morphological operations (erosion followed by reconstruction by dilation), small regions are removed from \(\omega _{oc}\), so that the resulting \(\omega _{oc}\) corresponds to actual leaves causing the occlusions (Fig. 4(f)). Apart from detecting the “head” of the tomato, any other additional occlusion (mostly due to leaves) can also be detected using this approach (Fig. 4(f)).
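Erosion followed by reconstruction by dilation (an opening by reconstruction) removes small components while restoring the exact shape of the large ones. A NumPy-only sketch, assuming \(\omega _{te}\) has already been computed (the search for \(Ell_{te}\) is not shown), a 3×3 structuring element, and a hypothetical number of erosions `n_erode`:

```python
import numpy as np

def _neighbourhood(m):
    """The nine 3x3-neighbourhood shifts of a boolean mask (zero padded)."""
    p = np.pad(m, 1, mode="constant", constant_values=False)
    H, W = m.shape
    return [p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)]

def occlusion_region(omega_te, omega_t, n_erode=2):
    """omega_oc = omega_te AND NOT omega_t, cleaned by opening by reconstruction."""
    oc = omega_te & ~omega_t
    marker = oc.copy()
    for _ in range(n_erode):                     # erosion removes small components
        marker = np.logical_and.reduce(_neighbourhood(marker))
    while True:                                  # reconstruction by dilation inside oc
        grown = np.logical_or.reduce(_neighbourhood(marker)) & oc
        if np.array_equal(grown, marker):
            return marker
        marker = grown
```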

Fig. 4.

(a,b) Points of strong gradient and ellipses detected using the RANSAC estimate. (c,d) \(\omega _{t}\): region representing the tomato, \(Ell_{v}^{i+1}\): selected initial ellipse. (e,f) \(Ell_{te}\): ellipse covering the convex hull of \(\omega _{t}\), and region of potential occlusion \(\omega _{oc}\).

3.5 Applying Active Contours

The active contour (Sect. 2) is initialized with the ellipse \({Ell}^{i+1}_{vc}=[xc^{i+1}_{v},yc^{i+1}_{v},0.95a^{i+1}_{v},0.95b^{i+1}_{v},\varphi ^{i+1}_{v}]\), i.e., the selected ellipse shrunk by 5 %, because the curve z moves more smoothly and quickly when it is initialized inside the tomato. For the first \(n_{start}\) iterations, the parameter \(\psi \) is set to zero, so that z moves towards the most prominent contours. The shape constraint is then introduced for \(n_{ellipse}\) iterations (\(\psi \ne 0\)) to ensure robustness to occlusion. Finally, the shape constraint is relaxed (\(\psi =0\)) for a small number \(n_{end}\) of iterations, which allows the boundary to be reached more accurately, as a tomato is not a perfect ellipse.

Note that, at every step of this process, the image forces are not taken into account inside the region of occlusion \(\omega _{oc}\).

As explained in Sect. 2, the reference ellipse \(z_{e}\) is regularly updated, every \(n_{shape}\) iterations. A least square estimate calculated from all the points of the curve z is not relevant, because some of them may lie on false contours (e.g. leaves). So, the following algorithm aims at selecting a subset of points that actually lie on the boundary of the tomato.

We use a polar coordinate system with the origin at the center of the current reference ellipse \(z_{e}\). As in Sect. 2, let us denote by \(z(\theta )= r(\theta )e^{j\theta }\) a point of the evolving curve, \(z_{e}(\theta )= r_{e}(\theta )e^{j\theta }\) the corresponding point on the reference ellipse, \(n_{e}(\theta )\) the vector normal to the ellipse \(z_{e}\), and \(z_{q}(\theta )= r_{q}(\theta )e^{j\theta }\) the point that maximizes the gradient magnitude for \(0< r_{q}(\theta )<1.1 r_{e}(\theta )\). The point \(z(\theta )\) is selected as a point lying on the boundary of the tomato if it satisfies the following conditions:

$$\begin{aligned} \vert \nabla imS^{i+1}(z(\theta )) \cdot n_{e}(\theta ) \vert > \varGamma \end{aligned}$$
(15)
$$\begin{aligned} \frac{\vert \nabla imS^{i+1}(z(\theta )) \cdot n_{e}(\theta ) \vert }{\vert \nabla imS^{i+1}(z(\theta )) \vert }> 0.75 \end{aligned}$$
(16)
$$\begin{aligned} d(z_{q}(\theta ),z(\theta )) <d_{max} \end{aligned}$$
(17)

where \(\cdot \) represents the vector dot product.

The first condition ensures that the magnitude of the gradient vector projected onto the normal of the ellipse is strong. The threshold \(\varGamma \) is determined automatically [12]. The second condition ensures that the direction of the gradient is close to the vector normal to the ellipse. The last condition (\(d_{max}= 2\) in our experiments) imposes that the considered point is a meaningful local maximum of the gradient. Finally, the parameters of the reference ellipse are updated by calculating a least square approximation from the subset of points lying on the evolving contour z selected using the above conditions (Fig. 5(a)).
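A vectorized sketch of Eqs. 15–17, assuming the gradient components have already been sampled at the contour points \(z(\theta )\) (e.g., by interpolation) and \(r_{q}(\theta )\) has been found along each ray. Since \(z_{q}(\theta )\) and \(z(\theta )\) lie on the same ray, \(d(z_{q},z)=\vert r_{q}-r\vert \). The threshold \(\varGamma \) is passed in as `gamma`, because the automatic method of [12] is not reproduced here; function names are hypothetical.

```python
import numpy as np

def ellipse_normal(theta, a, b, phi):
    """Unit normal of the ellipse (centred at the origin) at polar angle theta."""
    t = np.asarray(theta, float) - phi
    r = a * b / np.sqrt((b * np.cos(t)) ** 2 + (a * np.sin(t)) ** 2)
    u, v = r * np.cos(t) / a ** 2, r * np.sin(t) / b ** 2   # gradient of u^2/a^2 + v^2/b^2 (/2)
    nx = u * np.cos(phi) - v * np.sin(phi)                   # rotate back to the image frame
    ny = u * np.sin(phi) + v * np.cos(phi)
    norm = np.hypot(nx, ny)
    return nx / norm, ny / norm

def select_boundary_points(gx, gy, r, r_q, theta, ellipse, gamma, d_max=2.0, min_cos=0.75):
    """Boolean mask of contour points satisfying Eqs. (15)-(17); ellipse = [xc, yc, a, b, phi]."""
    nx, ny = ellipse_normal(theta, *ellipse[2:])
    proj = np.abs(gx * nx + gy * ny)                 # |grad . n_e|
    mag = np.hypot(gx, gy)
    return ((proj > gamma)                           # Eq. (15)
            & (proj > min_cos * mag)                 # Eq. (16), written without division
            & (np.abs(r_q - r) < d_max))             # Eq. (17)
```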

3.6 Refining the Results

A least square estimate of an ellipse from z (Fig. 5(b)) is generally not relevant, as outliers may be present due to occlusion. So, again, a selection procedure is applied. A first subset of points \(\mathbf P _{h}\) (Fig. 5(c)) is obtained using criteria similar to the ones described in Sect. 3.5 (Eqs. 15–17). Then, another subset \(\mathbf P '_{h}\) is computed by relaxing the condition related to the gradient direction (Fig. 5(d)).

Fig. 5.

(a,b) Active contour with shape constraint. (c,d) Two different sets of points \(\mathbf P _{h}\) and \(\mathbf P _{h}'\). (e,f) Final ellipse estimates for two different images (Color figure online).

Then four ellipses are computed as follows:

  1. A least square approximation \({Ell}^{i+1}_{f1}= [xc^{i+1}_{f1},yc^{i+1}_{f1},a^{i+1}_{f1},b^{i+1}_{f1},\varphi ^{i+1}_{f1}]\) is computed from all the points of z.

  2. Another estimate \({Ell}^{i+1}_{f2}=[xc^{i+1}_{f2},yc^{i+1}_{f2},a^{i+1}_{f2},b^{i+1}_{f2},\varphi ^{i+1}_{f2}]\) is obtained from \(\mathbf P _{h}'\) using the RANSAC algorithm with the following conditions:

    $$\begin{aligned} 0.9a^{i+1}_{f1} < a^{i+1}_{f2} <1.1a^{i+1}_{f1} \end{aligned}$$
    (18)
    $$\begin{aligned} 0.9 b^{i+1}_{f1}< b^{i+1}_{f2}<1.1b^{i+1}_{f1} \end{aligned}$$
    (19)

  3. A least square approximation \({Ell}^{i+1}_{f3}\) is obtained from the subset \(\mathbf P _{h}\).

  4. A weighted least square estimate \({Ell}^{i+1}_{f4}\) is obtained, where the points of \(\mathbf P _{h}\) are assigned a higher weight (0.75) and the other points of z a lower weight (0.25). This gives more importance to the points that are surely on the boundary of the tomato.
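A weighted variant of an algebraic conic fit is one way to obtain an estimate like \({Ell}^{i+1}_{f4}\): each row of the design matrix is scaled by the square root of its weight before taking the smallest right singular vector. A minimal sketch under that assumption (the text does not specify the fit); it returns the conic coefficients \([A,B,C,D,E,F]\) of \(Ax^{2}+Bxy+Cy^{2}+Dx+Ey+F=0\), and conversion to \([xc,yc,a,b,\varphi ]\) follows the standard conic-to-parametric formulas (not shown). The function name is hypothetical.

```python
import numpy as np

def weighted_conic_fit(x, y, w):
    """Minimise sum_k w_k (c . d_k)^2 subject to ||c|| = 1; returns [A, B, C, D, E, F]."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    Dw = D * np.sqrt(w)[:, None]                  # scale each row by sqrt(weight)
    return np.linalg.svd(Dw, full_matrices=False)[2][-1]
```

For instance, with a boolean array `in_Ph` marking the points of \(\mathbf P _{h}\) among the points of z, the weights would be `np.where(in_Ph, 0.75, 0.25)`.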

If the images have a good contrast, and little or no occlusion, all the four ellipses will be almost identical (Fig. 5(e)). However, in case of occlusions and poor contrast, the four ellipses may be different (Fig. 5(f)), and the user selects the best one.

4 Results

Two cameras (Pentax Optio W80) were installed in an open field of tomatoes. The same setup was used for three agricultural seasons (April-August, 2011, 2012 and 2013). We have identified 21 tomatoes, covering different sites and different seasons, thus ensuring variability (614 images in total). The tomatoes were identified manually by observing the images of the entire agricultural season. Due to the severe occlusions, only a limited number of tomatoes were visible in most of the images of a given season. Therefore, only the tomatoes which were visible in more than 10 consecutive images were studied.

As discussed earlier, the main challenges of the segmentation are occlusion and the poor quality of the images due to poor illumination and/or shadows. Moreover, for the images acquired in the 2013 agricultural season (\(S=12,\dots ,21\)), the tomatoes were significantly smaller than those observed during the 2011 and 2012 seasons (\(S=1,\dots ,11\)), due to variations in the external climatic conditions. Also, in some images, a shadow created by the leaves (or by the tomato itself) can be observed (\(S=8,12\)). As a result, a portion of the contour is not clear and distinct, which makes the position of the contour ambiguous; given this ambiguity, even a manual segmentation of this portion of the contour is challenging. Moreover, a blurred contour was observed in some images of some sequences (\(S=3,13,18,19,20\)), due to the presence of additional neighboring tomatoes in the background.

The data set contains images of varying contrast and degree of occlusion. Obviously, it is impossible to obtain a reliable segmentation, even manually, in case of severe occlusion. Consequently, we studied experimentally the effect of the percentage of occlusion on the final estimation of the radius of a spherical object (considering the complete system, i.e., segmentation and partial 3D reconstruction). In our experiments, a percentage P of occlusion corresponds to the occlusion of an arc subtending an angle of \(\frac{2 \pi P}{100}\). For less than 30 % occlusion, the variation in the estimated radius was very small, whereas for more than 30 % occlusion, significant changes in the estimated radius were observed. We therefore identified three categories:

  • Category 1, containing images with an amount of occlusion P less than 30 % for which the estimation is very robust with respect to segmentation imprecision,

  • Category 2, with \(30\,\%< P <50\,\%\) which is more prone to segmentation error,

  • Category 3, with \(P>50\,\%\) for which it is impossible to perform a reliable segmentation.

The percentage of occlusion was determined manually by selecting the end points of the occluded elliptic arc. Note that the percentage of occlusion was computed only to evaluate the segmentation procedure, and this is not a part of the algorithm.

The obtained segmentations A were compared with the manual segmentations M (approximated by ellipses) by computing the average \(D_{mean}^{i}\) and maximal \(D_{max}^{i}\) distances between A and M for the \(i^{th}\) image (expressed in pixels). In order to better interpret the results, the maximum and mean distances between two contours are normalized by the size of the tomato as:

$$\begin{aligned} D_{meanR}^{i} = \frac{D_{mean}^{i}}{r^{i}}100, D_{maxR}^{i} = \frac{D_{max}^{i}}{r^{i}}100. \end{aligned}$$
(20)

For this project, \(D_{meanR}<10\,\%\) is considered as the acceptable limit of error in order to follow the growth of tomatoes.
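The text does not define precisely how the distances between A and M are computed. One plausible reading, sketched here as an assumption, samples both ellipses densely, takes for each point of A the distance to the closest sampled point of M, and normalizes by \(r=\frac{a+b}{2}\) of the manual ellipse M (Eq. 20). The helper names and the sampling density are hypothetical.

```python
import numpy as np

def ellipse_points(ell, n=720):
    """n points on the ellipse ell = [xc, yc, a, b, phi]."""
    xc, yc, a, b, phi = ell
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = xc + a * np.cos(t) * np.cos(phi) - b * np.sin(t) * np.sin(phi)
    y = yc + a * np.cos(t) * np.sin(phi) + b * np.sin(t) * np.cos(phi)
    return np.column_stack([x, y])

def contour_distances(ell_a, ell_m, n=720):
    """(D_mean, D_max, D_meanR, D_maxR) of segmentation A w.r.t. manual ellipse M."""
    pa, pm = ellipse_points(ell_a, n), ellipse_points(ell_m, n)
    # Distance from each point of A to the closest sampled point of M.
    d = np.sqrt(((pa[:, None, :] - pm[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)
    r = (ell_m[2] + ell_m[3]) / 2.0              # tomato size, assumed taken from M
    d_mean, d_max = d.mean(), d.max()
    return d_mean, d_max, 100.0 * d_mean / r, 100.0 * d_max / r
```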

For the images of category 1, good results (Table 1) were obtained even in the presence of occlusion by nearby leaves/branches and tomatoes (Figs. 6(a) and 6(b)). In some images captured at the beginning of the season, when the size is very small, the occlusion due to leaves present on the “head” of the tomato results in an ambiguity on the position of the actual contour (Fig. 6(c)). Also, in some images (Fig. 6(d)), due to a shadow effect on a portion of the contour, the intensity profiles of the tomato and the adjacent leaves are nearly identical, resulting in a very low contrast. Such cases may result in comparatively high distance measures even in the absence of any occlusion.

Fig. 6.

Final segmentation \(Ell_{f4}\) (red) obtained on images of category 1, and values of (\(D_{meanR}\), \(D_{maxR}\)) with respect to the manual segmentation (in cyan) (Color figure online).

Fig. 7.

Final segmentation \(Ell_{f4}\) (red) obtained on images of category 2 (Color figure online).

Due to the smaller size of the tomatoes in sequences \(S=12,\dots ,21\), higher distances were observed in these sequences (since the distances \(D_{meanR}^{i}\) and \(D_{maxR}^{i}\) are normalized by the size \(r^{i}\) of the tomato). For example, Fig. 6(e) shows the segmentation \(Ell_{f4}\) obtained on the \(3^{rd}\) image of sequence \(S = 17\). The normalized distances are \(D_{meanR}\) = 9.89 % and \(D_{maxR}\) = 26.60 %, whereas the distances expressed in pixels are much lower (\(D_{mean}\) = 2 and \(D_{max}\) = 5.38 pixels). For most of the sequences, a low mean \(\mu _{D_{meanR}}\) together with a low standard deviation \(\sigma _{D_{meanR}}\) demonstrates the robustness of the proposed method. However, for some sequences (\(S=13,17,18\)), higher values of \(\mu _{D_{meanR}}\) and \(\sigma _{D_{meanR}}\) were observed, mainly due to a false detection of the position of the tomato (Sect. 3.2) or to the small size of the tomato, as discussed previously. For the sequence \(S=11\), all the images suffer from poor contrast and noise due to the shadow created by leaves; as a result, even a manual segmentation is a challenging task in this sequence.

For the images of category 2, which contain a significant amount of occlusion, \(D_{meanR}\) is significantly higher than for the images of category 1. This is due to heavy occlusions combined with the poor quality of the images (effects of shadows and/or presence of other tomatoes). However, in some sequences (\(S=1,2,5,6,8,9,15,18,20,21\)), an average \(D_{meanR}\) below 10 % was observed, because of the good contrast on the non-occluded arc in these images, which results in a good segmentation. Overall, good results, with \(D_{meanR}<10\,\%\), were obtained on 73 % of the category 2 images (Fig. 7), even in the presence of severe occlusions.

Table 1. Mean (\(\mu \)) and standard deviation (\(\sigma \)) of \(D_{meanR}\) and \(D_{maxR}\) by comparing ellipse \(Ell_{f4}\) and \(Ell_{opt}\) with the manual segmentation M. Only the images belonging to category 1 (i.e. with a low amount of occlusion) have been considered.

In the results presented so far, \(Ell_{f4}\) was compared with the manual segmentation. However, \(Ell_{f4}\) is not necessarily the best ellipse and was selected here for illustrative purposes only. Due to the variations in contrast and occlusion, no single ellipse among the four estimates represents a good segmentation for all the images. Let us denote by \(Ell_{opt}\) the ellipse, among the four estimates (\(Ell_{f1}\), \(Ell_{f2}\), \(Ell_{f3}\) and \(Ell_{f4}\)), for which \(\mu _{D_{meanR}}\) is minimum. Table 1 shows the distribution of \(D_{meanR}\) and \(D_{maxR}\) for the ellipse \(Ell_{opt}\): both values are lower for \(Ell_{opt}\) than for \(Ell_{f4}\). The operator selects \(Ell_{opt}\) as the final segmentation.

5 Estimating the Size of the Tomato

From the obtained segmentation in both images and the camera parameters, we then estimate the size of the tomatoes. However, determining the image point pairs which correspond to the same point in the 3D space is a challenging task given the complexity of the scene. Instead we simplify the size estimation procedure by exploiting the spherical hypothesis.

The contour of the tomato is approximated by an ellipse whose parameters are calculated using the procedure presented above. Then, the sphere center in the 3D space is computed by triangulation from the centers of the ellipses calculated in both images. Next, the 3D points situated on the contour of the tomato are computed independently from each image, using properties of projective geometry. Finally, a joint optimization procedure enables us to estimate the sphere radius.
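The triangulation step can be illustrated with the standard linear (DLT) method, which is our choice here since the text does not name the triangulation technique. A minimal sketch, assuming calibrated 3×4 projection matrices `P1` and `P2` for the two cameras and the ellipse centers `x1`, `x2` in pixel coordinates:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views."""
    # Each view gives two linear equations in the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]                  # null vector of A (least squares)
    return X[:3] / X[3]                          # dehomogenise
```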

In order to evaluate the size estimation procedure, tomatoes were also observed and measured in the laboratory. Since a tomato is not a perfect sphere, two reference values were measured manually and compared with the estimated radius of the sphere. For the manually segmented tomatoes observed in the laboratory, we found that the relative percentage error between the larger of the two reference values and the estimated radius was less than 5 % in 91 % of the cases. For the tomatoes cultivated in the open field, the relative percentage error was less than 10 % in 80 % of the cases [16]. The errors are mainly caused by imperfect segmentation, due to shadowing effects and the poor quality of the images.

6 Conclusions

We presented a segmentation procedure used to monitor the growth of tomatoes from images acquired in an open field. Starting from an approximate computation of the position of the center of the tomato, segmentation algorithms based on contour and region information are proposed and combined, in order to determine a first estimate of the contour. Then, a parametric active contour with shape constraint is applied and four ellipse estimates representing the tomatoes are obtained. In all the steps of this process, a priori knowledge about the shape and the size of the tomatoes is modeled and incorporated as regularization terms, leading to a better robustness. It is supposed that the operator selects, at the end of the process for each image, the ellipse corresponding to the best elliptic estimation of the actual contour.

The segmentation of tomatoes is a challenging task due to the presence of occlusion and variations in contrast. In order to evaluate the robustness of the proposed algorithm, the entire image set was divided into three categories based on the amount of occlusion. For the images with an acceptable level of occlusion, good results were obtained on most (87 %) of the images, where \(D_{meanR}\) was less than 10 %. Also, the low standard deviation of \(D_{meanR}\) indicates the robustness of the proposed algorithm. Good results with \(D_{meanR}<10\,\%\) were obtained on 73 % of the images that contain a significant amount of occlusion.

For the moment, it is assumed that an operator manually selects one ellipse as the final segmentation. In future work, we intend to provide the best representation of the tomato automatically. Also, in some images, the position of the tomato is not detected correctly due to the presence of other tomatoes nearby; this could be improved by updating the position of the tomato globally, taking into account the movement of adjacent tomatoes as well. One possible improvement of the active contour model is to restrict the size of the reference ellipse, as there is little growth between two consecutive images.