
1 Introduction

Life on earth depends on plants and the life cycle of human beings is dependent on them as they have the ability to transform light into food [14]. Plants were initially organised as useful and not useful. This can be considered as the first method of plant classification. The science of botany was then created and its main objectives were to create rules for the classification of plants.

In the \(18^{th}\) century, Linnaeus [11] developed the systematic classification of plants based on plant morphology. Given the number of rules involved, systematic classification is a difficult task even for a trained botanist, because two different plant species can have almost the same physical appearance.

The difficulties encountered in manual classification have prompted the use of computer vision to automate the process. The classification of plants is important for botanists and environmentalists who are interested in obtaining an organised series of descriptors or features that describe plant structures. These can be used to determine the properties of plants and to prevent their extinction and the spread of diseases caused by plant pollen [6]. Leaf features include texture, vein, shape and water retention capacity. These are important components to take into consideration during leaf studies.

This research proposes a leaf feature extraction model that combines the Convexity Measure with geometrical and morphological features to improve the classification rate of plant leaves.

The Convexity Measure of Polygons is a shape characteriser used to describe the overall structure of a given shape. The New Convexity Measure of Polygons was created to solve the problems observed when the Convex Hull polygon is used to evaluate the convexity measure of a given shape. These problems were the detection of small variations on the shape and the calculation of the convexity measure of shapes with holes.

The rest of the paper is organised as follows: Sect. 2 discusses some related works; Sect. 3 presents the methods and techniques used; experimental results and discussion are presented in Sect. 4; Sect. 5 compares the proposed method with methods available in the literature; and the conclusion and future work are presented in Sect. 6.

2 Related Work

Many approaches have been used to perform plant classification. The following sections explore and discuss the different approaches used in the literature.

2.1 Morphological Based Approach

Panagiotis et al. [12] proposed one of the morphological approaches used for plant classification based on leaf analysis. The goal of this approach was the design of a system that is able to extract specific morphological and geometrical features from a plant leaf. They further use fuzzy surface selection to select the relevant features. This approach reduces the dimensionality of the feature space, leading to a very simplified model that is better suited to real time classification applications. The model obtained is scale and orientation invariant and yields a system that achieved a classification rate of 99 %, even with deformed leaves. The main drawbacks of this approach are the size of the data set (fewer than 5 species) and the fact that the system is not translation invariant.

Stephen et al. [13] presented a method based on the combination of the Probabilistic Neural Network (PNN) and image processing techniques to construct a general purpose semi-automatic leaf recognition system for accurate plant classification. This method achieved 90 % accuracy and has a short running time. However, it is not rotation invariant and requires human intervention during the process of feature extraction.

2.2 Texture Based Approaches

Esma et al. [8] proposed a system that uses the Dendritic Cell Algorithm, derived from the Danger Theory [4], as a classifier. The wavelet transform is used to extract the leaf features for the classification algorithm. This approach achieved a classification accuracy of 94 %.

Ahmed et al. [3] developed an approach that combines texture features based on the discrete wavelet transform with an entropy measurement to construct an efficient leaf identifier. An accuracy of 92 % was achieved. The main advantage of this approach is noise removal from the image background. The drawback of the method is the size of the data set, as it is based on fewer than 10 species.

2.3 Hybrid Approaches

Anant et al. [5] presented an approach that combines shape features with texture features to generate a feature set used by the nearest neighbour classifier for the classification process. The method achieved a classification rate of 91.5 % for 14 plant species.

3 Materials and Methods

In many applications it is important to enhance the quality of the captured image before any real processing can start. In leaf classification in particular, the process we used runs in four steps: input, preprocessing, feature extraction and classification.

3.1 Image Preprocessing and Leaves Data Set

Figure 1 presents samples of the categories of leaves randomly selected for the experimentation. Figure 2 shows the transformations performed on a leaf image before extracting features.

Fig. 1. Leaves selected for the classification

Fig. 2. Leaf image processing (Minimum Bounding Rectangle (MBR))

Image Preprocessing

  • All images are transformed from a colour image into a grey level image using Eq. (1). In fact, converting the image into grey level will preserve the shape of the leaf; thereby not impacting negatively on the end result.

    $$\begin{aligned} l=0.2989*R+0.5870*G+0.1140*B \end{aligned}$$
    (1)
  • Leaf boundary is extracted by applying the Sobel filter on the leaf binary image.

  • Thinning operation is then used to have leaf contours that are one pixel thick.
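The first two preprocessing steps can be sketched as follows. This is a minimal illustration using NumPy only, not the implementation used in the experiments: the function names `to_grey` and `sobel_magnitude`, the threshold of 127 used to obtain the binary image, and the toy 8 x 8 image are all assumptions made for the example. The thinning step is not shown.

```python
import numpy as np

def to_grey(rgb):
    """Weighted sum of the R, G and B channels, as in Eq. (1)."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

def sobel_magnitude(img):
    """Gradient magnitude obtained with the two 3x3 Sobel kernels."""
    img = np.asarray(img, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode="edge")  # replicate border pixels
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

# Toy image (assumption): a white 4x4 square on a black background.
rgb = np.zeros((8, 8, 3))
rgb[2:6, 2:6] = 255
grey = to_grey(rgb)
binary = (grey > 127).astype(float)   # threshold value is an assumption
edges = sobel_magnitude(binary) > 0   # boundary pixels of the square
```

On this toy image the detected boundary is more than one pixel thick, which is why a thinning step is applied afterwards.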

3.2 Features Extraction

The Convexity Measure of Polygons: A set of points A is said to be convex if the straight line segment joining any two points in A is contained in A [10]. The Convexity Measure of Polygons is a numerical value used to represent the probability that a straight line joining two points in A lies entirely in A.

The Convexity Measure of Polygons has the following properties as defined in [10]:

  • The value of the convexity measure is in (0,1].

  • For a given shape, the convexity measure can be arbitrarily close to 0.

  • The convexity measure of a convex set is equal to 1.

  • The convexity measure is invariant under similarity transformation.

In the literature there are two types of convexity measure: surface based convexity measures and boundary based convexity measures [10]. The first approaches for the determination of the convexity measure of a polygon were based on the Convex Hull polygon (CH). \(C_{1}\), \(C_{2}\) and \(C_{3}\) were defined as:

$$\begin{aligned} C_{1}=\frac{Area(S)}{Area(CH(S))}. \end{aligned}$$
(2)

\(C_{1}\) is a surface based convexity measure, obtained by dividing the area of the shape by the area of the associated convex hull polygon.

$$\begin{aligned} C_{2}=\frac{Area(MCS(S))}{Area(S)}. \end{aligned}$$
(3)

\(C_{2}\) is a surface based convexity measure, obtained by dividing the area of the maximal convex subset (MCS) of shape S by the area of shape S.

$$\begin{aligned} C_{3}=\frac{Per(CH(S))}{Per(S)}. \end{aligned}$$
(4)

\(C_{3}\) is a boundary based convexity measure, obtained by dividing the perimeter of the convex hull of shape S by the perimeter of shape S.
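As an illustration, \(C_{1}\) and \(C_{3}\) can be computed for a simple polygon as sketched below. The convex hull is obtained here with Andrew's monotone chain algorithm, which is a choice made for this example only; the helper names and the L-shaped test polygon are assumptions. \(C_{2}\) is omitted because finding the convex subset is considerably harder.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def area(poly):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    p = np.asarray(poly, dtype=float)
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def perimeter(poly):
    """Euclidean perimeter of a closed polygon."""
    p = np.asarray(poly, dtype=float)
    return float(np.linalg.norm(np.roll(p, -1, axis=0) - p, axis=1).sum())

# L-shaped polygon (assumption, for illustration only).
shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = convex_hull(shape)
c1 = area(shape) / area(hull)            # Eq. (2)
c3 = perimeter(hull) / perimeter(shape)  # Eq. (4)
```

Both values are strictly below 1 for the L-shape and equal 1 for any convex polygon, in line with the properties listed above.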

The Convexity Measure of Polygons New Definition: The new definition of the Convexity Measure of Polygons introduced by J. Zunic et al. [10] was designed because other convexity measures are unable to account for huge defects. In addition, it can detect small variations in a shape. It is the first element in the leaf feature vector and the only convexity measure used in this paper. The Convexity Measure of Polygons defined by J. Zunic et al. [10] is evaluated as:

$$\begin{aligned} C\left( P\right) =\underset{\alpha \in \left[ 0,2\pi \right] }{min}\frac{{Per}_2 \left( R\left( P,\alpha \right) \right) }{{Per}_1 \left( P,\alpha \right) }, \end{aligned}$$
(5)

where:

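\(\alpha \) = rotation angle

\(P\) = the polygon representing the shape

\(R(P,\alpha )\) = the minimal rectangle with edges parallel to the coordinate axes that contains P rotated by \(\alpha \)

\(Per_{2}\) = Euclidean perimeter

\(Per_{1}\) = perimeter in the sense of the \(l_{1}\) metric (sum of the projections of the edges on the coordinate axes)

In this equation the Euclidean perimeter of the polygon P does not change under rotation, whereas the perimeter of the bounding rectangle, denoted R(\(P,\alpha \)), and the \(l_{1}\) perimeter of P both depend on the value of \(\alpha \). C(P) is equivalent to the following equation.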

$$ C(P)=\min \left\{ \frac{Per_2\left( R\left( P,\alpha _i\right) \right) }{Per_1\left( P,\alpha _i\right) } ~\Big |~ i=1,2,\ldots ,n \right\} $$

where

\(Per_2(R(P,\alpha _i))=g_i*\cos (\alpha _i)+f_i*\sin (\alpha _i),\)

\(Per_1(P,\alpha _i)=c_j*\cos (\alpha _i)+d_j*\sin (\alpha _i).\)

\(g_i, f_i, c_j, d_j\) are constants associated with the Euclidean lengths of the rectangle edges and the polygon edges.

This equation represents the computation process of C(P).

To be used for shape characterisation, the Convexity Measure of Polygons (J. Zunic et al. [10]) has to be combined with other convexity measures to increase the recognition rate [10]. The Convexity Measure of Polygons only expresses how convex or concave a given shape is, so it is also important to have another descriptor for the surface.
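A rough numerical sketch of C(P) is given below. It rests on one reading of the definition in [10], stated here as an assumption: the numerator is the perimeter of the minimal axis-aligned rectangle containing the rotated polygon, and the denominator is the \(l_{1}\) perimeter of the rotated polygon (the sum of the absolute projections of its edges on both axes). The minimum is approximated by sampling `n_angles` rotation angles rather than by the finite set of angles \(\alpha _i\) used above, so the function name, the sampling scheme and the test polygons are illustrative only.

```python
import numpy as np

def convexity_measure(poly, n_angles=360):
    """Approximate C(P) by sampling rotation angles in [0, 2*pi)."""
    pts = np.asarray(poly, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts
    best = np.inf
    for alpha in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        c, s = np.cos(alpha), np.sin(alpha)
        rot = np.array([[c, -s], [s, c]])
        p = pts @ rot.T      # rotated vertices
        e = edges @ rot.T    # rotated edge vectors
        # l1 perimeter of the rotated polygon (assumed reading of Per_1).
        per1 = np.abs(e).sum()
        # Perimeter of the axis-aligned bounding rectangle (Per_2).
        width = p[:, 0].max() - p[:, 0].min()
        height = p[:, 1].max() - p[:, 1].min()
        per2 = 2.0 * (width + height)
        best = min(best, per2 / per1)
    return best

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
c_square = convexity_measure(square)
c_l_shape = convexity_measure(l_shape)
```

Under this reading the square, being convex, gives a value of 1 up to rounding, while the L-shape gives a value below 1 because its concave corner lengthens the \(l_{1}\) perimeter without enlarging the bounding rectangle.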

The Seven Invariant Moments. Hu’s seven invariant moments are computed from the central moments. They are very useful for shape description and classification [9]. The discrete form of the geometrical moment of order \(p+q\) is defined as:

$$\begin{aligned} M_{pq}=\sum \limits _{x=1}^{N}\sum \limits _{y=1}^{M}x^py^qf(x,y), \end{aligned}$$
(6)

where:

p,q = 0,1,2,....

\(f(x,y)\) = the image intensity at pixel (x, y).

\(N \times M\) = the image size.

Consequently, a set of seven invariant moments \((Ph_{1},Ph_{2},...,Ph_{7})\) can be derived from the normalized central moments as in [13].
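The chain from geometrical moments to invariant moments can be sketched as follows for the first two invariants only. The sketch assumes the standard definitions of the central moments, of the normalized central moments and of Hu's first two invariants; the function name `hu_first_two`, the zero-based pixel coordinates and the binary test images are assumptions made for the example.

```python
import numpy as np

def hu_first_two(img):
    """Return (Ph_1, Ph_2) for a grey level or binary image."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]

    def raw(p, q):
        # Geometrical moment of order p + q (pixel-weighted sum).
        return (x ** p * y ** q * img).sum()

    m00 = raw(0, 0)
    xc, yc = raw(1, 0) / m00, raw(0, 1) / m00  # centroid

    def eta(p, q):
        # Normalized central moment.
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    ph1 = eta(2, 0) + eta(0, 2)
    ph2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return ph1, ph2

# The same 4x8 blob placed at two different positions (assumption).
a = np.zeros((20, 20))
a[3:7, 2:10] = 1
b = np.zeros((20, 20))
b[10:14, 8:16] = 1
ph_a = hu_first_two(a)
ph_b = hu_first_two(b)
```

Because the moments are taken about the centroid, the two placements of the blob give the same values, which illustrates the translation invariance of the descriptors.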

Geometrical Features. The rectangularity (R) represents the ratio between the leaf area (\(A_{leaf}\)) and the area of the minimum bounding rectangle. It evaluates how close the leaf shape is to a rectangle.

$$\begin{aligned} R=\frac{A_{leaf}}{D_{max}\times D_{min}} \end{aligned}$$
(7)

The aspect ratio (A) is the ratio between the maximum length (\(D_{max}\)) and the minimum length (\(D_{min}\)) of the minimum bounding rectangle.

$$\begin{aligned} A=\frac{D_{max}}{D_{min}} \end{aligned}$$
(8)
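A possible way to obtain \(D_{max}\) and \(D_{min}\), and from them the rectangularity and the aspect ratio, is sketched below. It relies on the known property that a minimum-area bounding rectangle of a convex polygon has one side collinear with a polygon edge, and therefore assumes that the input is the convex hull of the leaf contour. The function names and the rotated test rectangle are assumptions made for the example.

```python
import numpy as np

def mbr_sides(hull):
    """Side lengths (D_max, D_min) of the minimum-area bounding rectangle.

    `hull` is assumed to hold the convex hull vertices in order.
    """
    pts = np.asarray(hull, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts
    best = None
    for ex, ey in edges:
        theta = np.arctan2(ey, ex)
        c, s = np.cos(theta), np.sin(theta)
        # Rotate so that the current edge is parallel to the x axis.
        xr = pts[:, 0] * c + pts[:, 1] * s
        yr = -pts[:, 0] * s + pts[:, 1] * c
        w = xr.max() - xr.min()
        h = yr.max() - yr.min()
        if best is None or w * h < best[0] * best[1]:
            best = (w, h)
    return max(best), min(best)

def rectangularity(a_leaf, d_max, d_min):
    return a_leaf / (d_max * d_min)   # Eq. (7)

def aspect_ratio(d_max, d_min):
    return d_max / d_min              # Eq. (8)

# A 4x2 rectangle rotated by 30 degrees (assumption, for illustration).
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
rect = np.array([(0, 0), (4, 0), (4, 2), (0, 2)], dtype=float) @ rot.T
d_max, d_min = mbr_sides(rect)
r_value = rectangularity(8.0, d_max, d_min)
a_value = aspect_ratio(d_max, d_min)
```

For this test shape the rectangle is recovered regardless of its orientation, so the rectangularity is 1 and the aspect ratio is 2.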

The sphericity (S) is expressed by the following equation.

$$\begin{aligned} S=\frac{r_{i}}{r_{c}} \end{aligned}$$
(9)

where:

\(r_{i}\) = the radius of the incircle of the leaf.

\(r_{c}\) = the radius of the excircle of the leaf.

The eccentricity (E) of the leaf is the ratio between the length of the main inertia axis (\(E_{A}\)) and the length of the minor inertia axis (\(E_{B}\)) of the leaf. It evaluates how much a conic section deviates from being circular [2].

$$\begin{aligned} E=\frac{E_{A}}{E_{B}} \end{aligned}$$
(10)

The circularity (C) is defined by all the contour points of the leaf image.

$$\begin{aligned} C=\frac{\mu _{R}}{\sigma _{R}} \end{aligned}$$
(11)

where

$$ \mu _{R}=\frac{1}{N}\displaystyle \sum _{i=0}^{N-1} ||(x_{i},y_{i})-(\bar{x},\bar{y})|| $$

and

$$ \sigma _{R}=\frac{1}{N}\displaystyle \sum _{i=0}^{N-1} (||(x_{i},y_{i})-(\bar{x},\bar{y})||-\mu _{R})^{2} $$
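Equation (11) can be evaluated directly from the contour points, as sketched below. The sketch assumes that \((\bar{x},\bar{y})\) is the mean of the contour points, which is not stated explicitly above, and it follows the expression of \(\sigma _{R}\) exactly as written (a mean squared deviation, without a square root). The function name and the two test contours are assumptions.

```python
import numpy as np

def circularity(contour):
    """C = mu_R / sigma_R computed from contour points, Eq. (11)."""
    pts = np.asarray(contour, dtype=float)
    centre = pts.mean(axis=0)            # assumed definition of the centroid
    r = np.linalg.norm(pts - centre, axis=1)
    mu_r = r.mean()
    sigma_r = ((r - mu_r) ** 2).mean()   # as written above, no square root
    return mu_r / sigma_r

# Two synthetic contours (assumption): an elongated and a nearly round ellipse.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
elongated = np.column_stack([2.0 * np.cos(t), 1.0 * np.sin(t)])
nearly_round = np.column_stack([1.1 * np.cos(t), 1.0 * np.sin(t)])
c_elongated = circularity(elongated)
c_round = circularity(nearly_round)
```

The nearly round contour has the larger value. For a perfect circle \(\sigma _{R}\) is zero and the ratio is undefined, so an implementation would need to guard against that case.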

Form Factor (F) compares the perimeter of the equivalent circle to the perimeter of the leaf shape. It is also used to describe surface irregularity. It is given by the following equation:

$$\begin{aligned} F=\frac{4\pi A_{leaf}}{P_{leaf}^2} \end{aligned}$$
(12)
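A short numerical check of Eq. (12) on polygons is sketched below; the helper names and the test shapes are assumptions made for the example. The leaf area and perimeter are taken here from a polygonal outline using the shoelace formula and the Euclidean edge lengths.

```python
import numpy as np

def polygon_area(poly):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    p = np.asarray(poly, dtype=float)
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def polygon_perimeter(poly):
    """Euclidean perimeter of a closed polygon."""
    p = np.asarray(poly, dtype=float)
    return float(np.linalg.norm(np.roll(p, -1, axis=0) - p, axis=1).sum())

def form_factor(poly):
    """F = 4 * pi * A / P^2, Eq. (12)."""
    return 4.0 * np.pi * polygon_area(poly) / polygon_perimeter(poly) ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
near_circle = np.column_stack([np.cos(t), np.sin(t)])  # regular 360-gon
f_square = form_factor(square)
f_circle = form_factor(near_circle)
```

The value approaches 1 for a circle and decreases as the outline becomes more irregular; the unit square gives \(\pi /4\).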

Area ratio of convex hull (CA) is defined as the ratio between the leaf area and the area of its associated Convex Hull polygon (equivalent to the surface based convexity measure). It is expressed by the following equation:

$$\begin{aligned} CA=\frac{A_{C}}{A_{ROI}} \end{aligned}$$
(13)

4 Experimental Results and Discussion

4.1 Experimental Results

The experiments were conducted using FLAVIA. The leaf database is composed of more than 2500 plant leaves of more than 30 species [13]. We randomly chose 400 leaves of 20 species, 1600 leaves of 32 species and 100 leaves of 5 species (the 1600 leaves of 32 species represent all the available species in FLAVIA). The experiments are organized in two phases. First, the leaves are characterized using the geometrical features and the seven invariant moments. Secondly, the seven invariant moments and the geometrical features are combined with the New Convexity Measure of Polygons to characterize a leaf image.

Table 1 presents the accuracy of the proposed method with and without the Convexity Measure of Polygons (J. Zunic et al.) in the feature vector. In the first row, with the Convexity Measure of Polygons, the Multi Layer Perceptron (MLP) correctly classified 92 % of the leaves on average, with an area under the Receiver Operating Characteristic (ROC) curve of 0.993. Without the Convexity Measure of Polygons, it correctly classified 86 % of the leaves, with an area under the ROC curve of 0.98.

4.2 Discussion

We decided to organise our experiment in three processes to show how efficient our model is when used in various conditions. Like Panagiotis et al. [12], we used 100 images of leaves to illustrate how the proposed method performs when applied to a small number of leaves. We obtained a classification rate of 99 % with the Convexity Measure of Polygons (J. Zunic et al.) in the feature set. This shows that the Convexity Measure of Polygons contributed significantly to the discrimination of leaf shapes even with a small dataset. We then applied the proposed method to a medium dataset of 400 leaves. In this case, a classification rate of 92 % shows again how efficient the proposed method is on a medium size dataset when the Convexity Measure of Polygons (J. Zunic et al.) is used. Finally, to complete our experiment, we applied the proposed method to a large dataset of 1600 leaves, and the classification rate obtained was 96 %. This classification rate shows that the proposed method remains consistent even with a larger dataset. The classification rate and the Area Under the ROC Curve (AUC) show that the Convexity Measure of Polygons (J. Zunic et al.) contributes to the improvement of the classification rate and to the efficiency of the proposed model.

5 Comparative Study of the Proposed Method with the Available Method in the Literature

In Table 2 some methods of plant classification using leaves are presented. The first row of Table 2 gives the column headings: the authors' names; the title of the article; the nature of the features; the number of leaves used for the experimentation; the number of species; the classification rate; the classification algorithm; and the drawbacks and advantages of the method. In the second row of Table 2, Jixian et al. [9] proposed leaf shape based plant species recognition. This approach is shape based; 400 leaves of 20 plant species were used and a classification rate of 91 % was achieved with the MMC. This was after reduction of the feature space in order to obtain a fast classification method. In the last row the method used in this paper is presented. This method is based on the introduction of the Convexity Measure of Polygons to boost the leaf recognition rate. 400 leaves of 20 different species are used for the experiment, and the MLP achieved a classification rate of 92 %. The drawback of this method is the time taken to calculate the convexity measure (using the definition of J. Zunic et al. [10]) of large leaf image shapes. Despite this, the method is very accurate and fast during classification and achieved better results even with deformed leaves.

Table 1. Comparative study of classifiers
Table 2. Table of some of the methods of plants classification using leaves

6 Conclusion

This paper presented the classification of leaf images using shape analysis. It investigated the use of the Convexity Measure of Polygons in the process of leaf shape characterisation. The result obtained is a method that is rotation, translation and scale invariant. Experiments show that the use of the Convexity Measure of Polygons, combined with classical shape features used for leaf shape characterisation, increases the success rate of leaf shape classification. An average classification rate greater than 95 % was achieved using the multilayer perceptron for all the species in the FLAVIA data set. Good results were also obtained with other classifiers such as KNN (90.5 %) and Naive Bayes (88 %). The proposed method outperforms some methods found in the literature. Proposed future works include the design of a new approach for the characterisation of the convexity of object shapes inspired by the Convexity Measure of Polygons, as well as the analysis of the shape of colour images using the newly designed shape characteriser.