1 Introduction

One of the greatest technological advances that humans have ever made has been the domestication of plants during the agricultural revolution, 8,000 to 12,000 years ago, at multiple sites around the world [5]. These and the following breakthroughs allowed the human population to grow to the current remarkable 7.7 billion people.

This growth has placed colossal pressure on agricultural technology, which nowadays faces immense challenges from infectious diseases and pests spread by globalization and compounded by climate change [1].

Hence, major benefits can arise from faster and more accurate detection of plant diseases, which is increasingly enabled by sensors that collect real-time information from crop plantation sites.

Such approaches, usually termed smart farming since they constitute an evolution of traditional techniques, are also important to tackle the challenges of agricultural production in terms of productivity, environmental impact, food security and sustainability [7].

To tackle these challenges, more information on agricultural ecosystems is needed. New technologies provide it by monitoring, measuring and continuously analyzing various physical aspects and phenomena. Such scenarios have become possible through the development of the Internet of Things (IoT), which allows remote and often autonomous sensors to provide information, either sensor readings such as humidity and temperature, or images such as snapshots taken by fixed cameras or drones.

This setup permits the gathering of huge amounts of data over large geographical areas, thus giving rise to problems that can be framed as big data scenarios, which have not yet been widely addressed in agriculture [6].

A recent technique that can deal with such challenges is Deep Learning (DL) [10], which uses “deeper” neural networks to better represent data through successive convolutions, allowing greater learning capacity and higher accuracy [7].

In this paper we propose a two-level deep learning hierarchical approach for plant disease detection. The first level determines the crop in the image, and the second level learns the specific diseases of that crop. Results show the overall advantages of the approach and support its application in real scenarios in which the crop is known.

The rest of the paper is organized as follows. In the next section we present current state-of-the-art approaches to plant disease detection with deep learning, including a brief introduction to existing deep learning strategies. In Sect. 3 we introduce the hierarchical approach for plant disease detection, starting from a standard baseline approach. Section 4 details the experimental setup, namely the dataset used, the deep learning model setup and the performance metrics. Section 5 presents and discusses the results obtained, and Sect. 6 points out final conclusions and future lines of research.

2 Plant Disease Detection with Deep Learning

2.1 Deep Learning Strategies

Deep learning excels at feature extraction from images, audio signals, or text, which makes it very attractive today. Since datasets of plant disease images can be processed directly by deep neural networks, without manually extracting hand-crafted features, we posit that these models can outperform traditional methods. In particular, Convolutional Neural Networks (CNN) and their variants are becoming the state of the art in many image processing problems.

Deep neural network architectures fall into four main types (for a recent survey see [12]):

  1. Stacked Auto-Encoders (SAE)
  2. Convolutional Neural Networks (CNN)
  3. Restricted Boltzmann Machines (RBM)
  4. Deep Belief Networks (DBN)

Alongside this diversity of models, improved learning algorithms and new applications and players continue to emerge at a rapid pace.

Fig. 1. Convolutional neural network [16]

CNN, depicted in Fig. 1, are biologically inspired variants of multilayer perceptrons. From Hubel and Wiesel’s early work on the cat’s visual cortex [3, 4], we know that the visual cortex contains a complex arrangement of cells, each sensitive to a small sub-region of the visual field called its receptive field. Simple cells act as filters that respond to edge-like patterns within their receptive fields, while complex cells, with larger receptive fields, are locally invariant to the exact position of the pattern; together they form hierarchies of increasing complexity. Since the visual cortex is the most powerful visual processing system known, its behavior has inspired many models, such as the CNN.
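To make the idea of a filter with a small receptive field concrete, the following NumPy sketch slides a hand-made 3x3 vertical-edge kernel over a toy image. It is an illustration only: the kernel, the image and the function name `correlate2d_valid` are our own assumptions, whereas a real CNN learns its kernels from data. Like most deep learning libraries, it computes a cross-correlation, which is what CNN layers call "convolution".

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding ('valid' mode), stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each output unit only sees a kh x kw patch: its receptive field.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1), i.e. one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge detector (a CNN would learn such weights from data).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = correlate2d_valid(image, kernel)
print(response)  # every row is [0, 3, 3, 0]: strong output only around the edge

# Shift the edge one pixel to the left: the response map moves,
# but its maximum (a global max-pool) is unchanged.
shifted = np.zeros((6, 6))
shifted[:, 2:] = 1.0
print(response.max(), correlate2d_valid(shifted, kernel).max())  # 3.0 3.0
```

The global maximum in the last line is only loosely analogous to a complex cell: the strongest response survives even though the edge has moved.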

2.2 Image-Based Approaches

Convolutional Neural Networks (CNN) are multilayer neural networks designed to recognize visual patterns directly from image pixels. The work in [9, 11] pioneered the CNN that are researched today, and CNN have since been catapulted to the center of object recognition research. The rekindled interest in CNN is largely attributed to the CNN model of [8], which achieved significantly higher image classification accuracy in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Its success resulted from a model inspired by LeCun’s previous work, plus a few modifications that enabled training on 1.2 million labeled images (e.g., GPU programming, max(x, 0) rectifying non-linearities and dropout regularization). Likewise, the massive processing power made available by parallel programming models, developed by large High Performance Computing (HPC) teams in industry, contributed substantially to this success.
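Two of the training ingredients named above, the max(x, 0) non-linearity and dropout, can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [8]: the dropout shown is the "inverted" variant used by modern frameworks, which rescales activations during training, whereas [8] scaled outputs at test time (the two are equivalent in expectation). The function names are our own.

```python
import numpy as np

def relu(x):
    """Rectifying non-linearity max(x, 0), applied element-wise."""
    return np.maximum(x, 0.0)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and rescale the survivors by 1/(1 - rate), so the expected activation is
    unchanged; at test time, return the input untouched."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True for units that survive
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))  # negative inputs become 0
print(h)
print(dropout(h, rate=0.5, rng=rng))  # a random subset zeroed, the rest doubled
```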

2.3 Disease Detection

Although this is still a developing research field, several successful approaches have been pursued, as we show in this section.

In [2] a deep-learning-based approach is presented to detect diseases and pests in tomato plants using images captured in place. Three main families of DL detectors were considered and combined with additional information to effectively recognize nine different types of diseases and pests, with the ability to deal with complex scenarios from a plant’s surrounding area.

In [14] DL is applied to Cassava disease detection. Cassava is the third largest source of carbohydrates for human food in the world; hence, any improvement in disease control can prove precious, especially in less developed countries. In this work, the authors applied transfer learning to train a deep convolutional neural network to identify three diseases and two types of pest damage, achieving an accuracy of up to 98%.

In [13] the authors take advantage of increasing global smartphone penetration and recent advances in computer vision made possible by deep learning to propose smartphone-assisted disease diagnosis. Using a deep convolutional neural network to identify 14 crop species and 26 diseases, the trained model achieves an accuracy of 99.35% on a held-out test set, demonstrating the feasibility of this approach.

A survey on deep learning in agriculture can be found in [7], giving a broad view of the wide range of applications and methods being pursued.

3 Hierarchical Approach for Plant Disease Detection

To introduce the proposed approach, we first define the problem of plant disease detection from images as follows.

Consider a set \(I\) of \(N\) images of \(C\) agricultural crops. For each crop \(c\) there are several target classes, i.e., it is a multiclass problem. These classes include the healthy crop and its specific diseases.

The goal can then be stated as, given a set of images I,

$$\begin{aligned} I = \{i_1,i_2,\ldots ,i_N\}, \end{aligned}$$
(1)

identify, for each image \(i_n\), the target class \(T_d^c\), i.e., map each image \(i_n\) to a specific disease \(d\) of crop \(c\), or to the healthy state of that crop.
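One way to encode the target classes \(T_d^c\) is sketched below in Python, using as an illustrative subset the six classes shown in Fig. 4 (the full 16-class list of Table 1 is not reproduced here). The names `CLASSES`, `FLAT_INDEX`, `CROPS`, `DISEASES` and `target_class` are hypothetical, and the healthy state is assumed to be encoded as the disease label `"healthy"`.

```python
# Illustrative subset of target classes T_d^c: the six classes shown in Fig. 4.
# The full dataset has 16 classes (Table 1); this list is NOT that table.
CLASSES = [
    ("apple", "healthy"),
    ("apple", "black rot"),
    ("peach", "healthy"),
    ("peach", "bacterial spot"),
    ("tomato", "healthy"),
    ("tomato", "leaf mold"),
]

# Flat view: each (crop, disease) pair gets a single integer label.
FLAT_INDEX = {pair: k for k, pair in enumerate(CLASSES)}

# Crop view: the C crops, in order of first appearance.
CROPS = list(dict.fromkeys(crop for crop, _ in CLASSES))

# Per-crop view: the labels (including "healthy") available for each crop c.
DISEASES = {c: [d for crop, d in CLASSES if crop == c] for c in CROPS}

def target_class(crop, disease):
    """Return the flat label of T_d^c; disease="healthy" encodes the healthy state."""
    return FLAT_INDEX[(crop, disease)]

print(CROPS)                             # ['apple', 'peach', 'tomato']
print(target_class("tomato", "leaf mold"))  # 5
```

With this encoding, a flat model predicts one of the `len(CLASSES)` labels directly, whereas a hierarchical model predicts a crop from `CROPS` and then a position in `DISEASES[crop]`.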

Fig. 2. Standard baseline deep learning flat approach.

3.1 Standard Baseline Approach

The standard baseline approach is depicted in Fig. 2 and consists of building a single model that uses all images and, for each input, detects the crop and the disease simultaneously. This is the typical deep learning approach, in which all preprocessing effort is handled by the neural network, and it usually yields good results.

However, in plant disease detection, especially in approaches based on the crop/disease pair, difficulties can arise, for instance, from the same disease affecting different crops, or from different diseases producing similar damage patterns in different crops. Hence, a more structured approach may be appropriate.

Fig. 3. Hierarchical deep learning approach.

3.2 Hierarchical Approach

Figure 3 shows the proposed two-step hierarchical deep learning approach.

The rationale is to divide the problem of crop disease detection into two subproblems. The first (STEP 1) consists of identifying the crop, and the second (STEP 2) consists of determining, for that specific crop, which disease in the dataset, if any, is present. Such a division may be advantageous because each STEP 2 model only has to discriminate among the diseases of a single crop, avoiding confusion between similar patterns in different crops.

It also allows the STEP 2 models to be reused in scenarios where plantations contain only one crop and the goal is to detect diseases of that specific crop.
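The two-step inference can be sketched in plain Python. This is a minimal illustration under stated assumptions: each "model" is any callable returning class probabilities (stand-ins for the fine-tuned CNNs), and the function `predict_hierarchical`, its parameters and the stub models are hypothetical. The optional `known_crop` argument shows a STEP 2 model being reused on its own when the crop is given.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A "model" is any callable mapping an image to class probabilities.
Model = Callable[[object], Sequence[float]]

def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda k: probs[k])

def predict_hierarchical(
    image: object,
    crop_model: Model,                 # STEP 1: which crop?
    crops: List[str],
    disease_models: Dict[str, Model],  # STEP 2: one model per crop
    diseases: Dict[str, List[str]],
    known_crop: Optional[str] = None,
) -> Tuple[str, str]:
    """Two-step prediction of the (crop, disease) pair of one image.
    If the crop is already known (single-crop plantation), STEP 1 is skipped
    and only the STEP 2 model of that crop is used."""
    if known_crop is not None:
        crop = known_crop
    else:
        crop = crops[argmax(crop_model(image))]
    disease = diseases[crop][argmax(disease_models[crop](image))]
    return crop, disease

# Hypothetical stubs standing in for trained networks.
crops = ["apple", "tomato"]
diseases = {"apple": ["healthy", "black rot"], "tomato": ["healthy", "leaf mold"]}
crop_model = lambda img: [0.1, 0.9]      # always favours "tomato"
disease_models = {
    "apple": lambda img: [0.8, 0.2],     # favours "healthy"
    "tomato": lambda img: [0.3, 0.7],    # favours "leaf mold"
}

print(predict_hierarchical(None, crop_model, crops, disease_models, diseases))
# ('tomato', 'leaf mold')
print(predict_hierarchical(None, crop_model, crops, disease_models, diseases,
                           known_crop="apple"))
# ('apple', 'healthy')
```

Because images are routed by crop, each disease model only ever discriminates among the classes of its own crop.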

4 Experimental Setup

4.1 Dataset

PlantVillage is an online platform dedicated to crop health and crop diseases (http://www.plantvillage.org), with more than 2 million reported site visits [5]. It started as an online crowdsourced forum where users could post questions and receive answers from other users. In addition, an open-access library of images covering over 150 crops and over 1,800 diseases was constructed and curated by plant pathology experts.

Fig. 4. Example images of some classes in the dataset: (a) Apple healthy; (b) Peach healthy; (c) Tomato healthy; (d) Apple black rot; (e) Peach bacterial spot; and, (f) Tomato leaf mold.

In this work, we constructed a dataset from the PlantVillage images to test the proposed hierarchical approach for plant disease detection. We focused on three major agricultural crops: apple, peach and tomato. Figure 4 shows example images of classes in the dataset, namely three examples of healthy apple, peach, and tomato crops, and three examples of diseases from the same crops.

Table 1 presents the classes associated with these three crops: 16 classes and nearly 24,000 images in total. For each class, 70% of the images were used for training and 30% for testing. Note that both the number of classes per crop and the number of examples per class vary considerably.
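A per-class 70/30 split of this kind (a stratified split) can be sketched with the Python standard library. This is an illustration under stated assumptions: the function name `stratified_split`, the random shuffling within each class, the fixed seed and the toy class sizes are all hypothetical, and the class sizes are not those of Table 1.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.7, seed=0):
    """Split (item, label) pairs so that each class contributes `train_frac`
    of its examples to training and the remainder to testing.
    Returns (train, test) lists of (item, label) pairs."""
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)

    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label][:]
        rng.shuffle(items)
        n_train = round(train_frac * len(items))
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:]]
    return train, test

# Hypothetical, deliberately unbalanced classes (not the counts of Table 1).
data = [(f"img_{k}.jpg", "apple healthy") for k in range(100)]
data += [(f"img_{k}.jpg", "apple black rot") for k in range(100, 120)]
train, test = stratified_split(data)
print(len(train), len(test))  # 84 36
```

Splitting each class separately keeps the class proportions of the training and testing sets identical, which matters when, as here, class sizes are heterogeneous.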

Table 1. Dataset classes and training/testing sets

4.2 Deep Learning Models

The evaluation of our approach was performed with a fine-tuned InceptionV3 CNN, proposed in [17]. In an Inception CNN, the input is split into a few lower-dimensional embeddings, transformed by a set of specialized filters, and merged by concatenation. The solution space of this architecture is a strict subspace of the solution space of a single large layer operating on a high-dimensional embedding; thus it is expected to approach the representational power of large, dense layers, but at a considerably lower computational complexity [18].
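The split-transform-merge idea can be illustrated with a toy NumPy module. This is a simplified sketch, not the actual InceptionV3 module of [17]: the branch layout, the channel sizes (64 in, 16 after reduction, 32 per branch), the random weights and the comparison against a single 5x5 layer are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv(x, w):
    """Stride-1, 'same'-padded convolution followed by ReLU.
    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k."""
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # (H, W, C_in) @ (C_in, C_out): contribution of kernel tap (i, j).
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return np.maximum(out, 0.0)

def weights(k, c_in, c_out):
    return rng.normal(0.0, 0.1, size=(k, k, c_in, c_out))

def inception_like(x, c_reduce=16, c_out=32):
    """Split the input into low-dimensional embeddings (1x1 convs), transform
    each with a specialized filter size, and merge by concatenation."""
    c_in = x.shape[2]
    b1 = conv(x, weights(1, c_in, c_out))                                        # 1x1
    b3 = conv(conv(x, weights(1, c_in, c_reduce)), weights(3, c_reduce, c_out))  # 1x1, 3x3
    b5 = conv(conv(x, weights(1, c_in, c_reduce)), weights(5, c_reduce, c_out))  # 1x1, 5x5
    return np.concatenate([b1, b3, b5], axis=-1)

def n_params(k, c_in, c_out):
    return k * k * c_in * c_out  # weights of one conv layer (biases ignored)

x = rng.normal(size=(8, 8, 64))
y = inception_like(x)
print(y.shape)  # (8, 8, 96)

# Weight count of the module vs. one large 5x5 layer with the same 96 outputs.
split = (n_params(1, 64, 32)
         + n_params(1, 64, 16) + n_params(3, 16, 32)
         + n_params(1, 64, 16) + n_params(5, 16, 32))
dense = n_params(5, 64, 96)
print(split, dense)  # 21504 153600
```

In this toy configuration the module uses about seven times fewer weights than the single large layer, which is the kind of saving the lower-complexity argument refers to.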

We used the InceptionV3 implementation distributed with Keras, with the TensorFlow backend. The network was initialized with weights pre-trained on the ImageNet dataset and then fine-tuned on our dataset (transfer learning).

We defined the experimental configuration empirically: all CNNs were trained for 30 epochs with 50 steps per epoch, a setting under which they consistently converged.

The main goal was to compare the performance of the standard and hierarchical approaches using identical models as base classifiers, not targeting improvements in the classification performance of the models.

4.3 Evaluation Metrics

In this setup, a single-label multiclass problem is defined, i.e., each image belongs to exactly one of the specified classes. One of the most common evaluation strategies is to compute the accuracy of a classifier, i.e., the percentage of images that are correctly assigned to their class.

The problem can also be decomposed into binary decision problems, one per class, since each image either belongs to a given class or not. To evaluate each binary decision task, a contingency matrix can be defined to represent the possible outcomes of the classification, as shown in Table 2.

Table 2. Contingency table for binary classification

Table 3. Standard and hierarchical results for each crop/disease pair

Traditional measures can be defined from this contingency table, such as the error rate (\(\frac{b+c}{a+b+c+d}\)) and the above-mentioned accuracy (\(\frac{a+d}{a+b+c+d}\)). However, for imbalanced problems, i.e., problems where the number of examples differs greatly among classes or, in the binary case, where the number of positive examples differs greatly from the number of negative examples, more specific measures are needed to capture the performance of each model. Typical examples include recall (\(R=\frac{a}{a+c}\)) and precision (\(P=\frac{a}{a+b}\)). Combined measures can additionally summarize performance in a single value, such as the van Rijsbergen \(F_1\) measure [15], which combines recall and precision:

$$\begin{aligned} F_1=\frac{2\times P \times R}{P+R}. \end{aligned}$$
(2)
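These measures translate directly into code. In the sketch below (the function name `binary_metrics` is hypothetical), the roles of the cells are read off the formulas above: \(a\) counts true positives, \(b\) false positives, \(c\) false negatives and \(d\) true negatives. Mapping a zero denominator to 0.0 is our assumption. The hypothetical example shows why accuracy alone can mislead on an imbalanced problem.

```python
def binary_metrics(a, b, c, d):
    """Metrics from the 2x2 contingency table:
    a = true positives, b = false positives, c = false negatives, d = true negatives.
    Zero denominators are mapped to 0.0 (an assumed convention)."""
    def safe_div(num, den):
        return num / den if den else 0.0

    n = a + b + c + d
    precision = safe_div(a, a + b)
    recall = safe_div(a, a + c)
    return {
        "error": safe_div(b + c, n),
        "accuracy": safe_div(a + d, n),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

# Hypothetical imbalanced case: 100 positives among 2,400 test images, and a
# classifier that never predicts the positive class.
metrics = binary_metrics(a=0, b=0, c=100, d=2300)
print(metrics)  # accuracy about 0.958, but precision = recall = F1 = 0.0
```

A classifier that never detects the positive class still reaches roughly 95.8% accuracy here, while its recall and \(F_1\) are zero.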

Two conventional methods are widely used in multiclass scenarios: macro-averaging and micro-averaging. Macro-averaged performance scores are obtained by computing the scores for each class and then averaging them to obtain the global means. In contrast, micro-averaged performance scores are computed by first summing the contingency matrix values (a, b, c and d) over all classes, and then using these sums to compute a single micro-averaged score that represents the global performance.
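The difference between the two averaging schemes can be sketched in NumPy from a multiclass confusion matrix, deriving the per-class one-vs-rest counts \(a\), \(b\), \(c\) and \(d\). The function names and the 3-class confusion matrix are hypothetical. Macro-\(F_1\) is computed here as the mean of the per-class \(F_1\) scores; some works instead take the \(F_1\) of the macro-averaged precision and recall.

```python
import numpy as np

def one_vs_rest_counts(conf):
    """conf[i, j] = number of test images of true class i predicted as class j.
    Returns per-class arrays a (TP), b (FP), c (FN) and d (TN)."""
    conf = np.asarray(conf, dtype=float)
    a = np.diag(conf)
    b = conf.sum(axis=0) - a   # predicted as the class, but belonging to another
    c = conf.sum(axis=1) - a   # belonging to the class, but predicted as another
    d = conf.sum() - a - b - c
    return a, b, c, d

def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def macro_micro(conf):
    a, b, c, _ = one_vs_rest_counts(conf)  # d is not needed for P, R and F1
    # Macro: compute the scores of each class, then average them.
    p_k = np.divide(a, a + b, out=np.zeros_like(a), where=(a + b) > 0)
    r_k = np.divide(a, a + c, out=np.zeros_like(a), where=(a + c) > 0)
    f1_k = np.array([f1(p, r) for p, r in zip(p_k, r_k)])
    macro = {"precision": p_k.mean(), "recall": r_k.mean(), "f1": f1_k.mean()}
    # Micro: sum the contingency counts over all classes, then score once.
    A, B, C = a.sum(), b.sum(), c.sum()
    p, r = A / (A + B), A / (A + C)
    micro = {"precision": p, "recall": r, "f1": f1(p, r)}
    return macro, micro

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
conf = [[50, 0, 0],
        [10, 5, 5],
        [0, 0, 30]]
macro, micro = macro_micro(conf)
print(macro)  # recall 0.75, F1 about 0.744
print(micro)  # precision = recall = F1 = 0.85, i.e. the accuracy 85/100
```

In a single-label problem every misclassified image adds one false positive (to the predicted class) and one false negative (to the true class), so micro-averaged precision, recall and \(F_1\) all equal accuracy. Macro-averaging weights each class equally and therefore exposes the poorly recognized class (recall 0.25) that the micro scores hide.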

5 Experimental Results and Analysis

Table 3 summarizes the performance results obtained on the testing set with the macro-averaged performance measures, namely recall, precision, \(F_1\), and accuracy. Each row refers to one of the 16 classes considered. Results are presented both for the standard flat approach and for the hierarchical approach.

A global analysis of the macro-averaged results shows the potential benefits of the proposed method. Specifically, in classes where the flat model underperforms in terms of recall and precision, the hierarchical approach shows significant improvement. Notice that accuracy, often used as the only metric, in fact offers a rather biased notion of the performance obtained: in each one-vs-rest binary problem, the large number of true negatives (d) inflates accuracy regardless of how well the positive class is detected.

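Table 4 shows the macro-averaged performance measures for the three crops in STEP 1: Apple, Peach, and Tomato. This considerably easier problem yields almost perfect results, which is what makes the improvement of STEP 2 over the flat approach possible: since a crop misidentified in STEP 1 cannot be corrected in STEP 2, the overall performance of the hierarchical approach depends directly on the crop identification accuracy.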

Table 4. Hierarchical approach STEP 1 results

6 Conclusions and Future Work

In this paper we have proposed a two-level deep learning hierarchical approach for plant disease detection. The first level determines the crop in the image, and the second level learns the specific diseases of that crop. Results show the overall advantages of the approach, which is expected to be applicable in real scenarios in which the crop is known.

Results show the effectiveness of the proposed hierarchical approach in terms of overall performance. When targeting specific crop diseases, the approach can bring significant advantages, particularly when measures other than accuracy, which provides only a partial view of performance, are considered.

Future work will further explore the hierarchical nature of the proposal and investigate different deep architectures.