
1 Introduction and Related Work

Diatoms are microscopic algae, a type of phytoplankton, comprising more than twenty thousand species. They are used as paleoenvironmental indicators, since the presence of certain diatom genera indicates water purity or contamination, including the presence of fecal matter, among others. Additionally, diatoms may be used to make historical environmental estimates of water sources from the abundance or scarcity of particular diatoms, for example by studying fossil deposits in lake sediments. Environmental variables that were affected or dominant in the past can also be tracked and estimated by identifying the diatoms present in the source under analysis [1]. Variations in temperature, pH or conductivity over centuries may be estimated by studying diatoms in sediments. This makes it possible to determine how climate has affected a studied area and to establish baseline conditions, from which environmental regulatory bodies can define criteria and parameters for water quality.

Currently, diatomists identify these microscopic structures visually by examining a sample under a microscope. Visual identification of diatoms is largely subjective, has limited repeatability, and requires inter-observer agreement [3, 4]. However, images of different sections of water samples can be obtained by connecting a camera to a microscope, and different methods for diatom identification have been studied. Identification methods based on coherent optics and holography have been proposed, but they have a high computational cost and have not been adopted as an alternative to support biologists. Operators invariant to translation, rotation and scale, as well as Legendre polynomials and Principal Component Analysis, have been used to identify specific genera of diatoms [6, 7]. Rojas Camacho et al. [5] studied a tuning method that sets up the best parameters iteratively, as an optimisation problem, by comparing the current result with the previous one, and then validated them with the Canny edge detector and a binarisation technique.

Although segmentation of structures such as diatoms is the first step in any investigation, computer science applied to the diatom field has focused on the classification of species. The Automatic Diatom Identification and Classification (ADIAC) project is a reference in research on diatom analysis systems [8]. In ADIAC, 171 features were used for diatom classification, describing symmetry, shape, geometry and texture by means of different descriptors. Dimitrovski et al. argue that, on the ADIAC image data set, SIFT descriptors give better results than the use of Support Vector Machines (SVM). The best results, up to 97.97% accuracy, were obtained with 38 classes using Fourier and SIFT descriptors with a random forest classifier. Alvarez et al. [11] proposed a method to classify diatoms using a Learning Vector Quantization (LVQ) neural network. According to Hawickhorst et al. [12], LVQ allows a lower training time than networks trained with back-propagation. However, if more hidden units are required, the LVQ network will take more time. Approaches such as [3, 4] are based on hand-crafted or “hand-designed” methods, where a fixed set of features is used. However, hand-crafted methods present limited results, as in [9], where 14 classes were classified with SVMs and 10-fold cross-validation (10 fcv), using 44 GLCM features that describe geometric and morphological properties, reaching an accuracy of 94.7%.

In this paper, an automated method for the identification and classification of diatoms from images is presented. The proposed method combines the Scale and Curvature Invariant Ridge Detector (henceforth SCIRD-TS) [2], followed by a post-processing step, with nested Convolutional Neural Networks (CNN). An experimental evaluation is conducted on a set of ground truth images, using the F-Score to assess results. Our approach segments well-defined ridge structures or Regions of Interest (henceforth, RoIs), and the nested CNN classifies the RoIs previously segmented in an image of a water sample. The first CNN discards RoIs that are well-defined structures but correspond to undesired elements (debris, flocs, etc.), and a second CNN classifies the RoIs containing diatoms into the genera to which they belong.

2 Identification and Classification of Diatoms

The diatom identification method has two phases. The first phase segments the objects present in the images, called RoIs. The second phase identifies diatoms by classifying each RoI according to whether or not it corresponds to a diatom. The RoIs identified as diatoms are then classified into genera.

2.1 RoIs Segmentation

The segmentation of RoIs is based on SCIRD-TS, which was presented as a filter bank in the application domain of retinal images and is able to identify thin structures [2]. The SCIRD-TS filter bank was adapted and tested on a set of diatom images, using the implementation available at the author’s web page. The SCIRD-TS filter bank, by Annunziata [2], is defined as:

$$\begin{aligned} F(x;\sigma ;k)=\frac{1}{\sigma _{2}^2}\left[ \frac{(x_{2}+kx_{1}^2)^2}{\sigma _{2}^2}-1 \right] \exp \left[ -\frac{x_{1}^2}{2\sigma _{1}^2}-\frac{(x_{2}+kx_{1}^2)^2}{2\sigma _{2}^2}\right] , \end{aligned}$$
(1)

where \( (x_{1}, x_{2})\) represents a point in the image coordinate system, k is a shape parameter and \( \sigma = (\sigma _{1},\sigma _{2}) \) contains the standard deviations of the Gaussian distribution in each coordinate direction; k, \(\sigma _{1}\) and \(\sigma _{2}\) are provided by the user. Since the quality of water sample images may vary, two segmentation methods are presented.

Method 1 is proposed for images with high luminosity, large diatoms, fluorescence conditions, large debris, low noise levels and diatoms with high relief. It applies SCIRD-TS with the following parameters: \(\sigma _{1}=\left[ 1 , 2 \right] \) with step 1, \(\sigma _{2}=\left[ 1 , 2 \right] \) with step 1; \(k=\left[ -0.1 , 0.1 \right] \) with step 0.1 and \(\theta _{step}=15\), followed by a post-processing step with a fixed threshold, morphological operations and area-based filtering, under the assumption that flocs are small because noise levels are low.

Method 2 is proposed for images with high noise levels, caused by a large load of particles, fragments and flocs of organic matter, and a low signal-to-noise ratio. It is based on a difference of Gaussians, calculated as the difference between the image resulting from a single application of SCIRD-TS and the image resulting from a double application of SCIRD-TS. The first image is obtained with the parameters \(\sigma _{1}=\left[ 1 , 2 \right] \) with step 1, \(\sigma _{2}=\left[ 1 , 2 \right] \) with step 1; \(k=\left[ -0.1 , 0.1 \right] \) with step 0.05 and \(\theta _{step}=15\). The second image is obtained with the same parameters except for \(\sigma _{2} \): \(\sigma _{1}=\left[ 1 , 2 \right] \) with step 1, \(\sigma _{2}=\left[ 1 , 11 \right] \) with step 3; \(k=\left[ -0.1 , 0.1 \right] \) with step 0.05 and \(\theta _{step}=15\).

Since the images contain a large amount of fluff and dust, the first image has higher intensities than the second one. Subtracting two Gaussian blurs keeps the spatial information conserved in the two blurred images, which is assumed to be the desired information [10]; that is, dust and fluff are purged. After the difference of Gaussians, an adaptive threshold is applied, followed by morphological operations and filtering of objects by area. Figure 1 illustrates the results obtained at the different steps of the two methods.

Fig. 1. Illustration of the two proposed methods. The images in the first and second columns have characteristics suited to Method 1, and the image in the third column has characteristics suited to Method 2. From top to bottom: the first row shows the results obtained after applying SCIRD-TS. The second row shows the difference of two Gaussians using SCIRD-TS. The third row shows the results after thresholding the SCIRD-TS images, using Method 1 (images g. and h.) with a fixed threshold and Method 2 (image i.) with local adaptive thresholds. The fourth row shows the results of the morphological operations; the size of the structuring elements varies between the two methods. The fifth row shows the identified diatoms.

Fig. 2. Illustration of classification results using three nested CNN models, from left to right and from top to bottom, respectively: a. AlexNet: diatom, nitzscia; diatom, noidentified; diatom, nitzscia; diatom, noidentified. b. GoogLeNet: diatom, cocconeis; diatom, noint; diatom, cocconeis; diatom, nitzscia. c. ResNet: diatom, goomphonema; diatom, goomphonema; diatom, goomphonema; diatom, goomphonema.

2.2 Classification of Diatoms

Once the RoIs have been segmented, three nested Convolutional Neural Networks (henceforth, CNNs) are used to classify them into diatom and non-diatom. AlexNet, GoogLeNet and ResNet are well-known models, commonly used for the recognition of objects such as animals, people and equipment, or for the recognition of specialised objects through transfer learning techniques. With fine-tuning, a pre-trained CNN model is taken and some of its layers are modified so that their parameters are recalculated from the training images of the problem being addressed. A nested CNN consists of a first network that discards the unwanted elements segmented in the segmentation phase (background and debris). RoIs classified as diatoms are passed to a second network, where they are classified according to the genera to which they belong. Figure 2 shows the classification results obtained using three nested CNN models.
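The control flow of the nested arrangement can be sketched as a two-stage cascade. This is only an illustration: `classify_nested`, `is_diatom` and `predict_genus` are hypothetical names, and the two callables stand in for the fine-tuned networks, whose actual interfaces are not described here.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def classify_nested(
    rois: Sequence[Any],
    is_diatom: Callable[[Any], bool],
    predict_genus: Callable[[Any], str],
) -> List[Tuple[str, Optional[str]]]:
    """Run the two-stage cascade over segmented RoIs.

    is_diatom stands in for the first network (diatom vs. unwanted element);
    predict_genus stands in for the second network (genus of a diatom RoI).
    """
    results: List[Tuple[str, Optional[str]]] = []
    for roi in rois:
        if not is_diatom(roi):
            # Discarded by the first stage; the second stage is never called.
            results.append(("non-diatom", None))
        else:
            results.append(("diatom", predict_genus(roi)))
    return results
```

With stub classifiers, `classify_nested([1, 2, 3], lambda r: r != 2, lambda r: "genus_%d" % r)` returns `[("diatom", "genus_1"), ("non-diatom", None), ("diatom", "genus_3")]`. One consequence of this design is that any RoI wrongly rejected by the first stage can never be recovered by the second.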

3 Experimental Results

The experimental evaluation is performed using two groups of images, according to the previously defined methods. The first group is composed of 96 images, obtained with a Nikon Eclipse Ni-U90 microscope, and the second is composed of 269 images, obtained with a Nikon E200 microscope. The ground truth consists of the 365 images of these two groups, with regions indicating the specimens labelled by experts. The CNN models are trained using 16,000 segmented RoIs that contain diatoms, background and debris. The CNNs are trained with the MATLAB© Deep Learning Toolbox™. The performance of the proposed segmentation methods is evaluated at two levels, pixel and diatom identification, and measured with the F-Score. Table 1 shows the results in terms of correctly identified pixels and at the diatom identification level. The experimental results indicate that Method 1 yields higher F-Scores at both pixel and diatom levels than Method 2, whilst Method 2 has higher accuracy than Method 1 at pixel level.
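For reference, a pixel-level F-Score can be computed from a predicted binary mask and its ground truth as sketched below. The sketch assumes the balanced F1 form (the harmonic mean of precision and recall), which the text does not state explicitly, and the function names are illustrative only.

```python
def pixel_counts(pred, truth):
    """Count true positives, false positives and false negatives.

    pred and truth are equally sized 2-D lists of 0/1 values.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f_score(tp, fp, fn):
    """Balanced F-Score (F1, assumed): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The same `f_score` function applies at the diatom identification level, with the counts taken over detected and labelled specimens rather than over pixels.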

Table 1. Error analysis of segmentation results using G1 and G2, which denote group 1 and group 2 of images.

Classification tests with the three nested CNN models used as input the RoIs obtained in the segmentation phase. Table 2 shows the architecture of each network with the respective error analysis. Among the three nested CNNs, AlexNet showed the best performance.

Table 2. Classification error analysis at the diatom identification level using the three CNNs. G1 and G2 denote group 1 and group 2 of images.

4 Final Remarks

We proposed a method to automatically identify and classify diatoms. The method combines hand-crafted SCIRD-TS filter banks with a post-processing step, in two different ways depending on specific image characteristics, in order to identify RoIs. We reckon that combining the detection of structures with a post-processing strategy to detect potential regions of interest may lead to a substantial speed-up of diatom segmentation, since post-processing filters out unwanted elements. Although morphological operations and filters remove small flocs, regions with large flocs remain. Those flocs cannot be removed by the above-mentioned operations, because wanted structures such as diatoms, which may be even smaller than the unwanted structures, could be affected.

Well-known CNN models were tested for classifying RoIs into diatoms and unwanted elements, such as debris or flocs. Once diatoms are identified among the RoIs, a second CNN is used to classify them into genera. AlexNet showed the best performance among the three evaluated networks. In general, the first network in the nested CNN models performed well, although it can be improved in future work. This indicates that the proposal meets the objective of discarding undesired RoIs. It is possible that some RoIs are discarded without justification, which contributes to false negatives. We notice that the performance of the second network, used to classify the identified diatoms into genera, is lower, which leaves room for improving the networks. It appears to be very important to maintain a balance among training images per class. While the first network has an acceptable level of balance (more than 16,000 diatom training images), the imbalance among the second network’s classes is large. This is due to a scarce image bank; a larger set of images per genus is needed, especially for genera with a limited number of individuals. In addition, much work remains in trying other ways of augmenting the data, enhancing different characteristics to be learned by a network.