1 Introduction

Clinical microbiology is tasked with the diagnosis and treatment of infectious diseases. Achieving accurate diagnoses under standardized and reproducible conditions is essential to providing appropriate and timely treatment. The gold standard for bacteria identification in the workflow of Clinical Microbiology Laboratories (CML) is bacteria culturing on agar plates. Because this has traditionally been performed almost entirely by hand, it involves labor-intensive pre-analytical phases, with critical issues for both intra- and inter-laboratory repeatability. However, the recent diffusion of Full Laboratory Automation (FLA) systems has started to deeply change working habits in many clinical microbiology facilities worldwide [1, 2]. A single FLA plant can process thousands of plates per day, generating large flows of high-resolution digital images (taken during and after plate incubation) to be read on diagnostic workstations. As a consequence, the new field of Digital Microbiology Imaging (DMI) carries high expectations for solving a variety of visual interpretation challenges, with the aim of supporting and improving the accuracy and speed of clinical procedures and decisions in CMLs. In this work we focus on the automated detection of one of the main diagnostically relevant features for the assessment of human infections, namely the \(\beta \)-hemolytic activity of pathogens.

1.1 Problem Definition

\(\beta \)-hemolysis is an effect caused by certain bacterial species growing on blood agar plates that leads to the dissolution of the blood substrate surrounding the colony. The resulting visual effect is a yellowish halo, visible when the plate is held against the light [3] or on back-lit images acquired by FLA systems under proper plate illumination settings. In many clinical microbiology protocols, \(\beta \)-hemolysis has a high impact, and its assessment is one of the first steps of a diagnostic chain that requires high sensitivity. This is especially true for throat swab cultures, and whenever streptococci must be addressed. Moreover, hemolytic activity assessment promptly provides diagnostically relevant information about virulence (see for instance E. coli [4]) that is difficult or impossible to obtain with other diagnostic procedures. However, accurately distinguishing \(\beta \)-hemolysis with the naked eye in all its possible manifestations is difficult even for a skilled microbiologist. It requires caution and experience and, especially under high laboratory workload, it is an error-prone procedure. Fig. 1 shows some examples of negative (a–f) and positive (g–p) cases. In the first row of positive examples (g–l), \(\beta \)-hemolysis is easily recognizable, although it appears in a variety of morphological forms and textures: in the middle of a confluent growth (g), over a written portion of the plate (h), forming multiple rings (i), or in heavily mixed situations inside large confluences (l). These situations and their variability constitute a first main challenge (which we refer to as the Multiform Challenge, MC) for a machine that is asked to reliably classify images that are usually well interpretable by a microbiologist: it must limit false positives (FP) while maintaining a high recall.
A second challenge (which we refer to as the Detectability Challenge, DC) occurs in cases of faint hemolysis, as in (m, n), and particularly in diagnostically relevant cases where early detection by humans starts to become difficult because of very thin halos (compare for example (b) with (o)), or where \(\beta \)-hemolysis is barely visible because it is hidden under the colony (see (p)). Here both humans and machine-based techniques must above all avoid false negatives (FN) while maintaining a suitable degree of precision.

Fig. 1. Negative and positive \(\beta \)-hemolysis examples on blood agar plates. The dotted yellow line highlights the \(\beta \)-hemolytic regions.

1.2 Related Work and Contribution

So far, only one very recent work has dealt with the automated detection of hemolysis on agar plates [5]; there, a machine-learning method based on hand-crafted features accurately classifies hemolysis on image patches representing single colonies. On the one hand, that work does not handle detection on a whole-plate basis and cannot handle most of the frequently occurring cases exemplified in Fig. 1 (hemolysis under the colony, within confluences, with a very thin halo, or over the written part of the plate), which characterize the clinical problem in its real complexity. On the other hand, classification in [5] also covers \(\alpha \)-hemolysis (which generates a brownish halo), which is, however, virtually absent and of no diagnostic interest in the throat-swab clinical context considered in our work.

Deep learning (DL) approaches, especially those based on Convolutional Neural Networks (CNN), have recently been shown to outperform feature-based machine learning solutions whenever difficult visual tasks and large datasets are involved. Applications of deep learning to medical image analysis have started to appear consistently only very recently and are now spreading rapidly [6]. Concerning DL methods in the field of DMI, Ferrari et al. [7] proposed a system for bacterial colony counting, while Turra et al. [8] started investigating bacterial species identification using hyper-spectral images. More generally, DL detection methods in Computer Assisted Diagnosis (CAD) contexts have recently been proposed for the classification of skin cancer [9], cell and mitosis detection [10, 11], and mammographic lesions [12], to name a few.

In this work, exploiting a dataset created for the purpose (described in Sect. 2), we present a \(\beta \)-hemolysis detection technique based on a region proposal stage (Sect. 3.1) followed by a CNN (Sect. 3.2) that classifies image patches as \(\beta \)-hemolytic or not. Our system copes effectively with the highly diversified behaviour that \(\beta \)-hemolysis displays in the considered CML procedures, which involve throat swab cultures aimed at identifying respiratory tract infections. Our approach overcomes the limitations of [5] and is thus the first one capable of working in conditions of real complexity (i.e., facing both the MC and DC challenges defined above). Finally, we validate the effectiveness of the method with both patch-based and whole-plate tests, which evaluate the quality of the classification stage and of the overall system, respectively (Sect. 4).

2 Throat Swab Culture Dataset

We collected a dataset of 1,500 culture plates coming from routine lab screening tests and produced by the inoculation of throat swab samples on REMEL 5% sheep blood agar media. Images came from a WASPLab FLA system (by Copan Diagnostics Inc.), which acquires 16-megapixel RGB color images by linear scanning. For each plate we retrieved both back-light and top-light acquisitions. The ground truth data for the training process consists of 1,200 throat swab plates, randomly selected from one week of work in a medium-sized lab, together with the segmentation maps produced for the purpose by expert microbiologists, who delineated the \(\beta \)-hemolytic regions. This training set is composed of 160 positive plates and 1,040 negative ones. To create a blind test-set for the overall evaluation of the system, we labelled another batch of 300 new plates acquired two weeks after the training batch. For this batch we only asked specialists to indicate whether \(\beta \)-hemolysis was present or not, obtaining 51 positive plates and 249 negative ones.

From the image database we create an image patch dataset by extracting \(150\times 150\) pixel patches from the 1,200 fully annotated plate images, labeling a patch as positive if at least one pixel from the delineated \(\beta \)-hemolytic regions falls inside the \(100\times 100\) pixel region centered in the patch. Patches are taken with a 33% overlap, so that every portion of the image falls inside the central 100-pixel region of some patch. We chose 150 pixels as the patch dimension as a good trade-off between colony dimensions and the required computational effort. To these patches, extracted on a regular grid, we added all the patches generated by the differential region proposal approach explained in Sect. 3.1, resized to \(150\times 150\) pixels if needed, where again each patch is labeled negative or positive according to the same rule. This adds more examples similar to those encountered at test time, when only patches coming from the region proposal are considered. At the same time, the sliding window guarantees a suitable amount of training material, since the region proposal is designed to reduce the number of analyzed patches. Moreover, since with natural CML proportions negative patches would be about 50 times more numerous than positive ones, we randomly sample just a portion of them from each plate, until the patch dataset is balanced. Finally, the full set of patches (about 160k in total) is further divided, on a plate basis, into two sets for CNN training (70%) and validation (30%).
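The grid extraction and labeling rule described above can be sketched as follows. This is only an illustrative sketch: the helper names `grid_origins` and `is_positive` are hypothetical, the annotation is assumed to be a binary mask given as a nested sequence of 0/1 values, and the stride of 100 pixels is inferred from the stated 33% overlap of \(150\times 150\) patches.

```python
def grid_origins(height, width, patch=150, stride=100):
    """Top-left corners of patches on a regular grid.

    With patch=150 and stride=100, consecutive patches share 50 pixels
    (33% of the patch side) and their central 100-pixel regions tile the image.
    """
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def is_positive(mask, origin, patch=150, core=100):
    """Label a patch positive if any annotated pixel falls in its central core."""
    margin = (patch - core) // 2  # 25 pixels on each side for 150/100
    r0 = origin[0] + margin
    c0 = origin[1] + margin
    # Scan only the core x core window centered in the patch.
    return any(any(row[c0:c0 + core]) for row in mask[r0:r0 + core])


if __name__ == "__main__":
    # Toy 350x350 annotation with a single hemolytic pixel.
    mask = [[0] * 350 for _ in range(350)]
    mask[130][130] = 1
    for origin in grid_origins(350, 350):
        print(origin, is_positive(mask, origin))
```

In this toy case the annotated pixel lies inside the patch at origin (0, 0) but outside its central region, so only the patch at (100, 100) is labeled positive. In this sketch, a border strip narrower than the 25-pixel margin is not covered by any central region and would need padding.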

3 \(\beta \)-Hemolysis Detection Method

In this section, we describe our approach to \(\beta \)-hemolysis detection, consisting of a patch extraction (region proposal) phase followed by a classification stage based on a specific CNN architecture. An overall scheme of the proposed solution is depicted in Fig. 2.

Fig. 2. The overall system for \(\beta \)-hemolysis detection.

3.1 Patch Extraction

In a common scenario, colony growth covers only a minor portion of the whole substrate. Moreover, hemolysis usually involves (with few exceptions) only a small portion of the growth. A sliding window patch extraction mechanism for hemolysis detection and classification would therefore be highly inefficient. To significantly increase the computational efficiency of our method, we exploit the physical effect that hemolysis produces, i.e., an erosion of the blood film, which results in a region through which more light is transmitted from below when the plate is acquired back-lit. We thus adopt a region proposal solution that works on a differential image obtained by subtracting the back-lit image from the top-lit one. We process this image by bilateral filter denoising and morphological filtering in order to produce a map composed of high-probability \(\beta \)-hemolytic blobs. Specifically, this map is obtained as \(\max (|Img_{top} - Img_{back}|, t) \bullet K\), where \(\bullet \) is a morphological closing operating on the denoised differential image with a circular \(5\times 5\) structuring kernel K, and where t is a parameter that affects the recall of the patch proposal and depends mainly on the FLA illumination settings and the plate manufacturer. All the parameters, including t, are tuned on the patch database so as to produce a region proposal with \(100\%\) recall (no FNs). As a last step, we use this map to create a list of possible hemolytic regions to be extracted from the back-light plate for the subsequent classification phase. Specifically, \(150\times 150\) patches are created either by centring a patch on each smaller region or by subdividing larger regions.
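A minimal NumPy sketch of the map \(\max (|Img_{top} - Img_{back}|, t) \bullet K\) may help fix ideas. It rests on several assumptions: single-channel images, no bilateral denoising step, a disk-like discretization of the \(5\times 5\) kernel chosen here for illustration, and a final comparison against t to obtain candidate blobs. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def disk_offsets(radius=2):
    """Offsets (dy, dx) of a roughly circular (2*radius+1)-wide footprint.

    Assumption: the 5x5 square minus its four corners (21 pixels).
    """
    r = range(-radius, radius + 1)
    return [(dy, dx) for dy in r for dx in r
            if dy * dy + dx * dx <= radius * radius + 1]


def _reduce_over_footprint(img, offsets, reduce_fn, pad_value):
    """Apply reduce_fn (np.max or np.min) over the footprint around each pixel."""
    radius = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(img, radius, mode="constant", constant_values=pad_value)
    h, w = img.shape
    shifted = [padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
               for dy, dx in offsets]
    return reduce_fn(np.stack(shifted), axis=0)


def closing(img, offsets):
    """Grayscale morphological closing: dilation followed by erosion."""
    dilated = _reduce_over_footprint(img, offsets, np.max, pad_value=-np.inf)
    return _reduce_over_footprint(dilated, offsets, np.min, pad_value=np.inf)


def proposal_map(top, back, t, radius=2):
    """max(|top - back|, t) closed with a disk-like kernel."""
    diff = np.abs(top.astype(np.float64) - back.astype(np.float64))
    floored = np.maximum(diff, t)  # differences below t are raised to t
    return closing(floored, disk_offsets(radius))


def candidate_mask(top, back, t, radius=2):
    """Pixels whose closed differential response exceeds t (assumed rule)."""
    return proposal_map(top, back, t, radius) > t
```

In practice the connected components of such a mask would then be turned into \(150\times 150\) patches; lowering t in this sketch enlarges the candidate set, which is consistent with its described role in controlling recall.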

3.2 Patch Classification

For the patch-classification phase we need a state-of-the-art CNN architecture that is well suited to datasets of a dimension and complexity similar to ours. This is why we selected DenseNet [13], which is composed of a fully interconnected series of layers that ensure maximum information flow and force an efficient use of the learned representation (Fig. 2, top-right). DenseNet exposes two parameters: the number of layers L, which controls the vertical scale, and the growth rate k, which controls the horizontal scale (i.e., the number of filters). Moreover, to increase computational efficiency we add a bottleneck layer before each convolutional layer (a solution referred to as DenseNet-BC). We train the network from scratch with Xavier weight initialization. Indeed, because of the novel type of images, fine-tuning approaches would lead to no performance improvement. We use the Adam optimizer and the Keras framework with TensorFlow, running on an NVIDIA GPU. We perform 120 training epochs with an initial learning rate of 0.01, halved on plateaus.
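The learning-rate policy can be illustrated with a small framework-independent sketch. The initial rate (0.01) and the halving factor come from the text; the monitored quantity (validation loss), the `patience` value and the function name `plateau_schedule` are assumptions made for illustration. Keras provides a `ReduceLROnPlateau` callback with a similar behaviour, but the text does not state which mechanism was actually used.

```python
def plateau_schedule(val_losses, initial_lr=0.01, factor=0.5, patience=5):
    """Return the learning rate used at each epoch.

    The rate is multiplied by `factor` whenever the monitored validation
    loss has not improved for `patience` consecutive epochs (assumed rule).
    """
    lr = initial_lr
    best = float("inf")
    stale = 0  # epochs since the last improvement
    rates = []
    for loss in val_losses:
        rates.append(lr)  # rate in effect during this epoch
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor  # factor-two reduction on a plateau
                stale = 0
    return rates


if __name__ == "__main__":
    # Toy validation curve with two plateaus.
    losses = [1.0, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8]
    print(plateau_schedule(losses, patience=2))
```

With the toy curve above and a patience of two epochs, the rate is halved once after the first plateau, and the second reduction would only take effect in a subsequent epoch.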

4 Results and Discussion

After a quantitative assessment of the complexity reduction factor produced by the region proposal method, we evaluate the detection performance according to two different criteria: (1) Patch-based: we consider the ability to correctly distinguish patches that present \(\beta \)-hemolysis from negative ones. This metric is useful to evaluate and guide CNN hyper-parameter tuning and training, and it measures the performance of the implemented solution in response to both the MC and DC challenges. (2) Plate-based: we investigate the ability to correctly classify the whole plate, which is the ultimate clinically relevant target.

Patch Extraction. The adopted region proposal makes it possible to extract image patches containing all regions with a high probability of \(\beta \)-hemolysis occurrence. With the parameter selection described in Sect. 3.1 we indeed obtained no FN, together with a \(20{\times }\) reduction in the number of patches to classify with respect to the sliding window generation used for dataset creation (Sect. 2).

Patch Classification. In Table 1 we report results obtained with different configurations of DenseNet. We achieve the best result with a medium-capacity model, with or without the bottleneck layer (BC). This can be explained by observing that medium-sized models have a number of trainable parameters that is better matched to the size of our dataset, whereas bigger models tend to overfit and fail to generalize well. The adoption of conventional radiometric and geometric data augmentation techniques accounts for an improvement of about 0.2%, which is already included in the final score. In Fig. 3(a) we show the confusion matrix of the best classifier (BC-Medium). FN errors are mainly due to borderline cases, which are also very difficult to discriminate with the naked eye, while FP patches are typically caused by light reflections creating misleading color effects on the plate. In the additional material we include correctly classified patches as well as FP and FN cases. Results are very promising, with both recall and precision approaching 99%. This indicates a highly satisfactory response to both the MC and DC challenges defined in Sect. 1.1. In Fig. 3(b) we show the CNN internal representation of the last hidden layer through a reduced-dimensionality visualization based on t-SNE, where a random portion of the validation patches is taken as input. This visualization shows the good level of separability of the two classes (with rare isolated exceptions).

Table 1. DenseNet models comparison on patch classification task (L is the depth and k the growth-rate as in [13]).
Fig. 3. (a) Normalized confusion matrix of the patch-based classifier. (b) t-SNE visualization of the last CNN hidden layer for \(\beta \)-hemolysis discrimination.

Plate Classification. We now apply the proposed pipeline to the 300 unseen plates (blind test-set). Without any post-processing we reach 83% precision and 99% recall, with only 3 FN plates, each containing a single, very faint \(\beta \)-hemolytic colony in challenging conditions (in our cases near the plate border or below a colony). All the images of the FN plates are given in the additional material, together with meaningful TP, TN and FP examples. If we instead use one third of the blind test-set to tune the classification threshold and test again on the remaining plates, we reach 90% precision with the same recall.
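The effect of tuning the classification threshold can be illustrated with a short sketch. The aggregation rule used here (a plate is called positive if at least one proposed patch scores at or above the threshold) is an assumption, since the text does not spell out how patch outputs are combined; the function names and toy scores are hypothetical as well.

```python
def plate_is_positive(patch_scores, threshold=0.5):
    """Assumed rule: a plate is positive if any patch score reaches the threshold."""
    return any(score >= threshold for score in patch_scores)


def precision_recall(predicted, actual):
    """Plate-level precision and recall from boolean predictions and labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy patch scores for four plates; an empty list means no proposed patch.
    plates = [[0.1, 0.9], [0.2, 0.3], [0.6], []]
    truth = [True, False, False, True]
    for threshold in (0.5, 0.7):
        predicted = [plate_is_positive(p, threshold) for p in plates]
        print(threshold, precision_recall(predicted, truth))
```

On this toy data, raising the threshold from 0.5 to 0.7 removes the single false positive plate without losing any true positive, which is the kind of trade-off a held-out tuning subset is meant to expose.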

Finally, we compare our solution against the one in [5], using their publicly available plate-based test-set. In a fair comparison, which requires excluding all the colonies grown over the written portion, we reach almost the same recall of 96%, with precision increased by 12%, up to 87%. Beyond this improvement, our method also handles \(\beta \)-hemolysis detection inside confluences (not considered in [5]) and over the usually large written portions of the plate, and is therefore better able to cope with the real complexity of the problem.

5 Conclusion

We presented a fully automatic method for \(\beta \)-hemolysis detection on blood agar plate images. The method combines a complexity-reducing region proposal with a representation learning approach based on a DenseNet CNN for the classification of both single patches and full plates. Our solution showed highly satisfactory performance on a blind test-set and overcomes performance and functional limitations of a previous work. As a next step, we would like to integrate our method into a diagnostic workflow with the microbiologist in the loop. We believe that, thanks to the results achieved on both the multiform and detectability challenges, further impact can be expected in terms of consistency and efficiency, as suggested in [14], where the combination of deep learning predictions with human diagnostic activity significantly improved the total error rate.