1 Introduction

Haematology is the branch of medicine concerned with the diagnosis and treatment of patients who have disorders of the blood and bone marrow. Haematologists combine direct clinical care of patients with diagnostic work in the laboratory. Their tasks are the study, diagnosis, monitoring, treatment and prevention of diseases related to the blood and the bone marrow. In some cases, predicting how the bone marrow may have contributed to a clinical condition may be more important than identifying the patient’s haematologic condition. Haematologists perform a wide range of laboratory tests and interpret the results to assist clinicians in the diagnosis and treatment of disease. For example, haematologists receive blood samples and check them for abnormalities. They examine the blood film and, if they suspect leukaemia or related pathologies, perform a bone marrow biopsy and examine and interpret the samples. The diagnosis must be made within a few hours, since in some cases treatment must start immediately. Several diseases, disorders and deficiencies affect the number and type of blood cells produced, their functions and their lifespan. Under normal health conditions, the bone marrow releases only mature and nearly mature cells into the bloodstream. However, certain circumstances can induce the bone marrow to release immature or abnormal cells into the circulation. Among the several tests that monitor the proportions of the cell components, the Complete Blood Count (CBC) indicates the numbers and types of cells in the peripheral circulation. The percentages of cells are compared with the reference ranges to determine whether they are present in reasonable proportion to one another, whether a cell type is increased or decreased, or whether immature cells are present. Reference ranges for blood tests are usually defined as the set of values within which 95% of the healthy population falls.
They are determined by collecting data from laboratory test results obtained over a representative sample of the population. Quicker ways to perform a CBC are the automatic cell counter and flow cytometry; however, if the results of an automated cell count indicate the definite or possible presence of abnormal cells, a blood smear is prepared and then analysed. Its analysis is particularly useful to categorise and identify pathological conditions that affect one or more types of blood cells, or to monitor patients under treatment. It typically offers a description of the appearance of the cells as well as of any cell abnormalities. The manual analysis of blood smears depends on the operator’s skills and opinion. It is lengthy and repetitive, lacks a standard procedure, and is subjective, because the same scene can produce different results if analysed by several operators. Image processing techniques help to automate the blood cell counting procedure as well as to provide information about cell morphology. In this work we investigate a technique for the automatic counting of red blood cells. The rest of the paper is organised as follows. Section 2 gives a background about peripheral blood analysis. Section 3 illustrates some related works. Section 4 presents the proposed method for cell detection and counting. Section 5 shows experimental results. Finally, Sect. 6 gives discussions, conclusions and future directions.

Fig. 1. Blood smear components: a real image and a schematic representation depicting all the leukocyte lineage

Fig. 2. Variations in RBCs appearance. (Top and middle) Shape and colour abnormalities: spherocyte, elliptocyte, tear, sickle, acanthocyte, echinocyte, keratocyte, bite, stomatocyte, target, schistocyte and rouleaux formation. (Bottom) Inclusions: Howell-Jolly bodies, siderotic granules, basophilic stippling, Heinz bodies, malaria and nucleated RBC. (Color figure online)

2 Background

Under normal conditions, a blood image consists of three components: platelets, red blood cells (RBCs) and white blood cells (WBCs), as shown in Fig. 1. Platelets (or thrombocytes) are small non-nucleated disc-shaped cells. In haemostasis, they lead to the formation of blood clots. A sufficient number of platelets is necessary to control bleeding, although an excess of platelets could also cause bleeding, because many of the extra platelets may be dysfunctional even though they appear normal. A platelet count is usually evaluated by preparing a blood smear to visualise any anomalies in shape or size directly. RBCs (or erythrocytes) are uniform in size, with a diameter of 7–8 \(\upmu \text {m}\). They are round and flattened like a doughnut, because haemoglobin is peripherally located. This gives an area of central pallor of 1–3 \(\upmu \text {m}\), approximately 30–45% of the diameter of the cell. Since the different types of RBCs do not share the same shape, any significant number of cells of different shape or size may indicate the presence of a disease [9]. Identifying normal and abnormal erythrocytes is essential because automated cell counters have not yet replaced the well-trained eye. Erythrocyte colour is representative of the haemoglobin concentration in the cell, while an abnormal shape may indicate the possible presence of a specific disease or disorder. The cytoplasm of all healthy RBCs is free of debris, granules, or other structures. Inclusions are the result of unique conditions, and their identification can be clinically helpful. Some examples of shape and colour abnormalities and inclusion bodies are shown in Fig. 2. WBCs, or leukocytes, are the biggest cells in the peripheral circulation, ranging in size from 10 to 20 \(\upmu \text {m}\). They have a nucleus surrounded by cytoplasm, which makes them easier to identify than the other regions, as their nucleus appears darker than the background.
Nevertheless, the analysis and processing of data related to WBCs are problematic due to the wide variations of the cells in shape, dimensions and edges. The generic term leukocyte refers to a set of cells that are very different from each other. Indeed, although they are all derived from bone marrow stem cells, they differentiate into two main groups: cells containing granules, called granulocytic or myelocytic, and cells without granules, called mononuclear or lymphoid. Thus, we can distinguish between these cells according to their shape or size, the presence of granules in the cytoplasm and the number of lobes in the nucleus. The lobes are the largest parts of the nucleus, and thin filaments connect them to each other. WBCs mature into five distinct types: neutrophils, basophils and eosinophils for the granulocytic type, and lymphocytes and monocytes for the non-granulocytic one. Neutrophils make up the majority of WBCs in a healthy adult. WBCs remain only a short time in the peripheral circulation, and alterations in either the quantity or the quality of a particular WBC type can be dramatic for the patient. Numerous diseases and conditions can affect the absolute or relative number of WBCs and their appearance on a blood smear. They most often increase in number in individuals with allergies and parasitic infections, while the more severe cases that need to be diagnosed are leukaemias. When the number of WBCs rises, the peripheral smear usually shows more immature or abnormal cells. As previously said, automated cell counters are not able to distinguish normal from abnormal cells and, even worse, they can fail due to the presence of abnormal cells. This is why many computer-aided systems working on digitised images have been proposed in recent years.

3 Related Works

Among the proposed systems, few address the whole analysis process; most are devoted to performing a single step or to analysing a single cell type. In particular, a considerable amount of work has been devoted to leukocyte segmentation. For example, Madhloom [16] developed an automated system to localise and segment WBC nuclei based on arithmetical image operations and threshold operations. Sinha [21] attempted to differentiate the five types of leukocytes in cell images using k-means clustering in the HSV colour space for WBC segmentation and different classification models for cell differentiation. Often, images acquired from a digital microscope are affected by uneven lighting, with a very bright central region, caused by the lens and the lamp light, and more marked shading towards the corners. In this case, a local approach to segmentation is more appropriate. In [7] a local fuzzy threshold has been proposed to manage the local variations and the presence of noise or imprecision. Khan [12] proposed a method to count the cells. It uses an iterative threshold, determined from the histogram, to binarise the image. The count is performed by extracting the connected components, but it neglects overlapping or adjacent cells. Nguyen [19] also proposed a method to count all the cell types, adding a step based on the distance transform to solve the overlapping cells problem. Unfortunately, this method produced good results only in the presence of almost round cells. The distance transform, in combination with the watershed algorithm, has also been used in [20] to separate cell agglomerates. The separation is less influenced by the shape of the cells, but it works only for small or simple cell agglomerates. Mahmood and Alomari [3, 17] instead proposed two methods to count WBCs and RBCs that use the Circular Hough transform (CHT).
Mahmood applied the CHT to binary images obtained from the Lab colour space, while Alomari modified the CHT to reduce the number of candidate cells by selecting the one with the higher probability. The CHT has also been used in [4, 23] to count the RBCs starting from a binary image, whereas in both cases the WBCs are counted by merely extracting the connected components of the image; thus they do not take into account the presence of touching leukocytes. The only difference between these two approaches is the segmentation phase, performed using k-means in the first case and thresholds in the second. Alilou proposed a completely different approach in [2], where a detection phase using the grey level co-occurrence matrix is applied directly to the original images without a previous segmentation. As might be expected, it produces a significant number of false positives, since it works without any restriction on the area of interest.

4 Our Approach for Cells Detection

Image region proposals are now considered valid alternatives to object detection algorithms, as they can locate object regions efficiently [18]. Their computational efficiency was first shown with R-CNN [10]. Since then, many other similar systems have been proposed. The most common region proposal algorithms include Selective Search [22] and Edge Boxes [24]. Edge Boxes is a novel approach for generating object bounding box proposals directly from edges. Built on the Structured Edge Detector [8], Edge Boxes uses the number of enclosed edges to find proposals and the number of edges at the border of boxes to rank them. In [11] the authors demonstrated experimentally that Edge Boxes is the state-of-the-art region proposal system. For this reason, we have used this approach to address our problem too. The R-CNN procedure is not needed in our approach, since our interest is devoted only to cell location and it is unnecessarily complex for the task we faced.

4.1 Edge Boxes

Edges provide a sparse but informative representation of an image. Also, the number of contours that are wholly contained in a bounding box is indicative of the likelihood of the box containing an object. Scoring a box based on the number of contours it wholly encloses generates a particularly useful proposal measure. On the contrary, merely counting the number of edge pixels within the box is not as informative. In the Edge Boxes system, a simple box objectness score is proposed. This score measures the number of edges that occur in the box minus those that are members of contours that overlap the box’s boundary. An object is not searched for at every image location and scale. Instead, a set of object bounding box proposals is first generated to reduce the set of positions that need to be further analysed, consequently decreasing spurious false positives too. The approach is quite similar to the superpixel straddling measure introduced by [1]; however, rather than measuring the number of straddling contours, such contours are removed from consideration. To obtain the initial edge map, the fast and publicly available Structured Edge detector proposed in [8] is used. Given the dense edge responses, Non-Maximal Suppression orthogonal to the edge response is then performed to find edge peaks. Candidate boxes are evaluated using a sliding window approach, in the same way as in traditional object detection. At every possible object position, scale and aspect ratio, a score indicating the likelihood of an object being present is generated. Promising candidate boxes are further refined using a simple coarse-to-fine search. By using efficient data structures, the approach is capable of rapidly finding the top object proposals from among millions of potential candidates.
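
The difference between counting edge pixels and scoring wholly enclosed contours can be illustrated with a toy sketch. It is a deliberate simplification of the idea described above and not the scoring function of the Edge Boxes implementation: contours are assumed to be already grouped, each with a single magnitude, and the names `enclosed_contour_score` and `edge_pixel_count` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]              # (row, col)
Box = Tuple[int, int, int, int]      # (top, left, bottom, right), inclusive
Contour = Tuple[List[Pixel], float]  # (pixels of the contour, edge magnitude)


def inside(pixel: Pixel, box: Box) -> bool:
    """True when the pixel lies within the box (borders included)."""
    row, col = pixel
    top, left, bottom, right = box
    return top <= row <= bottom and left <= col <= right


def enclosed_contour_score(contours: List[Contour], box: Box) -> float:
    """Sum the magnitudes of the contours wholly contained in the box.

    Contours that straddle the box boundary contribute nothing, which is
    the simplified reading of the objectness score used in this sketch.
    """
    return sum(mag for pixels, mag in contours
               if all(inside(p, box) for p in pixels))


def edge_pixel_count(contours: List[Contour], box: Box) -> int:
    """Baseline for comparison: count every edge pixel falling in the box."""
    return sum(1 for pixels, _ in contours for p in pixels if inside(p, box))


contours = [
    ([(1, 1), (1, 2)], 2.0),  # wholly inside the box below
    ([(2, 2), (2, 6)], 3.0),  # straddles the right border
    ([(8, 8)], 1.0),          # wholly outside
]
box = (0, 0, 4, 4)
print(enclosed_contour_score(contours, box))  # 2.0
print(edge_pixel_count(contours, box))        # 3
```

The straddling contour still adds one pixel to the plain pixel count, while it is ignored by the enclosed-contour score.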

4.2 Cells Detection Strategy

Since the Edge Boxes approach has been proposed for more general tasks, we added some steps to address our specific task, namely cell detection, by incorporating knowledge-based constraints into the detection strategy. Initially, to enhance the edges, both on the border of the cells and between overlapped cells, we apply a morphological gradient to the original image using a disk-shaped structuring element whose size is related to the expected cell size. The gradient image is the input image for the Edge Boxes algorithm, which returns a ranked set of object proposals containing cells. However, the number of top-scoring proposals is still high; moreover, only some of them include single cells, which are our objects of interest.
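
As an illustration of the pre-processing step, the following minimal sketch computes a morphological gradient (dilation minus erosion) of a grey-level image with a disk-shaped structuring element. It is written in plain Python for readability rather than speed; the radius is left as a parameter because the text only states that it is related to the expected cell size, and ignoring pixels outside the image at the borders is an assumption of this sketch.

```python
def disk_offsets(radius):
    """Offsets (d_row, d_col) of the pixels covered by a disk of given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def morphological_gradient(image, radius):
    """Return dilation(image) - erosion(image) for a list-of-lists grey image.

    For each pixel, the maximum (dilation) and minimum (erosion) are taken
    over the disk neighbourhood; neighbours outside the image are skipped.
    """
    rows, cols = len(image), len(image[0])
    offsets = disk_offsets(radius)
    gradient = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [image[r + dr][c + dc]
                      for dr, dc in offsets
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
            gradient[r][c] = max(values) - min(values)
    return gradient


# A bright 3x3 square (200) on a darker background (50).
image = [[50] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 200

gradient = morphological_gradient(image, radius=1)
print(gradient[3][3])  # 0: flat interior of the square
print(gradient[2][3])  # 150: border of the square
```

The response is zero in flat regions and high along intensity transitions, which is why the gradient image emphasises cell borders before edge detection.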

To reduce the candidate bounding boxes to those consistent with single-cell regions, we incorporate knowledge-based constraints to refine the Edge Boxes proposals and so remove false positive bounding boxes from consideration. The first observation is related to the bounding box size. We are interested only in proposals whose size is well matched to the expected cell size. As a consequence, a box larger (or smaller) than the expected one can be immediately removed from consideration. Another important feature of a red blood cell is its mean grey tone, which is always greater than the grey tone of a white blood cell. So a candidate box contains a red cell if its mean grey tone is higher than the grey tone typically related to a white cell (in all our experiments the threshold value has been set to 110). Finally, a box is of interest if it contains a circular object with a radius consistent with the expected cell size. Otherwise, if it does not contain any circle, or it contains more than one, the candidate box has to be discarded. Indeed, we consider significant only boxes containing a single cell. Since we are analysing bounding boxes whose size is compatible with cell sizes, a box could contain at most two agglomerated cells. But before discarding such a box, we perform a further check, considering that, sometimes, even a single cell with an abnormal shape (see Fig. 2) could produce more than one circle. Thus, we consider as significant a box containing circles whose area of union less intersection is less than half of the bounding box size, meaning that they belong to the same cell. All the candidate proposals returned by the Edge Boxes algorithm that satisfy the described constraints are selected as cell bounding boxes.
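
The three constraints can be sketched as a single filter. This is only an illustration under stated assumptions: the circles are assumed to come from an upstream circle detector that already returns only radii consistent with the expected cell size, the size tolerance `size_tol` is a hypothetical parameter (the text does not give one), and boxes with more than two circles are simply discarded here. The grey threshold of 110 is the value reported in the text.

```python
import math


def circle_intersection_area(c1, c2):
    """Area of overlap of two circles given as (x, y, radius)."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                      # disjoint or externally tangent
        return 0.0
    if d <= abs(r1 - r2):                 # one circle inside the other
        return math.pi * min(r1, r2) ** 2
    # Lens area: two circular sectors minus the kite formed by the centres
    # and the two intersection points.
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    kite = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                           * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - kite


def is_single_red_cell(box_w, box_h, mean_grey, circles, expected_size,
                       size_tol=0.3, grey_threshold=110):
    """Apply the size, grey-tone and circle constraints to one candidate box."""
    low, high = expected_size * (1 - size_tol), expected_size * (1 + size_tol)
    if not (low <= box_w <= high and low <= box_h <= high):
        return False                      # size not matched to a cell
    if mean_grey <= grey_threshold:
        return False                      # too dark: likely a white cell
    if len(circles) == 1:
        return True                       # exactly one circular object
    if len(circles) == 2:
        area1 = math.pi * circles[0][2] ** 2
        area2 = math.pi * circles[1][2] ** 2
        inter = circle_intersection_area(circles[0], circles[1])
        union_less_intersection = area1 + area2 - 2 * inter
        # Two strongly overlapping circles are taken as one abnormal cell.
        return union_less_intersection < 0.5 * box_w * box_h
    return False                          # no circle, or more than two


print(is_single_red_cell(40, 40, 150, [(20, 20, 18)], 40))                # True
print(is_single_red_cell(40, 40, 150, [(12, 20, 16), (28, 20, 16)], 40))  # False
```

In the second call the two circles overlap only partially, so the area of their union less their intersection exceeds half of the box area and the box is treated as an agglomerate.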

Promising selected cell boxes are further refined using a Non-Maximal Suppression procedure to eliminate overlapping bounding boxes. The confidence score used in all the experiments has been set to 0.1, and the bounding box overlap ratio is the area of intersection divided by the area of the union of two boxes.
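
A minimal sketch of greedy Non-Maximal Suppression with the overlap ratio defined above (intersection over union) is given below. The box format `(x1, y1, x2, y2)` and the function names are assumptions of this sketch, and the overlap threshold is left as a parameter, since the text does not state explicitly how the value 0.1 enters the procedure.

```python
def iou(a, b):
    """Overlap ratio of two boxes (x1, y1, x2, y2): intersection / union."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, overlap_threshold):
    """Return the indices of the boxes kept, strongest first.

    Boxes are visited by decreasing score; a box is kept only if its
    overlap ratio with every already kept box does not exceed the threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores, overlap_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with a ratio of 81/119 and is suppressed, while the third box does not overlap any kept box and survives.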

The resulting strongest bounding boxes are taken as the most representative red blood cells in the analysed image and can then be counted by annotating each cell centroid with a blue dot. A pipeline of the proposed cell detection strategy is shown in Fig. 3.

Fig. 3. A pipeline of the cells detection strategy. (Color figure online)

5 Experimental Results

For our experimentation, we used the open source MATLAB implementation of Edge Boxes made available by the authors [8]. The algorithm comes with a set of default parameters, tuned for generic datasets such as PASCAL, CalTech and ImageNet. For our dataset, we had to refine some of them to improve the quality of the region proposals. The modified parameters were Alpha and Beta. Alpha indicates the Intersection over Union (IoU) for neighbouring candidate boxes; the step sizes of the sliding windows are determined such that one step results in an IoU of Alpha. Beta is the Non-Maximal Suppression threshold of an object proposal, that is, if a neighbouring box has an IoU greater than Beta, the lower ranked box is removed. We found that these two variables are the most influential in changing the Edge Boxes proposals. In our experiments, Alpha and Beta have been set to 0.9 and 0.5, respectively. Another modified parameter was the maximum aspect ratio of the boxes, set to 1.5. Other parameters, such as the minimum score, Gamma and Kappa, did not affect the proposals significantly, so they were left at their defaults. All the other modules of the proposed cell detection system have been implemented in MATLAB too. The time to complete the entire procedure has been measured over all the dataset images, averaging 35.24 s per image.

5.1 Dataset Description

The Acute Lymphoblastic Leukaemia image database ALL-IDB [13] has been used to test the proposed method. It is a public image dataset of peripheral blood samples from healthy individuals and leukaemia patients. The experts collected these samples at the M. Tettamanti Research Centre for childhood leukaemia and haematological diseases, Monza, Italy. The ALL-IDB database is composed of two versions: ALL-IDB1 and ALL-IDB2. The ALL-IDB1 can be used for testing the segmentation capability of algorithms, as well as classification systems and image pre-processing methods. This version is composed of 108 images captured with an optical laboratory microscope coupled with an Olympus Optical C2500L camera or a Canon PowerShot G5 camera. All images are in JPG format with 24-bit colour depth. The first 33 have \(1712 \times 1368\) resolution, while the remaining ones have \(2592\times 1944\) resolution. The images were taken with different magnifications of the microscope, ranging from 300 to 500, which causes colour and brightness differences that we managed by grouping together the images with the same brightness characteristics. The ALL-IDB2 is a collection of cropped areas of interest of healthy and blast cells that have been extracted from the ALL-IDB1 dataset. It contains 260 images, 50% of which represent lymphoblasts. Some image examples belonging to the ALL-IDB1 are shown in Fig. 4.

Fig. 4. Sample images from ALL-IDB1. They present different colouration and illumination conditions, therefore we identified 7 separate image sets. From top-left to bottom-right: examples representing each of the 7 identified subsets.

Table 1. Detection performances of our approach on ALL-IDB: all dataset images and each identified image subset.
Table 2. RBCs counting performances compared with the state-of-the-art.

5.2 Results and Comparison with State-of-Art

Following the literature, we evaluated the counting performance using accuracy, precision, recall and F-measure. The dataset has been divided into 7 different image subsets by grouping the images according to their magnification, brightness and colour conditions. In particular, Set1 contains images 1 to 33; Set2 images 34 to 47; Set3 images 48 to 63; Set4 images 64 to 77; Set5 images 78 to 93; Set6 images 94 to 95 and, finally, Set7 images 96 to 108. Table 1 presents the statistical measures for both the whole dataset and the different image subsets. The ground truth for all the images has been determined by an expert and used to validate the proposed method. The counting results on the whole ALL-IDB are reported in Table 2, where they are directly compared with the results obtained by other authors on the same dataset. In particular, the method proposed in [17] uses the CHT applied to different colour spaces without any restriction on the area of interest. This method was tested on 10 images and produced an accuracy of 64%. In both [4, 23] the CHT has been applied after a segmentation phase, introducing some restriction on the area of interest to count the RBCs. In [4] the segmentation has been performed using a fixed threshold over the G channel of the RGB colour space. The authors performed the counting on just 14 random images, obtaining an accuracy of 92.6%. In [23] K-means clustering was used for segmentation, which led to an accuracy of 90.9%, but the number of tested images is unknown. The circular Hough transform has also been used in [3], but in that case the number of candidate circles has been reduced by selecting the one with the higher probability. This operation reduced the number of false positives and also increased the number of true positives, reaching an overall accuracy of 95% on 100 images. The method in [6] is based on a machine learning approach for segmentation that combines Nearest Neighbour and Support Vector Machine.
It obtained an F-measure, a recall and a precision of 93%, 98% and 89%, respectively, yet limited to the first 33 dataset images. As can be observed, the proposed approach improves the RBC counting performance; in particular, it significantly enhances precision, reducing the number of false positives. To put the performance obtained with the proposed method in context, in Table 2 we also report the number of images used by the various authors to test their approaches. Indeed, none of the state-of-the-art techniques has been tested on the whole dataset. For this reason, we have also tested our method on different image subsets, demonstrating that some of them are easier than others and, at the same time, showing how the performance can be affected by the number and kind of test images.
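
For reference, precision, recall and F-measure can be computed from the numbers of true positives, false positives and false negatives as in the short sketch below, which uses the standard definitions. Accuracy is left out on purpose, because the text does not state how it is computed for a counting task; the function names and the example counts are illustrative only.

```python
def precision(tp, fp):
    """Fraction of detected cells that are real cells."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of real cells that have been detected."""
    return tp / (tp + fn)


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Consistency check with the values reported above for [6]:
# precision 89% and recall 98% give an F-measure of about 93%.
print(round(f_measure(0.89, 0.98), 2))  # 0.93

# Illustrative counts (not taken from the paper).
p, r = precision(tp=90, fp=10), recall(tp=90, fn=30)
print(p, r)  # 0.9 0.75
```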

6 Conclusions

In this paper, we proposed a novel and efficient method for detecting and quantifying red cells in a microscopic blood image. The proposed system is based on a region proposal approach, namely Edge Boxes, considered the state-of-the-art region proposal method, which can evaluate millions of candidate boxes in a fraction of a second, returning a ranked set of a few thousand top-scoring proposals. By incorporating knowledge-based constraints, which can be extracted easily from some cell samples, we can find cell proposals rapidly and efficiently, even in the case of overlapping or adjacent cells. Results show improved accuracy and precision over the state of the art. We are now working on an extension of the proposed system to create a complete blood cell detector, able to detect and count both red and white cells in a microscopic peripheral blood image, including morphological [5] and textural constraints [6]. Future work will also include an evaluation of the framework on a new blood image dataset [14], containing images of malarial blood smears, to detect and quantify the level of parasitaemia.